Linux Notes: Managing HPE RAID hardware

The information presented here is intended for educational use by qualified computer technologists.
The information presented here is provided free of charge, as-is, with no warranty of any kind.

HPE Drives: SAS for MSA
Managing hardware RAID on HP/HPE servers (2018.12.12)
BASH scripts to proactively monitor SAS drives not visible to Linux (2019.09.14)

Created: 2023-12-31

HPE / HP Drives: SAS for MSA (more alphabet soup)

Older SFF Adapter

Newer SFF Adapter

HP/HPE Acronyms:

For reasons I will never understand, Hewlett-Packard split into two companies in 2015:
- HP (desktop focus: PCs, Printers, Inkjet Cartridges)
- HPE (enterprise focus: Enterprise Hardware - Servers etc.)
IMHO a huge amount of money was spent so that one company could focus on the desktop at a time when that market was shrinking, as customers moved to handheld devices (phones and tablets).
Hardware
- MSA = Modular Storage Array
- SAS = Serial Attached SCSI
  - SFF = Small Form Factor
  - LFF = Large Form Factor (twice as thick as the SFF pictured here)
Software
- SPP = Service Pack for ProLiant
  - collections of diagnostic software and firmware libraries
- SSA = Smart Storage Administrator
  - software to manage MSA devices
- SUM = Smart Update Manager
  - software to manage firmware updates

Notes:

Beginning with DL385 gen8 servers, HPE has changed the plastic mounting adapter used to insert the SAS disk drive into its slot.
The old and new SFF (small form factor) assemblies are not interchangeable because the new-style disk-plus-adapter is a little shorter.
- Why shorter? A wiring cable (not seen in this photo) connects an array of multicolored LEDs (seen as green in this photo) to the back end of the socket.
- Note that the drive electronics have not changed, so a small Phillips screwdriver is all you need to swap a drive between the old and new brackets.
It should be no surprise that each one of these disks contains a CPU run from firmware. For best results when used in an MSA, these disks should be running the latest firmware. What follows are a few resources on how to re-flash the firmware.

Chart of latest firmware levels: https://www.hpe.com/storage/MSADriveFirmware
then watch one of these:

1) HPE MSA Best Practice for Controller Firmware Update	https://www.youtube.com/watch?v=exaQMRKjNvA
2) HPE MSA Storage best practice for expansion module firmware	https://www.youtube.com/watch?v=_a-FaQcWhBc
3) Updating HPE MSA Storage drive firmware demo	https://www.youtube.com/watch?v=5jodXVECav8&t=111s

Managing hardware RAID volumes on HP/HPE systems

date: 2018-12-20 (updated: 2021-05-20)

Software-managed Linux volumes are usually set up via LVM (Logical Volume Manager). This section deals with hardware-managed Linux volumes: physical disks managed by HPE's proprietary RAID controllers.
- note: on OpenVMS this is done with the msa utility (msa$util.exe).
HPE publishes software which will allow you to manage HPE controllers and see HPE drives from Linux.
- The good news is this: it can be downloaded and used free of charge from here: https://support.hpe.com/
- You will no longer be challenged for a support agreement number (SAN) or support agreement ID (SAID).
- Kudos to HPE for helping to make their servers more manageable in the field

Executive Summary: There are only three ways (that I'm aware of) to manage h/w RAID on HPE systems

access ORCA (Option ROM Configuration for Arrays) from firmware during any boot
- mostly limited to ADD / DELETE
access SSA (Smart Storage Administrator) after booting HPE Firmware + Diagnostics (from either USB or DVD)
- SSA at this point lets you do whatever you want (very dangerous yet powerful)
- this media is only available with a support contract and goes by the name Service Pack for ProLiant (SPP)
access SSACLI (Smart Storage Administrator Command Line Interface) from within Linux
- will not be able to DELETE logical volumes once they have been associated with Linux devices under /dev

1) GUI-based SSA Utility (way cool tool)

Steps:
	1.  login as root from the graphical front console
	2.  download all files related to ssa-2.65-7.0.x86_64.rpm from https://support.hpe.com 
	3.  rpm -i  ssa-2.65-7.0.x86_64.rpm
	4.  /usr/sbin/ssa -local (Firefox auto opens with a beautiful colored diagram of your
	    RAID config). See page 18 of this manual 
	5.  /usr/sbin/ssa -help  (view all available command-line switches)
---------------------------------------------------------------------------
Tips:	1. I have used this tool to convert a volume from "8-disk RAID-60" to "8-disk RAID-0" on the fly
		This requires several hours and would definitely impact server performance
	2. my next experiment was to convert from "8-disk RAID-0" to "4-disk RAID-0" on the fly
		I didn't even know this was possible (would not work if the volume was full)

2) CLI-based SSA Utility (great for scripting)

Steps:
 
	1. login as root from anywhere
	2. rpm -i ssacli-2.65-7.0.x86_64.rpm
	3. then just type "ssacli" (what I typed follows the "#" and "=>" prompts)
	4. Notice that the drive in bay-8 is marked "Predictive Failure"
 
###############################################################################################
 
[root@localhost ~]# ssacli
Smart Storage Administrator CLI 2.65.7.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
 
=> ctrl all show   		# firmware sensitive; does not work on all platforms

   this is the only way to see which controllers were found (slot #0 means embedded)

=> set target ctrl slot=0 	# or: set target ctrl all
				# or: set target ctrl first

=> show config
 
Smart Array P420i in Slot 0 (Embedded)    (sn: 001438024F5D170)

   Port Name: 1I
   Port Name: 2I 
 
   Internal Drive Cage at Port 1I, Box 2, OK
   Internal Drive Cage at Port 2I, Box 2, OK
 
   Array A (SAS, Unused Space: 0  MB)
 
      logicaldrive 1 (1.1 TB, RAID 60, OK)
 
      physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SAS HDD, 300 GB, OK)
      physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SAS HDD, 300 GB, OK)
      physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SAS HDD, 300 GB, OK)
      physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:5 (port 2I:box 2:bay 5, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:6 (port 2I:box 2:bay 6, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:7 (port 2I:box 2:bay 7, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:8 (port 2I:box 2:bay 8, SAS HDD, 300 GB, Predictive Failure)
 
   SEP (Vendor ID PMCSIERA, Model SRCv8x6G) 380  (WWID: 5001438024F5D17F)
 
=> show status
 
Smart Array P420i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK
 
 
=> show config detail

	bla...bla...bla...
	drive details
	bla...bla...bla...
 
=> exit
[root@localhost ~]#
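
For scripting, the same report can be produced without entering the interactive console: "ssacli ctrl slot=0 show config" is the one-shot form used by raid_monitor.sh further down this page. What follows is only a minimal sketch of filtering that report for drives that are not healthy. It assumes the physicaldrive lines look like the ones in the session above, and the function name list_unhealthy_drives is hypothetical (it is not part of ssacli).

```shell
#!/bin/bash
# list_unhealthy_drives: hypothetical helper, not part of ssacli.
# Reads "show config" text on stdin and prints the drive id and status
# of every physicaldrive line whose status is anything other than OK.
list_unhealthy_drives() {
    awk '
        /physicaldrive/ {
            status = $0
            sub(/^.*, /, "", status)          # keep the text after the last ", "
            sub(/\)[ \t\r]*$/, "", status)    # drop the closing parenthesis
            if (status != "OK") {
                print $2 ": " status          # $2 is the port:box:bay id
            }
        }'
}

# Example using two lines copied from the session above (no hardware needed)
printf '%s\n' \
    '      physicaldrive 2I:2:7 (port 2I:box 2:bay 7, SAS HDD, 300 GB, OK)' \
    '      physicaldrive 2I:2:8 (port 2I:box 2:bay 8, SAS HDD, 300 GB, Predictive Failure)' \
    | list_unhealthy_drives
```

With the two sample lines this prints "2I:2:8: Predictive Failure". On a real server the input would come from "ssacli ctrl slot=0 show config | list_unhealthy_drives", run as root.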

BASH Scripts (tested on CentOS-7)

drop these two scripts into the /root directory
- remember to set file protection bits via chmod
- modify the mail destinations for your system
- caveat: requires ssacli be installed
from there, invoke them like so "./raid_monitor.sh"
I run them three times a day via "/etc/crontab"
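
As an illustration of the crontab step, here is a small sketch that prints a candidate /etc/crontab line. The hours 6, 14 and 22 are an assumption (the notes only say "three times a day"), and raid_cron_line is a hypothetical helper name. Only raid_monitor.sh needs an entry because it calls raid_analyze_file.sh itself.

```shell
#!/bin/bash
# raid_cron_line: hypothetical helper that prints one /etc/crontab entry.
# /etc/crontab fields: minute hour day-of-month month day-of-week user command
raid_cron_line() {
    local hours="$1"     # comma-separated hours, e.g. "6,14,22" (assumed schedule)
    local script="$2"    # full path of the script to run as root
    printf '0 %s * * * root %s\n' "$hours" "$script"
}

# Print the line; review it, then paste it into /etc/crontab by hand
raid_cron_line "6,14,22" /root/raid_monitor.sh
```

This prints "0 6,14,22 * * * root /root/raid_monitor.sh". For the protection bits, something like "chmod 700" on both scripts is one reasonable choice since they are meant to be run by root only.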

#!/bin/bash
#=============================================================================
# title  : raid_monitor.sh
# purpose: inspect the health of drives not visible to Linux
# notes  : meant to be run from root since ssacli is not SUDO friendly
#        : this script will be run 3-times a day from crontab
# history:
# NSR 20190906 1. original effort
# NSR 20190911 2. more work
# NSR 20190917 3. minor fix in cleanup
# NSR 20191104 4. moved logging to /var/log
# NSR 20200306 5. now do not stop on error (needed if ssacli is not installed)
#=============================================================================
set -vex			# tron (v=verbose, e=stop-on-error, x=display data)
STUB="raid_monitor-"
YADA="/var/log/"${STUB}$(date +%Y%m%d.%H%M%S)".trc"
echo "-i-diverting output to file: "${YADA}
exec 1>>${YADA}
exec 2>&1
set +e				# do not stop on errors (in this script)
echo "-i-starting: "${0}" at "$(date +%Y%m%d.%H%M%S)
rm -f raid_monitor.tmp
# ssacli is installed with RPM
ssacli ctrl slot=0 show config > raid_monitor.tmp
saved_status=$?
echo "-i-saved_status:"$saved_status
if [ $saved_status != 0 ];
then
#   mail -s "RAID Problem" neil,[email protected],[email protected] <<< "-e-could not execute SSACLI"
    # note: ats_adm_list is an alias defined here: /etc/aliases
    mail -s "RAID Problem-01 on host: "$HOSTNAME ats_adm_list <<< "-e-could not execute SSACLI"
    mail -s "RAID Problem-01 on host: "$HOSTNAME root         <<< "-e-could not execute SSACLI"
    exit 1    # report failure to the caller
fi
# this next script will analyze "raid_monitor.tmp"
/root/raid_analyze_file.sh
saved_status=$?
echo "-i-saved_status:"$saved_status
if [ $saved_status != 0 ];
then
    # note: ats_adm_list is an alias defined here: /etc/aliases
    mail -s "RAID Problem-02 on host: "$HOSTNAME ats_adm_list <<< "-e-one or more drives are not 100% healthy"
    mail -s "RAID Problem-02 on host: "$HOSTNAME root         <<< "-e-one or more drives are not 100% healthy"
    exit 1    # report failure to the caller
fi
#mail -s "RAID Test OKAY host: "$HOSTNAME root                <<< "-i-test OKAY"
#-----------------------------------------------------------------------------
#find /var/log -name ${STUB}"*.trc" -a -mtime +2 -exec ls -la {} \;
find /var/log -name ${STUB}"*.trc" -a -mtime +2 -exec rm {} \;
echo "-i-exiting:  "${0}" at "$(date +%Y%m%d.%H%M%S)
#
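
A note on the cleanup step near the end of raid_monitor.sh: "-mtime +2" selects files whose age, counted in whole 24-hour periods, is greater than 2, so a trace file survives until it is at least three days old. The sketch below tries the same find test in a scratch directory. The function name prune_old_traces is hypothetical, and "touch -d" is assumed to be the GNU coreutils version.

```shell
#!/bin/bash
# prune_old_traces: hypothetical wrapper around the same find test
# that raid_monitor.sh uses to delete old trace files.
prune_old_traces() {
    local dir="$1"     # directory holding the trace files
    local stub="$2"    # file name prefix, e.g. "raid_monitor-"
    find "$dir" -name "${stub}*.trc" -mtime +2 -exec rm -f {} \;
}

# Demonstration in a scratch directory, so /var/log is never touched
demo=$(mktemp -d)
touch -d '5 days ago' "$demo/raid_monitor-old.trc"   # old enough to be removed
touch "$demo/raid_monitor-new.trc"                   # fresh, must survive
prune_old_traces "$demo" "raid_monitor-"
ls "$demo"
rm -rf "$demo"
```

Only raid_monitor-new.trc should be listed after the prune.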

#!/bin/bash
#=============================================================================
# script : raid_analyze_file.sh
# author : Neil Rieck
# created: 2019-09-06
# purpose: 1) Reads a text file (searching for some key words)
#          2) this script is called by /root/raid_monitor.sh
#=============================================================================
#set -vex				# verbose, stop-on-error, xpand
echo "-i-starting script: "${0}
MYFILE="./raid_monitor.tmp"		# hard coded fname
echo "-i-reading: "${MYFILE}
declare -i line
declare -i good
declare -i bad
line=0
good=0
bad=0
while IFS='' read -r LINE || [[ -n ${LINE} ]]; do
    # echo "-i-data: "${LINE}
    # only lines describing a physical drive are of interest
    if [[ ${LINE} == *"physical"* ]]; then
        ((line=line+1))
        # test for: "Predictive Failure" or "Failed"
        if [[ ${LINE} == *"Fail"* ]]; then
            echo "-i-bad  data: "${LINE}
            ((bad=bad+1))
        fi
        # a healthy drive line ends with "OK)"
        if [[ ${LINE} == *"OK)"* ]]; then
            echo "-i-good data: "${LINE}
            ((good=good+1))
        fi
    fi
done < ${MYFILE}
echo "-i-testing has concluded"
echo "-i-report card"
echo "-i-  lines:"$line
echo "-i-  bad  :"$bad
echo "-i-  good :"$good
#
# a little martial arts so we get the best exit value
#
if [ ${line} -eq 0 ] || [ ${bad} -gt 0 ] || [ ${line} -ne ${good} ];
then
   echo "-w-problems were detected"
   rc=99
else
   echo "-i-all is well"
   rc=0
fi
echo "-i-will exit with code: "$rc
echo "-i-exiting script: "${0}
exit $rc

Neil Rieck
Waterloo, Ontario, Canada.