Troubleshooting Raid Manager (RM6)

Following on from my original RM6 Cheat Sheet, I have put together this article to provide some troubleshooting tips I have come across over the past few years working with the arrays supported by this raid manager software.

Use healthck to determine any recognisable faults.

# /usr/lib/osa/bin/healthck -a

Check the led status on the RDAC unit.

RM6 front panel LEDs

This is also available in the Maintenance and Tuning view within the RM6 GUI. The following image provides a status LED interpretation:

Status LED interpretation

The Recovery view will also give the option for manual/automatic recovery of an array fault.

RM6 patches

Consult SunSolve enotify #20029 for the latest patches. It is the official patch/compatibility matrix and publicly available.

Common Issues

Some common issues and troubleshooting tips:

  • Running probe-scsi-all at the OBP will show controllers and luns attached -- WILL NOT SHOW ALL DISKS IN TRAYS
  • make sure the /etc/osa/mnf file does not contain any dots (.) in module names. For example:
    • mbc_001 = good
    • mbc.lab.001 = bad
  • Use /usr/lib/osa/bin/healthck -a to check for known problems. Use the recovery guru if any problems are found
  • Use /usr/lib/osa/bin/lad to see what controllers are available and what luns are on them
  • The output from /usr/sbin/format will show you what luns the OS sees
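The mnf dot check above can be scripted. A minimal sketch, run here against a scratch copy of the file so it can be tried safely; on a live system set MNF to /etc/osa/mnf:

```shell
# Hypothetical sketch: flag mnf entries containing dots.
# Uses a scratch copy here; on a live system set MNF=/etc/osa/mnf.
MNF=/tmp/mnf.demo
printf 'mbc_001\nmbc.lab.001\n' > "$MNF"

# Any module name containing a dot needs to be renamed.
BAD=$(grep '\.' "$MNF")
if [ -n "$BAD" ]; then
    echo "rename these modules: $BAD"
else
    echo "mnf clean"
fi
```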

Additional Information

Firstly, try to get a module profile from the RM6 GUI (File -> Save Module Profile). If you do not have access to the GUI, you can get the equivalent output by running the following:

# storutil -c c#t#d#s# -d
# raidutil -c c#t#d#s# -i
# raidutil -c c#t#d#s# -B
# drivutil -d c#t#d#s#
# drivutil -i c#t#d#s#
# drivutil -I c#t#d#s#
# drivutil -l c#t#d#s#
# drivutil -p lun c#t#d#s#

RM6 GUI shows nothing (or is incorrect)

Use the following procedure to resolve this issue:

  • exit the gui
  • remove lock files
    # rm /etc/osa/lunlocks
    # rm /etc/osa/locks/*.lock
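The two rm commands above can be wrapped in a small loop that reports what it removed. A sketch, exercised here on a scratch copy of the /etc/osa layout so it can be tried safely before running against the real paths:

```shell
# Sketch: clear stale RM6 lock files after exiting the GUI.
# OSA stands in for /etc/osa so the loop can be tried on scratch files.
OSA=/tmp/osa.demo
mkdir -p "$OSA/locks"
touch "$OSA/lunlocks" "$OSA/locks/demo.lock"

for f in "$OSA"/lunlocks "$OSA"/locks/*.lock; do
    rm -f "$f" && echo "removed $f"
done
```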

Unable to load driver after OS upgrade

This is most likely because Solaris was upgraded from a 32-bit only release (2.6 and earlier) to a 32/64-bit release (7 and later). To resolve:

  • remove the current packages
    # pkgrm SUNWosafw SUNWosamn SUNWosar SUNWosau
  • re-add the packages
    # pkgadd -d <pkg-location> SUNWosafw SUNWosamn SUNWosar SUNWosau

Controller held in reset

First, try to resolve it via the RM6 software:

# /usr/lib/osa/bin/rdacutil -u <module name>
# /usr/lib/osa/bin/rdacutil -U <module name>

If the above fails, then perform a hardware recovery:

  • Power down the controllers.
  • Dislodge (partially unseat) the controller showing a good LED status.
  • Power up the reset controller.

This should bring the reset controller to normal mode.

Firmware will not upgrade due to i/o

This is most likely due to the volume manager having control of the luns.

  • Stop all volumes attached to those luns.
  • Deport the disk group the luns are in.
  • Retry the firmware upgrade.

Note: if the luns are in rootdg, you will need to turn VxVM off completely. This may involve unencapsulation procedures.
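Assuming VxVM is the volume manager in question, the stop/deport steps look like the following. The commands are only printed here for safety, and the disk group name arraydg is a placeholder for whichever group actually holds the luns:

```shell
# Sketch (assumes VxVM is in use). DG is a placeholder disk group name.
DG=arraydg

# Commands are only printed here -- run them by hand once verified.
CMDS="vxvol -g $DG stopall
vxdg deport $DG"
echo "$CMDS"
```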

Firmware upgrade fails using the gui

The controller is most likely now a door stop. However, try the fwutil command first; if that also fails, the controller definitely is a door stop.

# fwutil /usr/lib/osa/fw/xxxxxx.bwd c#t#d#s#
# fwutil /usr/lib/osa/fw/xxxxxx.apd c#t#d#s#

Two modules have the same name

You may get a name-change warning when two modules end up with the same name. To resolve:

  • Use the config GUI or raidutil -i to locate the empty module (the empty one will most often be the original)
  • Delete the empty module; the newly named module will take its place (this is normal)

Lad and format don't match

  • Syncing up lad and format outputs pre-6.22 and/or pre-Solaris 7
    # cd /dev/dsk 
    # rm c?t?d* (all the c#t#'s that are associated with the LUN)
    # cd /dev/rdsk
    # rm c?t?d* (all the c#t#'s that are associated with the LUN)
    # cd /dev
    # rm -r osa
    # cd /devices/pseudo/
    # rm -r rdnexus@*
    # reboot -- -r
    Check the devices; if either lad or format is still missing devices, issue another reboot -- -r to do another reconfiguration boot.
  • Syncing up lad and format outputs in 6.22 and/or Solaris 7
    PROCEDURE TO FOLLOW ONCE I CONFIRM IT IS ROCK SOLID TO RELEASE

Problem getting 16 lun support

If you have problems getting 16 lun support to work on PCI-based systems with A1000/RSM2000 arrays, review SunSolve SRDB #21234.

This is usually an issue with an incorrect entry in the /kernel/drv/glm.conf file. Use the following to enable 16 lun support on PCI (E250/E450) systems with A1000/RSM2000 arrays:

device-type-scsi-options-list =
    "Symbios StorEDGE A1000", "lsi-scsi-options",
    "Symbios StorEDGE A3000", "lsi-scsi-options",
    "SYMBIOS RSM Array 2000", "lsi-scsi-options";
   lsi-scsi-options = 0x107f8;

For CLARiiON arrays, see SunSolve InfoDoc #20966.

Adding 16/32 lun support

Pre 6.22 (RM6.1 and above)

  • Kernel patch 105181-xx (latest) is installed
  • ISP Patch 105600-xx (latest patch provides 32 lun support)
  • Patch 105356-xx (Solaris 2.6 sd patch)
  • Procedure:
    • Edit the /usr/lib/osa/rmparams file and set System_MaxLunsPerController to 16
    • Go to the /usr/lib/osa/bin directory and run the following script
      # cd /usr/lib/osa/bin
      # ./add16lun.sh
    • reconfig reboot
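The rmparams edit can also be done from the shell. A sketch against a scratch copy of the file; the System_MaxLunsPerController=8 sample line is an assumed example of the name=value format, and on a live system RMPARAMS would be /usr/lib/osa/rmparams:

```shell
# Sketch: set System_MaxLunsPerController=16 in a copy of rmparams.
# The sample line below is illustrative; use /usr/lib/osa/rmparams for real.
RMPARAMS=/tmp/rmparams.demo
printf 'System_MaxLunsPerController=8\n' > "$RMPARAMS"

# Old Solaris sed has no -i, so write to a temp file and move it back.
sed 's/^System_MaxLunsPerController=.*/System_MaxLunsPerController=16/' \
    "$RMPARAMS" > "$RMPARAMS.new" && mv "$RMPARAMS.new" "$RMPARAMS"

grep '^System_MaxLunsPerController' "$RMPARAMS"
```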

16 and 32 LUN support for 6.22

If you are running 6.22 you can either download these scripts from SunSolve or find them on the 6.22 Raid Manager cdrom under the Tools directory. Maximum LUN support has been increased to 32 when running 6.22 on some systems.

Files included in the tar ball:

  • README
  • add16lun.sh
  • add32lun.sh

Additional troubleshooting scenarios

Review chapter 4 of the A3500/A3500fc controller module guide (Doc ID 805-4980-11)