Solaris Crash Dumps
If a SunOS/Solaris system crashes (a "panic"), it can write the entire contents of memory to disk for later analysis. Enabling these crash dumps on SunOS/Solaris servers is highly recommended.
Note: Make sure that enough disk space is available (e.g. on large database servers with 500 MB of memory, it may not be possible to dump to swap space).
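The space check in the note above can be sketched as a small shell function. This is only an illustration under stated assumptions: the name dump_fits and its arguments are hypothetical, all sizes are in kilobytes and supplied by the caller, a POSIX shell is assumed, and the worst case is taken to be a dump as large as physical memory. The optional third argument plays the role of the minfree value described under Optional tips.

```shell
# dump_fits MEM_KB AVAIL_KB [RESERVE_KB]
# Hypothetical helper: succeeds (exit 0) if a worst-case dump of MEM_KB
# kilobytes fits into AVAIL_KB kilobytes of free space while still leaving
# RESERVE_KB kilobytes free. All values are supplied by the caller.
dump_fits() {
    mem_kb=$1
    avail_kb=$2
    reserve_kb=${3:-0}
    # Worst case assumed here: the whole of physical memory is written out.
    needed_kb=$((mem_kb + reserve_kb))
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Example: 500 MB of memory, 400 MB free in the crash directory.
if dump_fits 512000 409600; then
    echo "enough space for a full dump"
else
    echo "not enough space for a full dump"
fi
```

The free-space figure would typically be read from df for the filesystem holding the crash directory before calling the function.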
Enabling dumps
- For Solaris 1 (SunOS 4.x), edit /etc/rc.local and uncomment the following lines:

      #
      # Enable savecore (default is disabled)
      #
      mkdir -p /var/crash/`hostname`
      echo -n 'checking for crash dump... '
      intr savecore /var/crash/`hostname`
      echo ''

- For Solaris 2, edit the file /etc/init.d/sysetup and uncomment the following lines:

      ##
      ## Enable savecore (default is disabled)
      ##
      if [ ! -d /var/crash/`uname -n` ]
      then mkdir -p /var/crash/`uname -n`
      fi
      echo 'checking for crash dump...\c '
      savecore /var/crash/`uname -n`
      echo ''

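After editing either file, it can be useful to confirm that the savecore line really is active. The following is a minimal sketch, assuming a POSIX shell and grep -E; the function name savecore_enabled is hypothetical and is not part of SunOS or Solaris.

```shell
# savecore_enabled FILE
# Hypothetical helper: succeeds if FILE contains a savecore call that is not
# commented out, optionally prefixed by "intr" as in the SunOS 4 rc.local.
savecore_enabled() {
    grep -E '^[[:space:]]*(intr[[:space:]]+)?savecore[[:space:]]' "$1" >/dev/null 2>&1
}

# Example: report the state of whichever startup file exists on this host.
for f in /etc/rc.local /etc/init.d/sysetup; do
    [ -f "$f" ] || continue
    if savecore_enabled "$f"; then
        echo "$f: savecore is enabled"
    else
        echo "$f: savecore is still commented out"
    fi
done
```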
Optional tips
- All crash dumps may contain highly confidential information, since they hold all application memory at the time of the crash. I highly recommend adding the following lines to the above files to prevent unauthorised access to the dumps:

      chown -R root.staff /var/crash
      chmod -R 600 /var/crash

- If the file minfree exists in the crash directory, the number in this file specifies how many kilobytes of space must remain free on this filesystem once savecore has completed.
- Dumping to a special device (i.e. not the swap device) is possible. On Solaris 1, add the following line to the kernel configuration file (assuming you want to use device sd1b) and rebuild the kernel (see man config):

      config vmunix swap on sd1b

  On Solaris 2 it's a bit trickier: adb must be used.
- If several system crashes are expected, compress previous dumps. They often compress to only 5% of the original size. The same goes for dumps which are archived.
- Crash dumps MUST be analysed on the same OS version and architecture as the system on which they were created (with savecore).
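The permission and compression tips above can be combined into a housekeeping sketch. Everything here is an assumption for illustration: the function names secure_crash_dir and compress_old_dumps are hypothetical, a POSIX shell is assumed, and the compressor is taken from a COMPRESS variable that defaults to compress. The sketch gives directories mode 700 rather than 600 so that they remain searchable by their owner, and it leaves ownership to the chown command shown above, which needs root.

```shell
# secure_crash_dir DIR
# Hypothetical helper: make DIR and everything below it accessible to the
# owner only. Directories get mode 700, regular files get mode 600.
secure_crash_dir() {
    find "$1" -type d -exec chmod 700 {} \; &&
    find "$1" -type f -exec chmod 600 {} \;
}

# compress_old_dumps DIR KEEP
# Hypothetical helper: compress every uncompressed dump in DIR, together with
# its kernel namelist, except dump number KEEP (e.g. the one being analysed).
compress_old_dumps() {
    dir=$1
    keep=$2
    for core in "$dir"/vmcore.*; do
        if [ ! -f "$core" ]; then continue; fi
        # Skip files that already carry a compressed suffix.
        case "$core" in *.Z|*.gz) continue ;; esac
        n=${core##*.}
        if [ "$n" = "$keep" ]; then continue; fi
        for f in "$core" "$dir/unix.$n" "$dir/vmunix.$n"; do
            if [ -f "$f" ]; then
                ${COMPRESS:-compress} "$f"
            fi
        done
    done
}

# Example (as root, keeping dump 3 uncompressed):
#   secure_crash_dir /var/crash
#   compress_old_dumps /var/crash/`uname -n` 3
```

Remember that a compressed dump has to be uncompressed again before crash or adb can read it.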
Initial Crash Dump analysis
The following commands can be used to analyse what was going on in the system before the panic occurred.
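Using crash: start it with /etc/crash -d vmcore.0 -n vmunix.0 on Solaris 1 (SunOS 4), or /usr/sbin/crash -d vmcore.0 -n unix.0 on Solaris 2.x. Rows in the table below that begin with > are commands typed at the crash prompt.
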
Description | Solaris 1 (SunOS 4) | Solaris 2.x |
---|---|---|
What OS is this? | strings vmcore.0 \| grep SunOS | strings vmcore.0 \| grep SunOS |
What host is this? | strings vmcore.0 \| grep machine | strings vmcore.0 \| grep machine |
What processes were running? | ps -laxk vmunix.0 vmcore.0 | use crash (see below) |
Show system tables | pstat -T vmunix.0 vmcore.0 | |
Show network stats | netstat vmunix.0 vmcore.0 | netstat -d unix.0 vmcore.0 |
Show NFS stats | nfsstat -n vmunix.0 vmcore.0 | nfsstat -n unix.0 vmcore.0 |
Show arp table | arp -c vmunix.0 vmcore.0 | arp -a unix.0 vmcore.0 |
Show IPC stuff | ipcs -a -N vmunix.0 -C vmcore.0 | ipcs -a -N unix.0 -C vmcore.0 |
crash help | > help | > help |
Help on "p" command | > help p | > help p |
Show processes | > p -e | > p -e |
Lots of process details | > p -l | |
crash details | > status | |
Quit crash | > q | > q |
Using the ADB debugger: | adb -k vmunix.0 vmcore.0 | adb -k unix.0 vmcore.0 |
What was the panic message? | *panicstr/s | *panicstr/s |
Hostname | hostname/s | $<utsname |
OS Version | version/s | $<utsname |
Domain | domainname/s | srpc_domain/s |
Machine | sysname/s | $<utsname |
Manufacturer | hw_provider/s | |
Crash Time/date | time/Y | TIME/y |
Boot time/date | *boottime=Y | *time-(lbolt%0t100)=Y |
Display system messages | msgbuf+10/s | msgbuf+14/s |
Recent message buffer (ring) | $<msgbuf | $<msgbuf |
C stack traceback (not always right!) | $c | $c |
Stack traceback | <sp$<stacktrace | ?? |
What is root device? | rootfs$<bootobj | |
What is swap device? | swapfile$<bootobj and dumpfile$<bootobj | |
Show registers | $cregs | |
Show IPC stuff | ipcaccess/10i | |
Quit adb | CTRL-D or $q | CTRL-D or $q |
Access a live kernel: | adb -k /vmunix /dev/mem | adb -k /dev/ksyms /dev/mem |

adb macros are located in /usr/lib/adb (Solaris 1) or /usr/kvm/lib/adb (Solaris 2).
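Because adb reads its commands from standard input, the first few adb rows of the table can be run as a batch instead of being typed by hand. The sketch below is an illustration only: the function names namelist_for and first_look_commands are hypothetical, a POSIX shell is assumed, and only commands that the table lists identically for both releases are included.

```shell
# namelist_for RELEASE N
# Print the kernel namelist file that belongs to vmcore.N:
# vmunix.N on SunOS 4.x (Solaris 1), unix.N on SunOS 5.x (Solaris 2.x).
# RELEASE is the output of "uname -r" on the machine that crashed.
namelist_for() {
    case "$1" in
        4.*) echo "vmunix.$2" ;;
        5.*) echo "unix.$2" ;;
        *)   echo "unsupported release: $1" >&2; return 1 ;;
    esac
}

# first_look_commands
# Print the adb commands for a first look at a dump, one per line:
# panic message, message buffer, C stack traceback, then quit.
first_look_commands() {
    cat <<'EOF'
*panicstr/s
$<msgbuf
$c
$q
EOF
}

# Example (run in the crash directory, on the same OS version and
# architecture that wrote dump 0):
#   first_look_commands | adb -k "$(namelist_for "$(uname -r)" 0)" vmcore.0
```

Redirecting the output of that pipeline to a file gives a short record of the panic that can be kept even after the dump itself has been compressed or removed.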
Further reading
- Crash Dump Configuration
- Solaris Crash Dump Analysis (CDA)
- See also the man pages for: savecore(1m), crash(1m), adb(1m), and kadb(1m).