Troubleshooting a hung process
It is straightforward to check whether an unresponsive process is still doing work, is hung whilst making a system call, or simply will not start for some reason.
A system call is the mechanism by which a user process asks the kernel to perform common operations such as opening and closing files, reading and writing files, opening network sockets, and reading from and writing to network sockets.
Under the Solaris operating environment we can use the truss command to trace system calls and signals. Likewise, on Linux based systems we can use the equivalent strace command to do the same job.
Tracing a running process
To trace an already running process, we first need to identify its process ID (PID). For example:
$ ps -ef | grep <process name>
Now that we have this information, we can trace the process using the following:
- On Solaris, use the syntax:
truss <options> -p <pid> -o <logfile>
For example:
$ truss -faied -p 6453 -o /tmp/truss.log
- For Linux based systems, use:
strace <options> -p <pid> -o <logfile>
For example:
$ strace -f -p 14322 -o /tmp/strace.log
Use Ctrl+C to stop your truss/strace, and finally review the logfile.
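When reviewing the logfile, a quick tally of which system calls appear most often can show where the process is spending its time. The following is a minimal sketch rather than a definitive tool: count_syscalls is a hypothetical helper name, and it assumes the log layout written by strace -f -o (an optional PID column followed by name(arguments) = result) and an awk that supports match() and sub(), such as nawk, gawk, or mawk. truss logs are laid out differently, so the pattern would need adjusting for them.

```shell
# count_syscalls LOGFILE (hypothetical helper, not part of strace)
# Prints "<count> <syscall>" pairs, most frequent first.
count_syscalls() {
    awk '
        # Copy the line and drop the optional leading PID column added by -f.
        { line = $0; sub(/^[0-9]+[ \t]+/, "", line) }
        # Count only lines that begin with "name(", which skips signal
        # lines ("--- SIG... ---") and exit lines ("+++ exited ... +++").
        match(line, /^[A-Za-z_][A-Za-z0-9_]*\(/) {
            calls[substr(line, 1, RLENGTH - 1)]++
        }
        END { for (name in calls) print calls[name], name }
    ' "$1" | sort -rn
}
```

It would be run as count_syscalls /tmp/strace.log once the trace has been stopped.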
Tracing an application at startup
It's also possible to use truss or strace while starting an application in order to trace the system calls and identify a problem such as why an application exits, why it crashes immediately, or even why it does not start at all.
- On Solaris, use the syntax:
truss <options> -o <logfile> <command>
For example:
$ truss -faied -o /tmp/truss.out /bin/date
Once truss has exited, review the logfile.
- For Linux based systems, use:
strace <options> -o <logfile> <command>
For example:
$ strace -f -o /tmp/strace.log /usr/bin/whoami
Once strace has exited, review the logfile.
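For an application that fails at startup, the calls that returned an error are often a useful place to begin, for example a configuration file that could not be opened. The following is a rough sketch only: failed_calls is a hypothetical helper name, and it assumes strace's usual "= -1 ENOENT (...)" error layout and truss's "Err#2 ENOENT" layout. Not every failed call indicates a problem, since programs routinely probe for optional files.

```shell
# failed_calls LOGFILE (hypothetical helper)
# Prints only the traced calls that returned an error.
# Assumes a grep that accepts -E; on older Solaris releases this may
# mean using /usr/xpg4/bin/grep or egrep instead.
failed_calls() {
    # strace: "... = -1 ENOENT (No such file or directory)"
    # truss:  "...    Err#2 ENOENT"
    grep -E '(= -1 E[A-Z0-9]+|Err#[0-9]+)' "$1"
}
```

For instance, failed_calls /tmp/strace.log would list any lines that ended in an error code such as ENOENT or EACCES.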
Final thoughts
Whilst the above examples are not enough to immediately identify the root cause of an issue, they are a great way to gather some initial information that can help guide further troubleshooting.
Further reading
- An older post on using the truss command on Solaris
- Solaris truss(1) man page.
- Linux strace(1) man page.