Basic troubleshooting of Solaris disk I/O performance issues
One of the biggest headaches for any Solaris sysadmin is performance-related issues. You find yourself scrambling around, generally under pressure, trying to work out in real time how a server is performing.
In this post we will look at disk I/O performance and, with the numerous tools available at our fingertips in Solaris, attempt to identify the issues.
It's Friday afternoon and we get the call from the Oracle DBA raising concerns about the slowness of a filesystem: an export is taking too long, and they suggest the issue is with the filesystem and not the Oracle DB or the application (yeah, we have all had those calls)...
As seasoned sysadmins we start by looking for the disk associated with the filesystem in question, which then allows us to monitor the disk I/O. Let's begin...
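First, we need to map the filesystem to its underlying device. A minimal sketch, assuming the DBA's export lives on a hypothetical mount point /u01 (substitute your own):

# df -k /u01
# mount -p | grep u01

The df output shows the backing device (for example /dev/dsk/c1t0d0s6, or a pool/dataset name if it is ZFS), and mount -p confirms the device and mount options. For ZFS, zpool status will list the LUNs behind the pool.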
In a nutshell, there can be many reasons for poor disk performance, for example:
- Disk layout/usage
- Application utilisation
- Incorrect mount options
- Availability of resources
We will start by using iostat to check the disks for errors:
# iostat -en
  ---- errors ---
  s/w h/w trn tot device
    0   0   0   0 c0t3d0
    0   0   0   0 c1t0d0
    0   0   0   0 c1t1d0
    0   0   0   0 c1t2d0
Using -e to display the device error summary and -n to display names in a more descriptive format, we can make a start.
The fields to look at are h/w (hardware errors) and s/w (software errors). If you find any h/w errors on the suspected disk, keep monitoring to see whether the count is increasing; if it is, there is a good chance the disk needs replacing. Next, let's get a quick overview of disk I/O performance and any disk bottlenecks.
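To dig into a single suspect disk, iostat -E gives the full error breakdown per device (soft, hard and transport errors plus vendor and serial details), and re-running the error summary on an interval shows whether the counters are climbing. A quick sketch, using c1t0d0 from the output above as the example device:

# iostat -En c1t0d0
# iostat -en 5
(the second form repeats the error summary every 5 seconds; Ctrl-C to stop)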
If we use additional flags for iostat we can look at read/write and wait times for devices. For example:
# iostat -xnCz
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.3   10.5   0   0 c0
    0.0    0.0    0.0    0.0  0.0  0.0    0.3   10.5   0   0 c0t3d0
    0.1    0.1    1.3    0.6  0.0  0.0    0.0   36.0   0   0 c1
    0.0    0.1    0.6    0.3  0.0  0.0    0.0   33.9   0   0 c1t0d0
| r/s | reads per second |
| w/s | writes per second |
| kr/s | kilobytes read per second |
| kw/s | kilobytes written per second |
| wait | average number of transactions waiting for service (queue length) |
| actv | average number of transactions actively being serviced |
| wsvc_t | average service time in the wait queue, in milliseconds |
| asvc_t | average service time of active transactions, in milliseconds |
| %w | percent of time there are transactions waiting for service (queue non-empty) |
| %b | percent of time the disk is busy (transactions in progress) |
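One thing worth remembering: without an interval argument, iostat prints a single sample averaged since boot, which can hide a problem happening right now. Sampling live with a timestamp is usually more telling; a sketch:

# iostat -xnz -T d 5
(-z hides idle devices, -T d stamps each 5-second sample with the date/time; Ctrl-C to stop)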
The values to focus on in the output above are r/s, w/s, %b and asvc_t. If r/s and w/s are high, %b is above roughly 5-7%, and asvc_t is running at more than 30-50 milliseconds, then we need to concentrate on the following concerns (see the quick filter sketched after this list):
- If it is an NFS-mounted filesystem, we engage the NAS team to check the disk I/O from their end.
- If it is a SAN-attached disk, we engage the SAN team for further investigation. If the filesystem is laid out on a single LUN, we can recommend spreading it across multiple LUNs for better performance (for example, 10 x 100GB LUNs will generally perform better than a single 1000GB LUN).
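As a rough way to spot the offenders while sampling, we can filter the extended output for disks breaching those rules of thumb. A sketch, assuming the -xnz column layout shown earlier (asvc_t is field 8, %b is field 10) and the 30ms/5% thresholds above:

# iostat -xnz 5 | awk '$1 ~ /^[0-9.]+$/ && $8 > 30 && $10 > 5'

The first pattern simply skips the header lines that are repeated each interval.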
Another utility available is the fsstat command, which reports file system performance. Using the -F flag shows statistics for all file system types:
# fsstat -F
  new  name  name  attr  attr lookup rddir  read  read write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
1.59K   180   352 1.02M   833  6.24M  101K  751K  203M  344K 52.6M ufs
    0     0     0   100     0    118     0     7 17.5K     0     0 nfs
    0     0     0    20     0      0     0     0     0     0     0 zfs
    0     0     0    10     0      0     0     0     0     0     0 hsfs
    0     0     0 6.01K     0      0     0     0     0     0     0 lofs
5.63K 3.84K 1.45K 33.1K   101  15.0K    10 52.7K 53.5M 54.3K 47.5M tmpfs
We can also check specific file system stats:
# fsstat -i zfs ufs
 read  read write write rddir rddir rwlock rwulock
  ops bytes   ops bytes   ops bytes    ops     ops
    0     0     0     0     0     0      0       0 zfs
 242K  704M  115K 79.2M  171K 16.5M  3.62M   3.62M ufs
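fsstat also takes an interval and count, which is handy for watching file system activity while the DBA's export is actually running. For example, twelve 5-second samples of UFS activity:

# fsstat ufs 5 12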
We can also use the sar command to check disk performance:
# sar -d 1

SunOS schlumpf 5.10 Generic_142910-04 i86pc    09/07/2010

device    %busy  avque  r+w/s  blks/s  avwait  avserv
fd0           0    0.0      0       0     0.0     0.0
iscsi_se      0    0.0      0       0     0.0     0.0
md0           0    0.0      0       0     0.0     0.0
md1           0    0.0      0       0     0.0     0.0
| %busy | percent of time the device was busy servicing a transfer request |
| avque | average number of requests outstanding during that time |
| r+w/s | number of read and write transfers to the device per second |
| blks/s | number of 512-byte blocks transferred to the device per second |
| avwait | average wait time in milliseconds |
| avserv | average service time in milliseconds |
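One advantage of sar is history: if the sa1 entries in the sys crontab are enabled, samples are kept under /var/adm/sa, so we can look back at disk activity from earlier in the day instead of only sampling live. A sketch (the file name matches the day of the month):

# sar -d -f /var/adm/sa/sa07
# sar -d 5 12
(the second form samples live: twelve 5-second intervals)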
That's it ... just a sample of the tools available under Solaris to determine whether we have disk I/O performance issues (or not!)