Basic troubleshooting of Solaris disk I/O performance issues

One of the biggest headaches for any Solaris sysadmin is a performance-related issue. You find yourself scrambling around, generally under pressure, trying to identify in real time how a server is performing.

In this post we will look at disk I/O performance and, with the numerous tools available at our fingertips in Solaris, attempt to identify the issues.

It's Friday afternoon and we get the call from the Oracle DBA raising concerns about slowness of the filesystem: an export is taking too long and they suggest that the issue is with a filesystem and not the Oracle DB or the application (yeah, we have all had those calls)...

As seasoned sysadmins we start by looking for the disk(s) associated with the filesystem in question, which then allows us to monitor the disk I/O. Let's begin...
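As a rough sketch of that first step (the mount point /u01 and any device names here are purely illustrative), we can map the filesystem back to its underlying device before pointing the monitoring tools at it:

# df -h /u01
# grep u01 /etc/vfstab

If the device turns out to be an SVM metadevice or a ZFS dataset rather than a plain slice, metastat -p or zpool status will show which physical disks sit underneath it.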

In a nutshell, there can be many reasons for poor disk performance, for example:

  • Disk layout/usage
  • Hardware
  • Application utilisation
  • Incorrect mount options (there is a quick check after this list)
  • Availability of resources
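On the mount options point, a quick sanity check (assuming a UFS filesystem and, again, the illustrative mount point /u01) is to compare what is actually mounted against what is defined in /etc/vfstab:

# mount -p | grep u01
# grep u01 /etc/vfstab

mount -p prints the currently mounted filesystems in vfstab format, so any drift between the two, or options such as forcedirectio/noatime that the DBA may expect for Oracle datafile filesystems, is easy to spot.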

Using iostat

We will start by using iostat to check disk I/O performance:

# iostat -en
  ---- errors ---
  s/w h/w trn tot device
    0   0   0   0 c0t3d0
    0   0   0   0 c1t0d0
    0   0   0   0 c1t1d0
    0   0   0   0 c1t2d0

Using -e to display a device error summary and -n to display names in a more descriptive format, we can make a start.

Column   Description
s/w      Software errors
h/w      Hardware errors
trn      Transport errors
tot      Total errors
device   Logical disk

The fields that need to be looked at are h/w and s/w. If you find any h/w errors on the suspected disk, keep monitoring to see whether the count is increasing; if it is, the disk may be a candidate for replacement. This gives us a quick overview of disk errors before we move on to checking for an actual I/O bottleneck.
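One way to keep an eye on whether those counters are climbing is to rerun the error summary at an interval, and the -E flag prints a fuller per-device error breakdown (the 60-second interval and the device name c1t2d0 below are just examples for our suspect disk):

# iostat -en 60 10
# iostat -En c1t2d0

The first form repeats the error summary every 60 seconds for 10 samples; the second reports detailed error statistics, including media errors and vendor/serial details, for a single device.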

If we use additional flags for iostat we can look at read/write rates and wait times for devices. For example:

# iostat -xnCz
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.3   10.5   0   0 c0
    0.0    0.0    0.0    0.0  0.0  0.0    0.3   10.5   0   0 c0t3d0
    0.1    0.1    1.3    0.6  0.0  0.0    0.0   36.0   0   0 c1
    0.0    0.1    0.6    0.3  0.0  0.0    0.0   33.9   0   0 c1t0d0
Column   Description
r/s      reads per second
w/s      writes per second
kr/s     kilobytes read per second
kw/s     kilobytes written per second
wait     average number of transactions waiting for service (queue length)
actv     average number of transactions actively being serviced
wsvc_t   average service time in the wait queue, in milliseconds
asvc_t   average service time of active transactions, in milliseconds
%w       percent of time there are transactions waiting for service (queue non-empty)
%b       percent of time the disk is busy (transactions in progress)

The values that need to be considered in the above output are r/s, w/s, %b and asvc_t. If r/s and w/s are high, %b is above roughly 5-7%, and asvc_t is more than 30-50 milliseconds, then we have to concentrate on the concerns below (there is a short monitoring example after this list):

  • If it is an NFS-mounted filesystem, we have to engage the NAS team to check the disk I/O from their end.
  • If it is a SAN-presented disk, we have to engage the SAN team for further investigation. If the filesystem is laid out on a single disk, we can recommend spreading it across multiple LUNs for better performance (for example, 10 x 100GB LUNs will generally perform better than a single 1000GB LUN).
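As mentioned above, rather than relying on a single snapshot (the first report iostat prints is an average since boot), it is worth watching the suspect disk over an interval while the DBA reruns the slow export. Something along these lines, with the interval, count and device name purely illustrative:

# iostat -xnz 5 12
# iostat -xn 5 | grep c1t2d0

The -z flag suppresses lines that are all zero, so only devices actually doing I/O are shown, and the repeated 5-second samples make it obvious whether asvc_t and %b stay high or merely spiked once.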

Using fsstat

Another utility available is the fsstat command, which reports file system activity. Using the -F flag shows statistics for all installed file system types:

# fsstat -F
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
1.59K   180   352 1.02M   833  6.24M  101K  751K  203M  344K 52.6M ufs
    0     0     0   100     0    118     0     7 17.5K     0     0 nfs
    0     0     0    20     0      0     0     0     0     0     0 zfs
    0     0     0    10     0      0     0     0     0     0     0 hsfs
    0     0     0 6.01K     0      0     0     0     0     0     0 lofs
5.63K 3.84K 1.45K 33.1K   101  15.0K    10 52.7K 53.5M 54.3K 47.5M tmpfs

We can also check specific file system stats:

# fsstat -i zfs ufs
 read read  write write rddir rddir rwlock rwulock
  ops bytes   ops bytes   ops bytes    ops     ops
    0     0     0     0     0     0      0       0 zfs
 242K  704M  115K 79.2M  171K 16.5M  3.62M   3.62M ufs
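fsstat will also take a mount point and an interval, which is handy for watching just the filesystem the DBA is complaining about (once more, /u01 and the interval values are only placeholders):

# fsstat /u01 5
# fsstat ufs 5 12

The per-interval output makes it easy to see whether the read/write operations and bytes on that filesystem actually line up with the load the application claims to be generating.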

Using sar

We can also use the sar command to check disk performance:

# sar -d 1
SunOS schlumpf 5.10 Generic_142910-04 i86pc    09/07/2010
device        %busy   avque   r+w/s  blks/s  avwait  avserv
fd0               0     0.0       0       0     0.0     0.0
iscsi_se          0     0.0       0       0     0.0     0.0
md0               0     0.0       0       0     0.0     0.0
md1               0     0.0       0       0     0.0     0.0
Column   Description
%busy    portion of time the device was busy servicing a transfer request
avque    average number of requests outstanding during that time
r+w/s    number of read and write transfers to the device per second
blks/s   number of 512-byte blocks transferred per second
avwait   average wait time, in milliseconds
avserv   average service time, in milliseconds
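The single argument used above is just the sampling interval in seconds. Adding a count, or reading the daily binary files under /var/adm/sa (if system activity data collection is enabled in the sys crontab), lets us compare today's figures with how the disks normally behave; the file name below is illustrative, the files being named sa followed by the day of the month:

# sar -d 5 12
# sar -d -f /var/adm/sa/sa07

When the historical files are available, this is the quickest way to tell whether the current avwait and avserv values are genuinely unusual or just business as usual for this box.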

That's it... just a sample of the tools available under Solaris to determine whether we have disk I/O performance issues (or not!)