How to distinguish between a failed and a failing disk

It is important to distinguish between a failed disk and one that is failing. Doing so may save you time when you need to replace it.

In this article, we explore different ways to detect whether a device has failed or is in the process of failing.

Using /var/adm/messages

A failing disk logs read and/or write errors in /var/adm/messages:

Jan  1 03:11:19 mars scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2/sd@1,0 (sd1):
Jan  1 03:11:19 mars  Error for Command: write(10)               Error Level: Retryable
Jan  1 03:11:19 mars scsi: [ID 107833 kern.notice]    Requested Block: 37782714                  Error Block: 37782714
Jan  1 03:11:19 mars scsi: [ID 107833 kern.notice]    Vendor: SEAGATE                            Serial Number: 0217P1KPEK
Jan  1 03:11:19 mars scsi: [ID 107833 kern.notice]    Sense Key: Unit Attention
Jan  1 03:11:19 mars scsi: [ID 107833 kern.notice]    ASC: 0x29 (bus device
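To pull these entries out of a large messages file, a minimal Python sketch like the one below can help. It assumes the layout shown above (an `Error for Command` line followed by a `Requested Block`/`Error Block` line); `parse_scsi_errors` and the pattern names are hypothetical, and other Solaris releases may format these warnings differently.

```python
import re

# Matches the "Error for Command" line of an sd/scsi warning, e.g.
#   "Error for Command: write(10)   Error Level: Retryable"
CMD_RE = re.compile(r"Error for Command:\s*(\S+)\s+Error Level:\s*(\w+)")
# Matches the block numbers reported on a following line.
BLOCK_RE = re.compile(r"Requested Block:\s*(\d+)\s+Error Block:\s*(\d+)")


def parse_scsi_errors(lines):
    """Collect (command, error_level, error_block) tuples from message lines.

    Assumes each "Error for Command" line is followed, not necessarily
    immediately, by its "Requested Block ... Error Block" line.
    """
    errors = []
    pending = None  # (command, level) waiting for its block numbers
    for line in lines:
        m = CMD_RE.search(line)
        if m:
            pending = (m.group(1), m.group(2))
            continue
        m = BLOCK_RE.search(line)
        if m and pending is not None:
            errors.append((pending[0], pending[1], int(m.group(2))))
            pending = None
    return errors
```

Passing an open file object such as `open("/var/adm/messages")` works too, since iterating a file yields its lines. The same error block showing up repeatedly is a hint that the problem lies on the media rather than being a one-off.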

A failed disk, by contrast, simply stops responding:

Jul 19 11:21:59 mars scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scsi@2/sd@1,0 (sd2):
Jul 19 11:21:59 mars   disk not responding to selection
Jul 19 11:22:01 mars scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scsi@2/sd@1,0 (sd2):
Jul 19 11:22:01 mars   disk not responding to selection
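The two patterns can be told apart per device. The Python sketch below, under the same layout assumption, remembers the driver instance (such as `sd2`) from each `WARNING:` line and classifies it by the message that follows. The function name and the "failed"/"failing" labels are illustrative, and the rule is only the rule of thumb described in this article.

```python
import re
from collections import Counter

# Device line, e.g. "WARNING: /pci@1f,700000/scsi@2/sd@1,0 (sd2):"
DEV_RE = re.compile(r"WARNING:\s+(\S+)\s+\((\w+)\):")


def classify_from_messages(lines):
    """Return {instance: "failed" | "failing"} for devices with warnings.

    A device with "disk not responding to selection" is treated as failed;
    one that only logs "Error for Command" entries is treated as failing.
    """
    not_responding = Counter()
    cmd_errors = Counter()
    current = None  # driver instance named by the most recent WARNING line
    for line in lines:
        m = DEV_RE.search(line)
        if m:
            current = m.group(2)
            continue
        if current is None:
            continue
        if "disk not responding to selection" in line:
            not_responding[current] += 1
        elif "Error for Command" in line:
            cmd_errors[current] += 1
    verdict = {}
    for dev in set(not_responding) | set(cmd_errors):
        verdict[dev] = "failed" if not_responding[dev] else "failing"
    return verdict
```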

Using iostat

A failing disk shows hard and transport error counts that keep increasing over time:

# iostat -En c0t3d0
c0t3d0           Soft Errors: 0 Hard Errors: 28473 Transport Errors: 107662
Vendor: SEAGATE  Product: ST336607LSUN36G  Revision: 0236 Serial No: 0217P1KPEK
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 28473 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
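Because these counters are cumulative, it helps to record them in a machine-readable form so that readings can be compared. Here is a minimal Python sketch that parses `iostat -En` output, assuming the layout shown above (one block per device, starting with the device name on the `Soft Errors` line); `parse_iostat_en` is a hypothetical helper name.

```python
import re

# Error counters printed by `iostat -En`, named as they appear in the output.
COUNTERS = (
    "Soft Errors", "Hard Errors", "Transport Errors",
    "Media Error", "Device Not Ready", "No Device", "Recoverable",
    "Illegal Request", "Predictive Failure Analysis",
)
COUNTER_RES = {name: re.compile(re.escape(name) + r":\s*(\d+)") for name in COUNTERS}
# Each device's block starts with "<device>   Soft Errors: ..."
HEADER_RE = re.compile(r"^(\S+)\s+Soft Errors:")


def parse_iostat_en(text):
    """Parse `iostat -En` output into {device: {counter_name: int}}."""
    disks = {}
    current = None
    for line in text.splitlines():
        m = HEADER_RE.match(line)
        if m:
            current = disks.setdefault(m.group(1), {})
        if current is None:
            continue
        # Pick up every known counter present on this line.
        for name, rx in COUNTER_RES.items():
            hit = rx.search(line)
            if hit:
                current[name] = int(hit.group(1))
    return disks
```

On Solaris, the input text could be obtained with `subprocess.run(["iostat", "-En"], capture_output=True, text=True).stdout`.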

A failed disk, on the other hand, shows an increase only in the number of transport errors:

# iostat -En c0t3d0
c0t3d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 18
Vendor: SEAGATE  Product: ST373207LSUN72G  Revision: 045A Serial No: 053432A5HL
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
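Since the distinction rests on which counters grow, the check needs two readings taken some time apart. The Python sketch below applies this article's rule of thumb to two such snapshots of a single disk; each snapshot is assumed to be a dict keyed by the counter names printed by `iostat -En`, and `classify_trend` is a hypothetical name.

```python
def classify_trend(before, after):
    """Classify one disk from two snapshots of its iostat -En counters.

    `before` and `after` are dicts such as {"Hard Errors": 0,
    "Transport Errors": 18}, with `after` taken later than `before`.
    Returns "failing", "failed", or "no new errors".
    """
    d_hard = after.get("Hard Errors", 0) - before.get("Hard Errors", 0)
    d_transport = after.get("Transport Errors", 0) - before.get("Transport Errors", 0)
    if d_hard > 0:
        # Hard errors still climbing: the disk answers, but with errors.
        return "failing"
    if d_transport > 0:
        # Only transport errors climbing: the disk is not answering at all.
        return "failed"
    return "no new errors"
```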

Using format

A failing disk is still listed normally by the format command:

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@780/pci@0/pci@9/scsi@0/sd@0,0
       1. c0t3d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@780/pci@0/pci@9/scsi@0/sd@1,0
       2. c1t0d0 <SEAGATE-ST973402SSUN72G-0400-68.37GB>
          /pci@780/pci@0/pci@a/scsi@1/sd@0,0
       3. c1t1d0 <SEAGATE-ST973402SSUN72G-0400-68.37GB>
          /pci@780/pci@0/pci@a/scsi@1/sd@1,0
Specify disk (enter its number):
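The same check can be scripted from the format disk list. Because format is interactive, the list is commonly captured non-interactively, for example with `format < /dev/null`, which prints the selections and exits at the prompt (an assumption worth verifying on your release). The Python sketch below, with hypothetical names, parses each numbered entry together with the device path printed beneath it.

```python
import re

# Numbered entry in format's "AVAILABLE DISK SELECTIONS" list, e.g.
#   "0. c0t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>"
ENTRY_RE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+<(.*)>\s*$")


def parse_format_list(text):
    """Return a list of (index, disk, label, physical_path) tuples."""
    disks = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        m = ENTRY_RE.match(line)
        if not m:
            continue
        # The physical device path is printed on the line after the entry.
        path = lines[i + 1].strip() if i + 1 < len(lines) else ""
        disks.append((int(m.group(1)), m.group(2), m.group(3), path))
    return disks
```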

A failed disk is marked <drive not available>:

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@1/sd@0,0
       1. c0t3d0 <drive not available>
          /pci@1f,700000/scsi@1/sd@1,0
       2. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@0,0
       3. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@1,0
Specify disk (enter its number):
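A failed disk can then be picked out by its label alone. This minimal Python sketch, with a hypothetical function name, returns the disks whose entry reads `<drive not available>` in the layout shown above.

```python
import re

# A format entry whose label is "<drive not available>".
UNAVAILABLE_RE = re.compile(r"^\s*\d+\.\s+(\S+)\s+<drive not available>")


def unavailable_disks(text):
    """Return the disk names that format lists as <drive not available>."""
    return [m.group(1) for m in map(UNAVAILABLE_RE.match, text.splitlines()) if m]
```

An empty result does not prove the disks are healthy; a failing disk still appears with its normal label, so the messages and iostat checks remain necessary.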