How to distinguish between a failed and a failing disk
It is important to distinguish between a failed disk and one that is failing. Doing so may save you time when you need to replace it.
In this article we explore different ways to detect whether a device has failed or is in the process of failing.
/var/adm/messages
A failing disk will show read and/or write errors in /var/adm/messages:
Jan 1 03:11:19 mars scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2/sd@1,0 (sd1):
Jan 1 03:11:19 mars Error for Command: write(10) Error Level: Retryable
Jan 1 03:11:19 mars scsi: [ID 107833 kern.notice] Requested Block: 37782714 Error Block: 37782714
Jan 1 03:11:19 mars scsi: [ID 107833 kern.notice] Vendor: SEAGATE Serial Number: 0217P1KPEK
Jan 1 03:11:19 mars scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Jan 1 03:11:19 mars scsi: [ID 107833 kern.notice] ASC: 0x29 (bus device
A failed disk, by contrast, simply won't respond:
Jul 19 11:21:59 mars scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scsi@2/sd@1,0 (sd2):
Jul 19 11:21:59 mars disk not responding to selection
Jul 19 11:22:01 mars scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scsi@2/sd@1,0 (sd2):
Jul 19 11:22:01 mars disk not responding to selection
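The two log signatures above can be checked mechanically. What follows is a minimal sketch, assuming the contents of /var/adm/messages have already been read into a string. The function name classify_disk_messages is hypothetical, and the two marker strings are copied from the sample entries above; they are not an exhaustive list of what the sd driver may log.

```python
import re

# One pattern for the three things we care about, in the order they appear:
# a device instance such as "(sd1):", the "failed" marker, the "failing" marker.
# The marker strings are taken from the sample log entries in this article
# and are assumptions of this sketch, not a complete catalogue.
EVENT_RE = re.compile(
    r"\((?P<dev>sd\d+)\):"
    r"|(?P<failed>disk not responding to selection)"
    r"|(?P<failing>Error for Command:)"
)


def classify_disk_messages(log_text):
    """Return a dict mapping device instance to "failed" or "failing".

    A marker is attributed to the most recent device named in a WARNING
    entry. Once a device is seen as "failed" it stays that way, which is
    a simplification chosen for this sketch.
    """
    status = {}
    current = None
    for match in EVENT_RE.finditer(log_text):
        if match.group("dev"):
            current = match.group("dev")
        elif current is not None:
            if match.group("failed"):
                status[current] = "failed"
            elif status.get(current) != "failed":
                status[current] = "failing"
    return status
```

Because the pattern is applied to the whole text rather than line by line, it works whether the warning and its detail line arrive on separate lines or have been joined together.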
Using iostat
A failing disk will show an increase in the number of hard and transport errors over time.
# iostat -En c0t3d0
c0t3d0 Soft Errors: 0 Hard Errors: 28473 Transport Errors: 107662
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0236 Serial No: 0217P1KPEK
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 28473 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
A failed disk will show an increase only in the number of transport errors.
# iostat -En c0t3d0
c0t3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 18
Vendor: SEAGATE Product: ST373207LSUN72G Revision: 045A Serial No: 053432A5HL
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
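Since the distinction rests on how the counters change over time, it helps to compare two snapshots of the iostat -En output for the same device. The sketch below is one possible way to do that, assuming each snapshot was captured as text for a single device. The function names parse_error_counters and assess_disk are hypothetical, and the decision rule simply encodes the rule of thumb stated above rather than any official diagnostic.

```python
import re

# Matches the three summary counters on the first line of `iostat -En` output,
# e.g. "Soft Errors: 0 Hard Errors: 28473 Transport Errors: 107662".
COUNTER_RE = re.compile(r"(Soft|Hard|Transport) Errors: (\d+)")


def parse_error_counters(iostat_text):
    """Return {"soft": n, "hard": n, "transport": n} for one device.

    Assumes the text describes a single device; with several devices
    the later values would overwrite the earlier ones.
    """
    return {name.lower(): int(value)
            for name, value in COUNTER_RE.findall(iostat_text)}


def assess_disk(previous, current):
    """Compare two counter snapshots using the article's rule of thumb."""
    hard_delta = current["hard"] - previous["hard"]
    transport_delta = current["transport"] - previous["transport"]
    if hard_delta > 0:
        return "failing"      # hard errors are still accumulating
    if transport_delta > 0:
        return "failed"       # only transport errors are growing
    return "unchanged"        # no growth between the two snapshots


# Example: two snapshots taken some time apart (values are illustrative).
earlier = parse_error_counters("c0t3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 3")
later = parse_error_counters("c0t3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 18")
print(assess_disk(earlier, later))
```

A single snapshot only shows totals since boot, so the comparison between two readings is what carries the signal here.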
Using format
A failing disk is still visible in the format command.
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@780/pci@0/pci@9/scsi@0/sd@0,0
       1. c0t3d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@780/pci@0/pci@9/scsi@0/sd@1,0
       2. c1t0d0 <SEAGATE-ST973402SSUN72G-0400-68.37GB>
          /pci@780/pci@0/pci@a/scsi@1/sd@0,0
       3. c1t1d0 <SEAGATE-ST973402SSUN72G-0400-68.37GB>
          /pci@780/pci@0/pci@a/scsi@1/sd@1,0
Specify disk (enter its number):
A failed disk is marked <drive not available>.
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@1/sd@0,0
       1. c0t3d0 <drive not available>
          /pci@1f,700000/scsi@1/sd@1,0
       2. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@0,0
       3. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@1,0
Specify disk (enter its number):
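On a system with many disks, picking out the unavailable entries by eye is tedious. The following is a small sketch, assuming the disk selection list printed by format has been saved as text. The function name unavailable_disks is hypothetical, and the pattern assumes device names of the cXtYdZ or cXdZ form seen in the listings above.

```python
import re

# Matches one menu entry such as "1. c0t3d0 <drive not available>".
# The device-name shape (cXtYdZ, with the target part optional) is an
# assumption based on the listings shown in this article.
ENTRY_RE = re.compile(r"\d+\.\s+(c\d+(?:t\d+)?d\d+)\s+<([^>]*)>")


def unavailable_disks(format_text):
    """Return device names whose label is exactly "drive not available"."""
    return [device
            for device, label in ENTRY_RE.findall(format_text)
            if label == "drive not available"]
```

The function returns an empty list when every disk in the listing still reports a label, which is the case for a disk that is failing but has not yet failed.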