Error Codes Wiki

Linux RAID Errors — mdadm Degraded Array and Disk Failure Recovery

Critical · Filesystem

Overview

Fix Linux software RAID (mdadm) errors including degraded arrays, failed disk replacement, RAID rebuild procedures, and monitoring RAID health.

Key Details

  • Linux software RAID uses mdadm to manage RAID arrays (RAID 0, 1, 5, 6, 10)
  • A degraded RAID array has lost a disk but is still operational (RAID 1, 5, 6, 10)
  • RAID 0 has no redundancy — any disk failure means total data loss
  • RAID rebuild (resync) can take hours to days depending on array size and I/O load
  • SMART monitoring can predict disk failures before they cause RAID degradation
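A degraded array can be spotted directly in /proc/mdstat: a member flagged with (F) has been marked failed, and an underscore in the slot map (e.g. [_U]) shows which slot is down. The snippet below runs that check against a hypothetical mdstat excerpt for a two-disk RAID 1; on a real system you would read the file itself.

```shell
# Hypothetical /proc/mdstat excerpt for a degraded two-disk RAID 1:
# '(F)' flags the failed member; [2/1] means 2 slots, 1 active; [_U] shows which slot is down.
mdstat='md0 : active raid1 sdb1[1] sda1[0](F)
      976630464 blocks super 1.2 [2/1] [_U]'

# Quick health check: any '(F)' flag or an underscore inside the slot map means degraded.
state=healthy
echo "$mdstat" | grep -Eq '\(F\)|\[[^]]*_[^]]*\]' && state=degraded
echo "array state: $state"
```

On a live system, replace the sample string with `mdstat=$(cat /proc/mdstat)`.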

Common Causes

  • Hard drive failure causing RAID array to enter degraded state
  • Disk removed or disconnected during operation
  • RAID rebuild interrupted by power failure or system crash
  • Disk experiencing bad sectors causing md to mark it as failed
  • SATA cable or controller fault causing intermittent disk disconnections
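Bad sectors usually show up in SMART data before md ejects the disk. As a sketch, the filter below scans hypothetical attribute rows in the format `smartctl -A /dev/sdX` prints (smartmontools package assumed); non-zero reallocated or pending sector counts are the classic early warnings.

```shell
# Hypothetical 'smartctl -A' attribute rows (column layout as smartmontools prints it):
smart='  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always       -       120
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8'

# Non-zero reallocated or pending sector counts often precede md marking the disk failed.
warnings=$(echo "$smart" | awk '$2 ~ /Reallocated_Sector_Ct|Current_Pending_Sector/ && $NF+0 > 0 {print $2 "=" $NF}')
echo "$warnings"
```

A cron job or smartd can run the same check periodically and alert before the array degrades.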

Steps

  1. Check RAID status: cat /proc/mdstat or mdadm --detail /dev/md0
  2. Identify the failed disk: mdadm --detail /dev/md0 shows the status of each member
  3. Mark and remove the failed disk: mdadm /dev/md0 --fail /dev/sdX --remove /dev/sdX (md will not remove an active member until it is marked failed)
  4. Replace the physical disk, then partition it identically to the other members
  5. Add the new disk: mdadm /dev/md0 --add /dev/sdY; the rebuild starts automatically
  6. Monitor rebuild progress: watch cat /proc/mdstat
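The steps above can be sketched as a script. This is a dry run with hypothetical device names (/dev/sdb1 failed, /dev/sdc1 new): run() only records each command, so nothing touches your disks. On a real system you would substitute your devices, redefine run() as `run() { "$@"; }`, and execute as root.

```shell
# Dry-run sketch of steps 3-5; device names are hypothetical.
log=""
run() { log="${log}+ $*"$'\n'; }   # record commands instead of executing them

MD=/dev/md0
FAILED=/dev/sdb1     # hypothetical failed member
NEW=/dev/sdc1        # hypothetical replacement partition

run mdadm "$MD" --fail "$FAILED"      # mark failed first, if md has not already
run mdadm "$MD" --remove "$FAILED"    # detach it from the array
# After physically swapping the disk, clone the partition table from a healthy
# member (sfdisk shown for MBR; sgdisk is the GPT equivalent):
run sh -c 'sfdisk -d /dev/sda | sfdisk /dev/sdc'
run mdadm "$MD" --add "$NEW"          # rebuild starts automatically

printf '%s' "$log"
```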

Tags

raid, mdadm, degraded, disk-failure, rebuild


Frequently Asked Questions

How long does a RAID rebuild take?

It depends on array size and activity; a 4TB drive can take 8-24 hours. During the rebuild the array is vulnerable: a second disk failure in RAID 5 means data loss, while RAID 6 tolerates two failures.
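Progress and an ETA can be read straight out of /proc/mdstat during a rebuild; the kernel also exposes throttle knobs as the sysctls dev.raid.speed_limit_min and dev.raid.speed_limit_max. The snippet below extracts the percentage and finish estimate from a hypothetical recovery line in the format mdstat prints.

```shell
# Hypothetical recovery line as /proc/mdstat prints it during a rebuild:
line='      [=>...................]  recovery =  7.8% (305893760/3906885632) finish=612.4min speed=98012K/sec'

# Pull out the completion percentage and the kernel's finish estimate.
pct=$(echo "$line" | grep -o '[0-9.]*%')
eta=$(echo "$line" | grep -o 'finish=[0-9.]*min')
echo "rebuild at $pct, $eta"
```

On a live system, feed it the real file instead: `line=$(grep -E 'recovery|resync' /proc/mdstat)`.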