One of the RAID arrays on my server decided that one of its drives was bad and dropped it from the array. I have two Linux software RAID 1 (mirrored) arrays: /dev/md0, which holds my main drives, and a smaller array, /dev/md1.
This is what mdadm showed after one of the drives was dropped:
kevin@linuxsvr:~$ sudo mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sat May 16 18:38:51 2009
     Raid Level : raid1
     Array Size : 1485888 (1451.31 MiB 1521.55 MB)
  Used Dev Size : 1485888 (1451.31 MiB 1521.55 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Tue Mar 5 14:10:24 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 44b55b61:84e84f5f:5c7760e0:2ac997c6
         Events : 0.90560

    Number   Major   Minor   RaidDevice State
       0       8       21        0      active sync   /dev/sdb5
       1       0        0        1      removed
I couldn't find any messages in syslog explaining what was wrong with the drive, and the SMART status for both drives was still good. I had powered off the server without a clean shutdown so I could move it, so this was probably self-inflicted.
On one array, adding the missing drive back made it join as a spare; it resynced and everything returned to normal. On the other array, /dev/md1, the drive wouldn't go back in:
kevin@linuxsvr:~$ sudo mdadm --add /dev/md1 /dev/sdc5
mdadm: /dev/sdc5 reports being an active member for /dev/md1, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdc5 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdc5" first.
I found a few posts suggesting that you fail the drive, remove it, and then add it back, but this still gave the same error:
sudo mdadm --manage /dev/md1 --fail /dev/sdc5
sudo mdadm --manage /dev/md1 --remove /dev/sdc5
sudo mdadm --manage /dev/md1 --add /dev/sdc5
I didn't know exactly what the command recommended in the error message would do, but running mdadm with --zero-superblock on the partition and then adding the drive back did the job. It resynced successfully and everything is back to normal.
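For reference, here is the sequence that worked, written up as a small POSIX shell sketch. It assumes the device names from this post (/dev/md1 as the array, /dev/sdc5 as the dropped member). As far as I understand it, --zero-superblock erases the md metadata on that partition, so mdadm treats it as a brand-new device; the --add then brings it in as a spare and a full resync copies the data over from the good mirror (/dev/sdb5). Because zeroing the superblock is destructive, the sketch only prints the commands by default. The DRY_RUN switch and the run and readd_member functions are my own hypothetical helpers for illustration, not part of mdadm.

```shell
#!/bin/sh
# Sketch: re-add a dropped RAID 1 member by wiping its stale md superblock.
# ARRAY and MEMBER default to the devices from this post (assumption: adjust them).
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN= (empty) to run them.
set -e

ARRAY=${ARRAY:-/dev/md1}
MEMBER=${MEMBER:-/dev/sdc5}
DRY_RUN=${DRY_RUN-1}

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the steps that fixed /dev/md1.
readd_member() {
    # Read-only: show the old md superblock still on the member.
    run sudo mdadm --examine "$MEMBER"
    # Destructive: erase the md superblock so mdadm sees a fresh device.
    # Double-check MEMBER points at the dropped partition, not the good mirror.
    run sudo mdadm --zero-superblock "$MEMBER"
    # Add it back; it joins as a spare and resyncs from the remaining mirror.
    run sudo mdadm --manage "$ARRAY" --add "$MEMBER"
    # Show resync progress.
    run cat /proc/mdstat
}

readd_member
```

Run it once as-is to review the printed commands, then run it again with DRY_RUN set to an empty value to actually execute them. Printing first is a deliberate choice here, since pointing --zero-superblock at the wrong partition would wipe the metadata of the healthy mirror.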
This post on StackExchange has some good info and suggestions. This one too.