RAID
RAID structure
_$: for md in /dev/md?; do echo $md ; mdadm --detail $md | grep "/dev/sd"; done
/dev/md0
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
/dev/md1
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
/dev/md2
0 8 5 0 active sync /dev/sda5
1 8 21 1 active sync /dev/sdb5
/dev/md3
0 8 6 0 active sync /dev/sda6
1 8 22 1 active sync /dev/sdb6
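A more compact summary of every array, including its UUID, can also be printed with mdadm's scan mode:
_$: mdadm --detail --scan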
RAID status
_$: for md in /dev/md? ; do echo $md ; mdadm --detail $md | grep "State :"; done
/dev/md0
State : clean
/dev/md1
State : clean
/dev/md2
State : clean
/dev/md3
State : clean
We can also use /proc:
_$: cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md3 : active raid1 sda6[2] sdb6[1]
566473536 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sda5[2] sdb5[1]
97589120 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda3[2] sdb3[1]
292837184 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[2] sdb1[1]
3903424 blocks super 1.2 [2/2] [UU]
unused devices: <none>
Replace a failed hard disk
The /dev/sda hard disk has failed; /dev/sdb is OK.
Initial status
_$: for md in /dev/md?; do state=$(mdadm --detail $md | grep "State : " | cut -f2 -d':'); printf "%s:%s\n" "$md" "$state"; done
/dev/md0: clean, degraded
/dev/md1: clean, degraded
/dev/md2: clean, degraded
_$: cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb6[0]
953193280 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sdb1[0]
1950656 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sdb5[0]
19513216 blocks super 1.2 [2/1] [U_]
Mark the partitions of the failed disk as faulty
_$: mdadm --manage /dev/md0 --fail /dev/sda1
_$: mdadm --manage /dev/md1 --fail /dev/sda5
_$: mdadm --manage /dev/md2 --fail /dev/sda6
Remove the partitions from the RAID
_$: mdadm --manage /dev/md0 --remove /dev/sda1
_$: mdadm --manage /dev/md1 --remove /dev/sda5
_$: mdadm --manage /dev/md2 --remove /dev/sda6
We can also achieve the same with just one command:
_$: mdadm --manage /dev/md0 --fail /dev/sda1 --remove /dev/sda1
Now we power off the computer and replace the failed hard disk.
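Before powering off, it helps to note the serial number of the failed disk so that the right physical drive is pulled; a quick way, assuming smartmontools is installed and the disk still responds:
_$: smartctl -i /dev/sda | grep -i "serial number"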
Copy the partition table from the old disk to the new one
_$: sfdisk -d /dev/sdb | sfdisk /dev/sda # Errors? => force it
_$: sfdisk -d /dev/sdb | sfdisk --force /dev/sda
_$: sfdisk -l /dev/sda
Disk /dev/sda: 121601 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/sda1 0+ 243- 243- 1951744 fd Linux raid autodetect
/dev/sda2 243+ 486- 244- 1952768 82 Linux swap / Solaris
/dev/sda3 486+ 121601- 121115- 972855297 5 Extended
/dev/sda4 0 - 0 0 0 Empty
/dev/sda5 486+ 2917- 2432- 19529728 fd Linux raid autodetect
/dev/sda6 2917+ 121601- 118684- 953324544 fd Linux raid autodetect
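The sfdisk recipe above assumes a DOS/MBR partition table. If the disks used GPT instead, a rough equivalent with sgdisk (from the gdisk package) would be to replicate the table from /dev/sdb onto the new /dev/sda and then give it its own GUIDs; this is only a sketch, not part of the original procedure:
_$: sgdisk -R /dev/sda /dev/sdb # copy the partition table of /dev/sdb onto /dev/sda
_$: sgdisk -G /dev/sda # randomize the GUIDs of the new disk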
Add the partitions to the RAID
_$: mdadm --manage /dev/md0 --add /dev/sda1
_$: mdadm --manage /dev/md1 --add /dev/sda5
_$: mdadm --manage /dev/md2 --add /dev/sda6
_$: cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda6[2] sdb6[0]
953193280 blocks super 1.2 [2/1] [U_]
resync=DELAYED
md0 : active raid1 sda1[2] sdb1[0]
1950656 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda5[2] sdb5[0]
19513216 blocks super 1.2 [2/1] [U_]
[==>..................] recovery = 13.0% (2543808/19513216) finish=1.8min speed=149635K/sec
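To follow the rebuild without retyping the command, /proc/mdstat can be polled (assuming watch is available):
_$: watch -n 5 cat /proc/mdstat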
Install GRUB in the new hard disk
Check that the GRUB version we are using is GRUB2:
_$: grub-install -v
grub-install (GRUB) 1.99-21ubuntu3.10
If we were using GRUB Legacy (GRUB 1) we would see something like:
_$: grub-install -v
grub-install (GNU GRUB 0.97)
If we are not using GRUB2, the first step is to update to GRUB2:
_$: apt-get update
_$: apt-get purge grub-common
_$: apt-get install grub-pc
a) Install GRUB using grub-install:
We will install GRUB in the MBR of the hard disk, not in the /boot partition. GRUB goes into the MBR and its files are placed in /boot, or wherever we point the --boot-directory flag.
_$: grub-install /dev/sda
Installation finished. No error reported.
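For reference, a sketch of how the --boot-directory flag mentioned above could be used; /boot is the default, so the path here is only illustrative:
_$: grub-install --boot-directory=/boot /dev/sda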
b) Install GRUB manually (not recommended). Only if you are using GRUB Legacy.
_$: grub
grub> find /boot/grub/stage1
root (hd0,1)
grub> root (hd0,1)
grub> setup (hd0)
grub> quit
Check GRUB has been properly installed
_$: grub-install --recheck /dev/sda
Installation finished. No error reported.
_$: grub-install --recheck /dev/sdb
Installation finished. No error reported.
Force RAID synchronization
_$: echo 'check' > /sys/block/md1/md/sync_action
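While the check runs, progress is visible in /proc/mdstat; afterwards the mismatch counter can be inspected (using md1 as above):
_$: cat /sys/block/md1/md/sync_action # "check" while running, "idle" when finished
_$: cat /sys/block/md1/md/mismatch_cnt # 0 means the two mirrors agree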
Repair a RAID after a power outage
It is possible that, after a power outage, the RAID is left in a bad state.
_$: cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1[2] sdb1[3]
1950656 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda5[2]
19513216 blocks super 1.2 [2/1] [_U]
md2 : active raid1 sda6[2]
953193280 blocks super 1.2 [2/1] [_U]
unused devices: <none>
Check the file system is not in read-only mode
_$: touch a
touch: cannot touch `a': Read-only file system
If it is, reboot the computer
_$: reboot
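Depending on why the file system went read-only, remounting it read-write may be enough and avoids the reboot; a sketch for the root file system (if it fails, reboot as above):
_$: mount -o remount,rw /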
Rebuild the RAID
_$: mdadm --manage /dev/md1 --add /dev/sdb5
mdadm: added /dev/sdb5
_$: mdadm --manage /dev/md2 --add /dev/sdb6
mdadm: added /dev/sdb6
_$: cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1[2] sdb1[3]
1950656 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sdb5[3] sda5[2]
19513216 blocks super 1.2 [2/1] [_U]
[=>...................] recovery = 8.5% (1665408/19513216) finish=2.3min speed=128108K/sec
md2 : active raid1 sdb6[3] sda6[2]
953193280 blocks super 1.2 [2/1] [_U]
resync=DELAYED
unused devices: <none>
Run a long test on the failed hard disk
_$: smartctl -t long /dev/sdb
_$: smartctl -a /dev/sdb | grep -A 1 "Self-test execution status"
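The long test can take several hours; once it finishes, its result is recorded in the drive's self-test log:
_$: smartctl -l selftest /dev/sdb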
Repair the RAID with a failed hard disk (from the initramfs)
_$: mdadm --manage /dev/md0 --fail /dev/sda1
mdadm: set device faulty failed for /dev/sda1: No such device
_$: mdadm --manage /dev/md0 --remove /dev/sda1
mdadm: hot remove failed for /dev/sda1: No such device or address
This means that the /dev/sda hard disk has already been removed from the RAID, so we just need to add it:
_$: mdadm --manage /dev/md0 --add /dev/sda1
mdadm: added /dev/sda1
And do the same for the rest of the partitions: /dev/sda3, /dev/sda5, etc.
Check that there is an mdadm script in /etc/cron.daily.
/etc/cron.daily/mdadm:
----------------------
#!/bin/sh
#
# cron.daily/mdadm -- daily check that MD devices are functional
#
# Copyright © 2008 Paul Slootman <paul@debian.org>
# distributed under the terms of the Artistic Licence 2.0
# As recommended by the manpage, run
# mdadm --monitor --scan --oneshot
# every day to ensure that any degraded MD devices don't go unnoticed.
# Email will go to the address specified in /etc/mdadm/mdadm.conf .
#
set -eu
MDADM=/sbin/mdadm
[ -x $MDADM ] || exit 0 # package may be removed but not purged
exec $MDADM --monitor --scan --oneshot
If there is, a RAID check will run every day and an email will be sent in case of problems. But we still need to configure the email address:
/etc/mdadm/mdadm.conf:
----------------------
...
# instruct the monitoring daemon where to send mail alerts
MAILADDR <user>@example.com
...
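After setting MAILADDR, a test alert can be sent to confirm that mail delivery actually works (the --test option generates a TestMessage event for every array):
_$: mdadm --monitor --scan --oneshot --test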
Finally we have to check in the /etc/crontab file the time at which the scripts placed in /etc/cron.daily will be run. The computer should be on at that time.
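A quick way to look up that time from the shell, on systems where cron.daily is driven from /etc/crontab:
_$: grep cron.daily /etc/crontab # the first two fields are minute and hour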
Repair a hard disk in spare state
_$: mdadm --detail /dev/md3
/dev/md3:
Version : 1.2
Creation Time : Wed Sep 25 17:57:16 2013
Raid Level : raid1
Array Size : 566473536 (540.23 GiB 580.07 GB)
Used Dev Size : 566473536 (540.23 GiB 580.07 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Tue Mar 3 10:57:49 2015
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Name : user-nix:3 (local to host user-nix)
UUID : b9ac204b:cda03d57:5b3304d8:1fbfa57f
Events : 16916
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 22 1 active sync /dev/sdb6
2 8 6 - spare /dev/sda6
_$: mdadm --re-add /dev/md3 /dev/sda6
Cannot open /dev/sda6: Device or resource busy
_$: mdadm --manage /dev/md3 --fail /dev/sda6 --remove /dev/sda6
_$: mdadm --manage /dev/md3 --add /dev/sda6
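After the add, the partition should leave the spare state and start rebuilding, which can be confirmed as before:
_$: cat /proc/mdstat
_$: mdadm --detail /dev/md3 | grep -E "State :|Rebuild Status"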