Hard disk
Hard disk usage
_$: du /etc
_$: du -sh /etc
Hard disk(s) size
_$: dmesg | grep blocks
_$: fdisk -l /dev/sd? | grep "Disk"
_$: for disk in /dev/sd? ; do fdisk -l $disk | grep "Disk $disk" ; done
Hard disk structure
_$: df -Th
_$: fdisk -l
Largest files
_$: du -sBM * | sort -nr
_$: du -sBM .[!.]* | sort -nr # Hidden entries only; a bare .* can also match . and ..
_$: du -ckx | sort -nr | head
Ordered by increasing size (Top 10):
_$: find . -type f -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
Ordered by decreasing size (Top 10):
_$: find . -type f -print0 | xargs -0 du | sort -nr | head -10 | cut -f2 | xargs -I{} du -sh {}
Without changing filesystem:
_$: find / -mount -type f -print0 | xargs -0 du | sort -nr -k 1 | head -n 10
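The find/du/sort pipelines above can also be sketched in Python. This is a minimal sketch, not a replacement for du: the function name `largest_files` is made up for illustration, and it ranks files by apparent size (`st_size`) rather than by the disk blocks that du counts, so sparse files may rank differently.

```python
import heapq
import os
import stat


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):  # regular files only, like find -type f
                sizes.append((info.st_size, path))
    return heapq.nlargest(count, sizes)
```

Calling `largest_files(".")` mirrors the "Top 10, decreasing size" command; unlike `find -mount`, this sketch does not stop at filesystem boundaries.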
Largest installed packages
_$: dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -rn | less
Show partition table
_$: fdisk -l /dev/sda
Copy partition table
From: /dev/sda
To: /dev/sdb
_$: sfdisk -d /dev/sda | sfdisk /dev/sdb
Hard disk information
_$: smartctl -i /dev/sda
Check for bad blocks
_$: badblocks /dev/sda1
If the hard disk is external:
_$: cat /etc/mtab
...
/dev/sdc1 /media/<user>/Seagate\040Backup\040Plus\040Drive fuseblk rw,nosuid,nodev,allow_other,default_permissions,blksize=4096 0 0
_$: badblocks /dev/sdc1
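In /etc/mtab, spaces in a mount point are written as the octal escape \040, which makes the external disk's line awkward to match by eye. A minimal sketch of looking up the device for a given mount point follows; the helper names `decode_mtab_field` and `device_for` are hypothetical, and the sample is the mtab line shown above.

```python
import re


def decode_mtab_field(field):
    """Turn octal escapes such as \\040 back into the characters they encode."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def device_for(mtab_text, mount_point):
    """Return the device mounted at `mount_point`, or None if nothing is."""
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and decode_mtab_field(fields[1]) == mount_point:
            return fields[0]
    return None


sample = r"/dev/sdc1 /media/<user>/Seagate\040Backup\040Plus\040Drive fuseblk rw,nosuid,nodev,allow_other,default_permissions,blksize=4096 0 0"
print(device_for(sample, "/media/<user>/Seagate Backup Plus Drive"))
```

The printed device is the one to hand to badblocks.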
Hard disk test
Do the test
_$: sudo smartctl -t long /dev/sda # Long test (about 2 hours; varies with the disk)
_$: sudo smartctl -t short /dev/sda # Short test (about 1 minute)
_$: sudo smartctl -t conveyance /dev/sda # Transport test (about 2 minutes)
See the progress of the test
_$: smartctl -a /dev/sda | grep -A 1 "Self-test execution status"
See the results of the test
_$: smartctl -l selftest /dev/sda # Test results
_$: smartctl -a /dev/sda # All information about the disk
Get hard disk’s UUID
_$: ls -l /dev/disk/by-uuid # Degraded system
total 0
lrwxrwxrwx 1 root root 10 Jul 28 12:29 1c854bb4-77ec-4b49-b187-261bd55ec412 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Jul 28 12:29 9058b8da-6e47-49d1-8238-5835dd5efc7a -> ../../dm-2
lrwxrwxrwx 1 root root 9 Jul 28 12:30 a17a3729-243f-46fe-bb6f-304be50f8797 -> ../../md0
lrwxrwxrwx 1 root root 10 Jul 28 12:30 aa3a9cab-1314-450b-8bd9-f01a3e884912 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jul 28 12:29 ab1a10d0-d87d-42d8-add5-8a767a012410 -> ../../sda2
lrwxrwxrwx 1 root root 9 Jul 28 12:29 f63823e5-9da9-45cf-82a6-a9bd7cd893de -> ../../md1
_$: ls -l /dev/disk/by-uuid # Normal system
total 0
lrwxrwxrwx 1 root root 9 jul 28 08:37 0c95c63d-8e13-4a4e-bf03-85395878d97a -> ../../md0
lrwxrwxrwx 1 root root 9 jul 28 08:37 195d4e0b-45ae-4a39-8fc8-a5c5492ae5f4 -> ../../md1
lrwxrwxrwx 1 root root 9 jul 28 08:37 ca3007a3-fdd6-4a25-94b0-b9cf6e4e84aa -> ../../md2
lrwxrwxrwx 1 root root 10 jul 28 08:37 d02e46fb-ff0d-4e09-9873-d4d6113e59e3 -> ../../sda2
lrwxrwxrwx 1 root root 9 jul 28 08:37 d6a70065-89da-4c8d-b4c3-14ef878107e4 -> ../../md3
_$: blkid /dev/sda1
/dev/sda1: UUID="2f7d1305-f7a0-9dc7-07f1-f4795f3e773f" UUID_SUB="799103a8-88b0-9681-9dcf-dab9c391ae3c" LABEL="dns:0" TYPE="linux_raid_member"
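When the UUID is needed in a script, the blkid line can be split into its tags. A minimal sketch, assuming the `KEY="value"` layout shown above; `parse_blkid_line` is a hypothetical helper name.

```python
import shlex


def parse_blkid_line(line):
    """Split one line of blkid output into (device, {TAG: value})."""
    device, _, rest = line.partition(": ")
    # shlex strips the double quotes, leaving tokens such as UUID=2f7d...
    tags = dict(token.split("=", 1) for token in shlex.split(rest))
    return device, tags


sample = '/dev/sda1: UUID="2f7d1305-f7a0-9dc7-07f1-f4795f3e773f" UUID_SUB="799103a8-88b0-9681-9dcf-dab9c391ae3c" LABEL="dns:0" TYPE="linux_raid_member"'
device, tags = parse_blkid_line(sample)
print(device, tags["UUID"], tags["TYPE"])
```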
Get hard disk’s serial number
_$: lshw -C disk
*-disk
description: ATA Disk
product: ST1000DM003-1CH1
vendor: Seagate
physical id: 0.0.0
bus info: scsi@0:0.0.0
logical name: /dev/sda
version: CC49
serial: Z1D9XYBD <===
size: 931GiB (1TB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5
*-disk
description: ATA Disk
product: ST1000DM003-1CH1
vendor: Seagate
physical id: 0.0.0
bus info: scsi@1:0.0.0
logical name: /dev/sdb
version: CC49
serial: Z1D9C0LD <===
size: 931GiB (1TB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 signature=0004295a
*-cdrom
description: DVD-RAM writer
physical id: 0.0.0
bus info: scsi@2:0.0.0
logical name: /dev/cdrom
logical name: /dev/cdrw
logical name: /dev/dvd
logical name: /dev/dvdrw
logical name: /dev/sr0
capabilities: audio cd-r cd-rw dvd dvd-r dvd-ram
configuration: status=nodisc
That serial number is printed on the hard disk’s label, so we can tell which physical drive has failed.
If you have smartmontools installed, a faster way is this one:
_$: sudo smartctl -a /dev/sdb | grep "Serial Number"
Serial Number: Z1D9C0LD
Bad blocks
_$: smartctl -l selftest /dev/sda
================================================================================
sda (S1D5T7QG)
================================================================================
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-40-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 3694 47429338
The LBA with the first error is 47429338
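Reading the LBA off the log by eye is fine for one disk; for several disks it can be pulled out of the self-test log with a short parser. This is a sketch under assumptions: the row layout is the one shown above, the LBA is printed in decimal, and `first_error_lbas` is a hypothetical name. The sample rows are copied from the logs in this document.

```python
import re

# One log row: "# <num> <description and status> <remaining>% <hours> <LBA or ->"
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def first_error_lbas(selftest_log):
    """Return the LBA_of_first_error of every row that reports one."""
    lbas = []
    for line in selftest_log.splitlines():
        match = ROW.match(line.strip())
        if match and match.group(5).isdigit():  # "-" means no error was logged
            lbas.append(int(match.group(5)))
    return lbas


sample = """\
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 3694 47429338
# 2 Short offline Completed without error 00% 3723 -
"""
print(first_error_lbas(sample))
```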
1) Find out the partition for said LBA
_$: sudo fdisk -lu /dev/sda
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0001345b
Device Boot Start End Blocks Id System
/dev/sda1 * 2048 7813119 3905536 fd Linux raid autodetect
/dev/sda2 7813120 39063551 15625216 82 Linux swap / Solaris
/dev/sda3 39063552 625000447 292968448 fd Linux raid autodetect
/dev/sda4 625002494 1953523711 664260609 5 Extended
Partition 4 does not start on physical sector boundary.
/dev/sda5 625002496 820312063 97654784 fd Linux raid autodetect
/dev/sda6 820314112 1953523711 566604800 fd Linux raid autodetect
It looks like it is going to be partition sda3, but we check it nonetheless:
_$: python -c 'print(39063552 <= 47429338 <= 625000447)' # Start <= LBA_of_first_error <= End
True
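The same range check can be run against every partition at once. A minimal sketch: the start and end sectors are copied from the fdisk -lu listing above, and `partition_for_lba` is a hypothetical helper name.

```python
# (name, first_sector, last_sector) copied from the fdisk -lu listing;
# the extended partition sda4 is left out because it only wraps sda5 and sda6.
PARTITIONS = [
    ("sda1", 2048, 7813119),
    ("sda2", 7813120, 39063551),
    ("sda3", 39063552, 625000447),
    ("sda5", 625002496, 820312063),
    ("sda6", 820314112, 1953523711),
]


def partition_for_lba(lba, partitions=PARTITIONS):
    """Return the name of the partition whose sector range contains `lba`."""
    for name, first, last in partitions:
        if first <= lba <= last:
            return name
    return None  # the LBA falls in a gap or outside every partition


print(partition_for_lba(47429338))
```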
So the failing sector is in the sda3 partition. What is in that partition?
_$: grep sda3 /etc/fstab
# / was on /dev/sda3 during installation
So / was on sda3 when the system was installed. How is / mounted now?
_$: grep " / " /etc/fstab
# / was on /dev/sda3 during installation
#UUID=c42571f6-beec-4fd5-86b1-2b2ca1684a16 / ext4 errors=remount-ro 0 1
/dev/md1 / ext4 errors=remount-ro 0 1
So sda3 is part of the md1 array and it is an ext4 filesystem.
Now choose your own adventure:
- If you have a RAID system go to step 5.
- If you do not have a RAID system, continue with step 2.
2) Find out the block size of an ext4 filesystem
_$: tune2fs -l /dev/sda3 | grep Block
Block count: 73242112
Block size: 4096
Blocks per group: 32768
So the block size is 4096 bytes (4K).
3) Find out the filesystem’s block that contains the LBA
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
Let's see:
LBA = 47429338
Block_start = 39063552
Block_end = 625000447 # We will not use this one
Block_size = 4096
_$: python -c "b = int((47429338-39063552)*512/4096) ; print(b)"
1045723
So the block that is failing is 1045723.
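The formula from step 3 can be wrapped in a small function so it is easy to reuse with other LBAs. A minimal sketch; `lba_to_fs_block` is a hypothetical name, and the 512-byte sector size is the logical sector size reported by fdisk -lu above.

```python
SECTOR_SIZE = 512  # bytes per logical sector, as reported by fdisk -lu


def lba_to_fs_block(lba, partition_start, block_size, sector_size=SECTOR_SIZE):
    """Apply b = (int)((L - S) * 512 / B) using integer arithmetic."""
    return (lba - partition_start) * sector_size // block_size


block = lba_to_fs_block(lba=47429338, partition_start=39063552, block_size=4096)
print(block)
```

Integer division (`//`) plays the role of the (int) truncation in the formula and avoids floating-point rounding on very large sector numbers.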
4) Find out the inode stored in that block and the file that has said inode
_$: debugfs
debugfs 1.42.9 (4-Feb-2014)
debugfs: open /dev/sda3
/dev/sda3: Can't read an inode bitmap while reading inode bitmap
debugfs: open -c /dev/sda3
/dev/sda3: catastrophic mode - not reading inode or group bitmaps
debugfs: quit
If the partition belongs to a RAID array, open the array device instead:
_$: cat /proc/mdstat | grep "sda3" | cut -d' ' -f1
md1
_$: debugfs
debugfs 1.42.9 (4-Feb-2014)
debugfs: open /dev/md1
debugfs: testb 1045723
Block 1045723 marked in use
debugfs: icheck 1045723
Block Inode number
1045723 3014674
debugfs: ncheck 3014674
Inode Pathname
3014674 /home/<user>/isos/local/sqlserver-enu-2008-r2-dvd-eval.iso
debugfs: quit
5) Overwrite the file
a) RAID
Note: This operation is not destructive.
If we have a RAID-1, the other hard disk should hold a good copy of the data. We force the array to read every block; when md hits a read error, it fetches the data from the healthy disk and writes it back over the bad sector.
_$: cat /proc/mdstat | grep "sda3" | cut -d' ' -f1
md1
_$: echo 'check' > /sys/block/md1/md/sync_action
_$: cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
...
md1 : active raid1 sda3[2] sdb3[1]
292837184 blocks super 1.2 [2/2] [UU]
[>....................] check = 0.0% (4608/292837184) finish=1055.4min speed=4608K/sec
Wait for the check to finish.
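The grep/cut pipeline used to find the array matches any line that merely contains the text "sda3". A slightly stricter lookup can be sketched as follows, assuming array names of the form mdN and the /proc/mdstat layout shown above; `array_for_member` is a hypothetical name.

```python
import re


def array_for_member(mdstat_text, member):
    """Return the md array (e.g. "md1") that lists `member` (e.g. "sda3"), or None."""
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not match:
            continue
        # Member tokens look like "sda3[2]"; keep the name before the "["
        members = [tok.split("[", 1)[0] for tok in match.group(2).split() if "[" in tok]
        if member in members:
            return match.group(1)
    return None


sample = """\
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda3[2] sdb3[1]
      292837184 blocks super 1.2 [2/2] [UU]
"""
print(array_for_member(sample, "sda3"))
```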
b) No RAID
Note: This operation is destructive.
If you don’t have a backup of the file, there is nothing you can do except let that be a lesson on why you must take backups.
_$: dd if=/dev/zero of=/dev/sda3 bs=4096 count=1 seek=1045723
_$: sync
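Because the dd command above overwrites whatever block its seek value points at, it may be worth converting the block number back into sectors and confirming that the range covers the failing LBA. A minimal sketch with a hypothetical helper name, using the values from the earlier steps:

```python
def fs_block_to_lba_range(block, partition_start, block_size, sector_size=512):
    """Return (first_lba, last_lba) of the sectors that make up one filesystem block."""
    sectors_per_block = block_size // sector_size
    first = partition_start + block * sectors_per_block
    return first, first + sectors_per_block - 1


first, last = fs_block_to_lba_range(1045723, partition_start=39063552, block_size=4096)
# The failing LBA from the self-test log should fall inside this range
print(first, last, first <= 47429338 <= last)
```

If the last value printed is False, the seek value does not match the bad sector and dd should not be run with it.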
6) Run the tests again and see the error is gone
_$: sudo smartctl -t long /dev/sda # Starts the test in the background
_$: sudo smartctl -l selftest /dev/sda # Run this once the test has finished
================================================================================
sda (S1D5T7QG)
================================================================================
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-40-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 3725 -
# 2 Short offline Completed without error 00% 3723 -
You can put your superhero cape in the wardrobe now.