Hard disk usage

_$: du /etc
_$: du -sh /etc

Hard disk(s) size

_$: dmesg | grep blocks
_$: fdisk -l /dev/sd? | grep "Disk"
_$: for disk in /dev/sd? ; do fdisk -l "$disk" | grep "Disk $disk" ; done

Hard disk structure

_$: df -Th
_$: fdisk -l

Largest files and directories

_$: du -sBM * | sort -nr
_$: du -sBM .[!.]* | sort -nr          # Hidden entries only (a plain .* can also match . and ..)

_$: du -ckx | sort -nr | head

The 10 largest files, in increasing order of size:

_$: find . -type f -print0 | xargs -0 du | sort -n  | tail -10 | cut -f2 | xargs -I{} du -sh {}

The 10 largest files, in decreasing order of size:

_$: find . -type f -print0 | xargs -0 du | sort -nr | head -10 | cut -f2 | xargs -I{} du -sh {}

Without crossing into other filesystems:

_$: find / -mount -type f -print0 | xargs -0 du | sort -nr -k 1 | head -n 10

Largest installed packages

_$: dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -rn | less    # Installed-Size is in KiB

Show partition table

_$: fdisk -l /dev/sda

Copy partition table

From: /dev/sda To: /dev/sdb

_$: sfdisk -d /dev/sda | sfdisk /dev/sdb

Hard disk information

_$: smartctl -i /dev/sda

Check data blocks

_$: badblocks /dev/sda1

If the hard disk is external, find its device name first:

_$: cat /etc/mtab
...
/dev/sdc1 /media/<user>/Seagate\040Backup\040Plus\040Drive fuseblk rw,nosuid,nodev,allow_other,default_permissions,blksize=4096 0 0

_$: badblocks /dev/sdc1
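
The mount point in /etc/mtab writes spaces as \040. A minimal sketch for turning such a path back into a readable one; decode_mtab_path is a hypothetical helper name, and only the \040 escape shown above is assumed here:

```shell
# decode_mtab_path is a hypothetical helper, not a standard tool.
# It decodes the \040 (space) escape used in /etc/mtab mount points.
decode_mtab_path() {
    # %b makes printf interpret backslash escapes in its argument
    printf '%b\n' "$1"
}

decode_mtab_path 'Seagate\040Backup\040Plus\040Drive'
# prints: Seagate Backup Plus Drive
```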

Hard disk test

Do the test

_$: sudo smartctl -t long  /dev/sda         # Long test (around 2 hours; varies by disk)
_$: sudo smartctl -t short /dev/sda         # Short test (around 1 minute)
_$: sudo smartctl -t conveyance /dev/sda    # Transport test (around 2 minutes)

See the progress of the test

_$: smartctl -a /dev/sda | grep -A 1 "Self-test execution status"

See the results of the test

_$: smartctl -l selftest /dev/sda           # Test results
_$: smartctl -a /dev/sda                    # All information about the disk

Get the hard disk’s UUID

_$: ls -l /dev/disk/by-uuid  # Degraded system
total 0
lrwxrwxrwx 1 root root 10 Jul 28 12:29 1c854bb4-77ec-4b49-b187-261bd55ec412 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Jul 28 12:29 9058b8da-6e47-49d1-8238-5835dd5efc7a -> ../../dm-2
lrwxrwxrwx 1 root root  9 Jul 28 12:30 a17a3729-243f-46fe-bb6f-304be50f8797 -> ../../md0
lrwxrwxrwx 1 root root 10 Jul 28 12:30 aa3a9cab-1314-450b-8bd9-f01a3e884912 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jul 28 12:29 ab1a10d0-d87d-42d8-add5-8a767a012410 -> ../../sda2
lrwxrwxrwx 1 root root  9 Jul 28 12:29 f63823e5-9da9-45cf-82a6-a9bd7cd893de -> ../../md1
_$: ls -l /dev/disk/by-uuid  # Normal system
total 0
lrwxrwxrwx 1 root root  9 jul 28 08:37 0c95c63d-8e13-4a4e-bf03-85395878d97a -> ../../md0
lrwxrwxrwx 1 root root  9 jul 28 08:37 195d4e0b-45ae-4a39-8fc8-a5c5492ae5f4 -> ../../md1
lrwxrwxrwx 1 root root  9 jul 28 08:37 ca3007a3-fdd6-4a25-94b0-b9cf6e4e84aa -> ../../md2
lrwxrwxrwx 1 root root 10 jul 28 08:37 d02e46fb-ff0d-4e09-9873-d4d6113e59e3 -> ../../sda2
lrwxrwxrwx 1 root root  9 jul 28 08:37 d6a70065-89da-4c8d-b4c3-14ef878107e4 -> ../../md3
_$: blkid /dev/sda1
/dev/sda1: UUID="2f7d1305-f7a0-9dc7-07f1-f4795f3e773f" UUID_SUB="799103a8-88b0-9681-9dcf-dab9c391ae3c" LABEL="dns:0" TYPE="linux_raid_member"

Get the hard disk’s serial

_$: lshw -C disk
  *-disk
       description: ATA Disk
       product: ST1000DM003-1CH1
       vendor: Seagate
       physical id: 0.0.0
       bus info: scsi@0:0.0.0
       logical name: /dev/sda
       version: CC49
       serial: Z1D9XYBD     <===
       size: 931GiB (1TB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5
  *-disk
       description: ATA Disk
       product: ST1000DM003-1CH1
       vendor: Seagate
       physical id: 0.0.0
       bus info: scsi@1:0.0.0
       logical name: /dev/sdb
       version: CC49
       serial: Z1D9C0LD     <===
       size: 931GiB (1TB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 signature=0004295a
  *-cdrom
       description: DVD-RAM writer
       physical id: 0.0.0
       bus info: scsi@2:0.0.0
       logical name: /dev/cdrom
       logical name: /dev/cdrw
       logical name: /dev/dvd
       logical name: /dev/dvdrw
       logical name: /dev/sr0
       capabilities: audio cd-r cd-rw dvd dvd-r dvd-ram
       configuration: status=nodisc

That serial is printed on the label of the hard disk’s case, so we can tell which physical disk has failed.

If you have smartmontools installed, a faster way is this one:

_$: sudo smartctl -a /dev/sdb | grep "Serial Number"
Serial Number:    Z1D9C0LD
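
To use the serial in a script, the value can be cut out of that line. A small sketch; serial_from_smartctl is a hypothetical helper name, and it assumes the "Serial Number:" label looks exactly as in the output above:

```shell
# serial_from_smartctl is a hypothetical helper: it reads smartctl -i or
# smartctl -a output on stdin and prints only the serial number.
serial_from_smartctl() {
    # Split each line on the colon plus any following spaces
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Sample line copied from the output above. On a real system:
#   sudo smartctl -i /dev/sdb | serial_from_smartctl
printf 'Serial Number:    Z1D9C0LD\n' | serial_from_smartctl
# prints: Z1D9C0LD
```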

Bad blocks

_$: smartctl -l selftest /dev/sda
================================================================================
sda (S1D5T7QG)
================================================================================
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-40-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      3694         47429338

The LBA with the first error is 47429338
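
The LBA can also be pulled out of the log instead of being read by eye. A sketch under the assumption that the log layout matches the one above (entries start with "#", the LBA is the last column and is "-" when there was no error); first_error_lba is a hypothetical helper name:

```shell
# first_error_lba is a hypothetical helper: it reads smartctl -l selftest
# output on stdin and prints the LBA of the first listed entry that has one.
first_error_lba() {
    # Only log entries start with "#"; skip those whose last column is "-"
    awk '/^#/ && $NF ~ /^[0-9]+$/ { print $NF; exit }'
}

# Entry copied from the log above. On a real system:
#   sudo smartctl -l selftest /dev/sda | first_error_lba
printf '%s\n' '# 1  Extended offline    Completed: read failure       90%      3694         47429338' | first_error_lba
# prints: 47429338
```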

1) Find the partition that contains that LBA

_$: sudo fdisk -lu /dev/sda

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0001345b

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     7813119     3905536   fd  Linux raid autodetect
/dev/sda2         7813120    39063551    15625216   82  Linux swap / Solaris
/dev/sda3        39063552   625000447   292968448   fd  Linux raid autodetect
/dev/sda4       625002494  1953523711   664260609    5  Extended
Partition 4 does not start on physical sector boundary.
/dev/sda5       625002496   820312063    97654784   fd  Linux raid autodetect
/dev/sda6       820314112  1953523711   566604800   fd  Linux raid autodetect

It looks like partition sda3, but we check anyway:

_$: python3 -c 'print(39063552 <= 47429338 <= 625000447)'   # Start_sector <= LBA_of_first_error <= End_sector
True
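
The same range check can be run against every partition at once. A minimal sketch; partition_for_lba is a hypothetical helper name, and the "name start end" rows are typed in by hand from the fdisk listing above (boot flag column dropped) rather than parsed from fdisk:

```shell
# partition_for_lba is a hypothetical helper. It reads "name start end" rows
# (sector numbers as printed by fdisk -lu) on stdin and prints every
# partition whose range contains the LBA given as $1.
partition_for_lba() {
    awk -v lba="$1" '$2 + 0 <= lba + 0 && lba + 0 <= $3 + 0 { print $1 }'
}

partition_for_lba 47429338 <<'EOF'
/dev/sda1 2048 7813119
/dev/sda2 7813120 39063551
/dev/sda3 39063552 625000447
/dev/sda4 625002494 1953523711
/dev/sda5 625002496 820312063
/dev/sda6 820314112 1953523711
EOF
# prints: /dev/sda3
```

An LBA inside a logical partition would print two names, the extended partition (sda4) and the logical one inside it.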

So the failing sector is in the sda3 partition. What is in that partition?

_$: grep sda3 /etc/fstab
# / was on /dev/sda3 during installation

So / was on sda3 at installation time. Which device holds / now?

_$: grep " / " /etc/fstab
# / was on /dev/sda3 during installation
#UUID=c42571f6-beec-4fd5-86b1-2b2ca1684a16 /               ext4    errors=remount-ro 0       1
/dev/md1 /     						ext4	errors=remount-ro	0	1

So / is now on /dev/md1, an ext4 filesystem, and sda3 is a member of that array (/proc/mdstat confirms it further down). Now choose your own adventure:

  • If you have a RAID system go to step 5.
  • If you do not have a RAID system, continue with step 2.

2) Find out the block size of the ext4 filesystem

_$: tune2fs -l /dev/sda3 | grep Block
Block count:              73242112
Block size:               4096
Blocks per group:         32768

So the block size is 4096 bytes (4K).

3) Find out the filesystem’s block that contains the LBA

b = (int)((L-S)*512/B)

where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.

Let's see:
LBA = 47429338
Block_start = 39063552
Block_end = 625000447       # We will not use this one
Block_size = 4096
_$: python -c "b = int((47429338-39063552)*512/4096) ; print(b)"
1045723

So the block that is failing is 1045723.
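
The same formula in plain shell arithmetic, as a sketch. The helper names are hypothetical, and 512-byte logical sectors are assumed, as reported by fdisk above:

```shell
# lba_to_fs_block LBA PART_START BLOCK_SIZE   (hypothetical helper name)
# Implements b = (int)((L-S)*512/B); shell division already truncates.
lba_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# Byte offset of the bad sector inside that filesystem block
lba_offset_in_block() {
    echo $(( ($1 - $2) * 512 % $3 ))
}

lba_to_fs_block 47429338 39063552 4096       # prints 1045723
lba_offset_in_block 47429338 39063552 4096   # prints 1024
```

The offset of 1024 bytes means the bad sector is the third 512-byte sector of the 4096-byte block.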

4) Find the inode that uses that block and the file that owns that inode

_$: debugfs
debugfs 1.42.9 (4-Feb-2014)
debugfs:  open /dev/sda3
/dev/sda3: Can't read an inode bitmap while reading inode bitmap
debugfs:  open -c /dev/sda3
/dev/sda3: catastrophic mode - not reading inode or group bitmaps
debugfs:  quit

If the partition belongs to a RAID array, find the md device and open that instead:

_$: cat /proc/mdstat | grep "sda3" | cut -d' ' -f1
md1
_$: debugfs
debugfs 1.42.9 (4-Feb-2014)
debugfs:  open /dev/md1
debugfs:  testb 1045723
Block 1045723 marked in use
debugfs:  icheck 1045723
Block	Inode number
1045723	3014674
debugfs:  ncheck 3014674
Inode	Pathname
3014674	/home/<user>/isos/local/sqlserver-enu-2008-r2-dvd-eval.iso
debugfs:  quit
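
The grep on /proc/mdstat used in this step matches any line that contains the string sda3. A slightly stricter sketch that only matches the member entry itself; md_array_for is a hypothetical helper name, and the "name[index]" member format is assumed from the mdstat output shown in this document:

```shell
# md_array_for is a hypothetical helper: it reads /proc/mdstat on stdin and
# prints the md device that has the partition named in $1 as a member.
md_array_for() {
    # Look for " sda3[" so that sda3 does not also match e.g. sda30
    awk -v m="$1" 'index($0, " " m "[") { print $1 }'
}

# Line copied from the mdstat output in this document. On a real system:
#   md_array_for sda3 < /proc/mdstat
printf 'md1 : active raid1 sda3[2] sdb3[1]\n' | md_array_for sda3
# prints: md1
```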

5) Overwrite the bad block

a) RAID

Note: This operation is not destructive.

If we have a RAID-1, the other hard disk should hold a good copy of the data. We force the array to read every block: md works on blocks, not files, and a block that cannot be read from one disk is rewritten from its mirror.

_$: cat /proc/mdstat | grep "sda3" | cut -d' ' -f1
md1
_$: echo 'check' > /sys/block/md1/md/sync_action
_$: cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
...
md1 : active raid1 sda3[2] sdb3[1]
      292837184 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  0.0% (4608/292837184) finish=1055.4min speed=4608K/sec

Wait for the check to finish.
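
A sketch for following the progress from a script. md_check_progress is a hypothetical helper name, and it assumes the progress line looks like the one in the mdstat output above:

```shell
# md_check_progress is a hypothetical helper: it reads /proc/mdstat on stdin
# and prints the percentage of a running check.
md_check_progress() {
    sed -n 's/.*check *= *\([0-9.]*\)%.*/\1/p'
}

# Line copied from the output above. On a real system:
#   md_check_progress < /proc/mdstat
printf '%s\n' '      [>....................]  check =  0.0% (4608/292837184) finish=1055.4min speed=4608K/sec' | md_check_progress
# prints: 0.0

# To block until the check is over, sync_action can be polled; it reads
# "idle" when nothing is running:
#   while [ "$(cat /sys/block/md1/md/sync_action)" != "idle" ]; do sleep 60; done
```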

b) No RAID

Note: This operation is destructive.

If you don’t have a backup of the file, its contents in that block are lost; let that be a lesson on why you must take backups. Writing zeros over the block gives the disk the chance to remap the bad sector.

_$: dd if=/dev/zero of=/dev/sda3 bs=4096 count=1 seek=1045723
_$: sync
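
Because the write is destructive, it can help to print the exact commands and review them before running anything. A minimal sketch; both helper names are hypothetical, and the helpers only echo the command lines:

```shell
# read_block_cmd DEVICE BLOCK_SIZE BLOCK   (hypothetical helper name)
# Prints a read-only dd command to confirm the block really fails to read.
read_block_cmd() {
    echo "dd if=$1 of=/dev/null bs=$2 count=1 skip=$3"
}

# zero_block_cmd DEVICE BLOCK_SIZE BLOCK   (hypothetical helper name)
# Prints the destructive dd command instead of running it.
zero_block_cmd() {
    echo "dd if=/dev/zero of=$1 bs=$2 count=1 seek=$3"
}

read_block_cmd /dev/sda3 4096 1045723
# prints: dd if=/dev/sda3 of=/dev/null bs=4096 count=1 skip=1045723
zero_block_cmd /dev/sda3 4096 1045723
# prints: dd if=/dev/zero of=/dev/sda3 bs=4096 count=1 seek=1045723
```

Note that dd uses skip= for the input side and seek= for the output side; mixing them up would read or write the wrong block.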

6) Run the tests again and check that the error is gone

_$: sudo smartctl -t long /dev/sda          # Starts the test in the background
_$: sudo smartctl -l selftest /dev/sda      # Run this once the test has finished
================================================================================
sda (S1D5T7QG)
================================================================================
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-40-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      3725         -
# 2  Short offline       Completed without error       00%      3723         -

You can put your superhero cape in the wardrobe now.