Understanding Disk Health Monitoring
Hard drives and SSDs have finite lifespans. SMART (Self-Monitoring, Analysis, and Reporting Technology) reports drive health indicators that often give early warning of impending failure, leaving administrators time to back up data and replace failing hardware.
SMART Health Checks
sudo smartctl -H /dev/sda
sudo smartctl -A /dev/sda
sudo smartctl -t short /dev/sda
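Scripts that run these checks unattended usually need just the overall verdict and smartctl's exit status, not the full report. The Python sketch below assumes the ATA-style line "SMART overall-health self-assessment test result:" (SCSI/SAS drives print "SMART Health Status:" instead) and decodes the exit-status bitmask described in the smartctl(8) man page. The helper names parse_health, decode_exit_status and check_health are hypothetical, not part of smartmontools.

```python
from __future__ import annotations

import subprocess

# Exit-status bits documented in the smartctl(8) man page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command failed, or a SMART data checksum was bad",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "usage or prefail attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "device self-test log contains errors",
}


def decode_exit_status(code: int) -> list[str]:
    """Return a description for every bit set in a smartctl exit status."""
    return [text for bit, text in EXIT_BITS.items() if code & (1 << bit)]


def parse_health(output: str) -> str | None:
    """Pull the overall verdict (e.g. PASSED or OK) out of `smartctl -H` output."""
    for line in output.splitlines():
        if ("overall-health self-assessment test result:" in line
                or line.startswith("SMART Health Status:")):
            return line.split(":", 1)[1].strip()
    return None  # no verdict found, e.g. SMART unsupported or disabled


def check_health(device: str) -> tuple[str | None, list[str]]:
    """Run `smartctl -H` (requires root) and return (verdict, decoded exit bits)."""
    proc = subprocess.run(["smartctl", "-H", device],
                          capture_output=True, text=True, check=False)
    return parse_health(proc.stdout), decode_exit_status(proc.returncode)
```

Decoding the bits matters because a nonzero exit status does not by itself mean the disk is failing: bit 3 (value 8) is set when the drive's own health check reports failure, while other bits flag command errors or problems recorded in the logs. check_health("/dev/sda") must be run as root.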
Key SMART Attributes to Monitor
- Reallocated Sector Count - Bad sectors that the drive has remapped to its spare area
- Current Pending Sector Count - Unstable sectors awaiting reallocation
- Uncorrectable Error Count - Errors that could not be recovered
- Temperature - Sustained high operating temperatures can shorten drive lifespan
- Power-On Hours - Total hours the drive has been powered on
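To watch these attributes automatically, one option is to parse the table printed by smartctl -A. This minimal sketch assumes the ATA attribute layout (ten columns with RAW_VALUE last) and smartctl's usual attribute names, where the uncorrectable counters appear as Offline_Uncorrectable and Reported_Uncorrect; vendors sometimes use other names or raw-value encodings. Treating any nonzero reallocated, pending or uncorrectable count as a warning, and the 55 °C temperature limit, are illustrative assumptions rather than vendor thresholds. parse_attributes and find_problems are hypothetical names.

```python
from __future__ import annotations

import re

# Assumption: any nonzero raw value for these counters deserves a warning.
COUNTERS = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
            "Offline_Uncorrectable", "Reported_Uncorrect")
TEMP_LIMIT_C = 55  # hypothetical alert threshold; check the drive's datasheet


def parse_attributes(output: str) -> dict[str, int]:
    """Map attribute name -> leading integer of RAW_VALUE from `smartctl -A` output."""
    attrs: dict[str, int] = {}
    for line in output.splitlines():
        parts = line.split(None, 9)  # RAW_VALUE (last column) may contain spaces
        if len(parts) == 10 and parts[0].isdigit():  # skips the "ID#" header row
            match = re.match(r"\d+", parts[9])  # "36 (Min/Max 20/53)" -> 36
            if match:
                attrs[parts[1]] = int(match.group())
    return attrs


def find_problems(attrs: dict[str, int]) -> list[str]:
    """Return warnings for nonzero error counters and high temperature."""
    problems = [f"{name}={attrs[name]}" for name in COUNTERS if attrs.get(name, 0) > 0]
    temp = attrs.get("Temperature_Celsius")
    if temp is not None and temp > TEMP_LIMIT_C:
        problems.append(f"Temperature_Celsius={temp} exceeds {TEMP_LIMIT_C}")
    return problems
```

Feeding it the output of sudo smartctl -A /dev/sda returns an empty list when none of the watched counters is set and the temperature is within the assumed limit.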
Disk I/O Statistics
iostat -dx 1 5
sudo iotop -o
cat /proc/diskstats
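The counters in /proc/diskstats are cumulative since boot, so two snapshots taken a known interval apart are enough to compute throughput and utilization figures comparable to iostat's. The sketch below assumes the kernel's documented field order (sectors read in field 6, sectors written in field 10, milliseconds spent doing I/O in field 13) and 512-byte units, which is how the kernel reports sectors regardless of the device's physical sector size. parse_diskstats, summarize and sample are hypothetical names.

```python
from __future__ import annotations

import time

SECTOR_BYTES = 512  # /proc/diskstats counts in 512-byte units


def parse_diskstats(text: str) -> dict[str, list[int]]:
    """Map device name -> its counters (field 4 onward of each line)."""
    stats: dict[str, list[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14:  # major, minor, name, then at least 11 counters
            stats[fields[2]] = [int(x) for x in fields[3:]]
    return stats


def summarize(before: dict[str, list[int]], after: dict[str, list[int]],
              interval_s: float) -> dict[str, dict[str, float]]:
    """Per-device throughput and %util between two snapshots."""
    result = {}
    for dev, new in after.items():
        old = before.get(dev)
        if old is None:
            continue  # device appeared between snapshots
        delta = [n - o for n, o in zip(new, old)]
        result[dev] = {
            "read_kB_s": delta[2] * SECTOR_BYTES / 1024 / interval_s,   # field 6: sectors read
            "write_kB_s": delta[6] * SECTOR_BYTES / 1024 / interval_s,  # field 10: sectors written
            "util_pct": delta[9] / (interval_s * 1000) * 100,           # field 13: ms doing I/O
        }
    return result


def read_snapshot() -> dict[str, list[int]]:
    with open("/proc/diskstats") as f:
        return parse_diskstats(f.read())


def sample(interval_s: float = 1.0) -> dict[str, dict[str, float]]:
    """Take two snapshots interval_s seconds apart (Linux only)."""
    before = read_snapshot()
    time.sleep(interval_s)
    return summarize(before, read_snapshot(), interval_s)
```

sample(1.0) returns a dictionary keyed by device name; partitions appear alongside whole disks, just as they do in /proc/diskstats.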
Inode Usage Monitoring
Running out of inodes is as critical as running out of disk space: once a filesystem has no free inodes, no new files can be created on it. Large numbers of small files can exhaust the inodes while disk space still appears available.
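Python's os.statvfs exposes the block and inode counters that df and df -i report, which makes it easy to put the two side by side and spot a filesystem whose inodes are nearly gone while space remains. A minimal sketch; usage_pct and fs_usage are hypothetical names, and usage is computed against the total block count without df's reserved-block adjustment, so percentages can differ slightly from df output.

```python
from __future__ import annotations

import os


def usage_pct(total: int, free: int) -> float | None:
    """Percentage in use, or None when the filesystem reports no fixed total."""
    if total == 0:
        return None  # e.g. btrfs allocates inodes dynamically and reports 0
    return (total - free) / total * 100


def fs_usage(path: str = "/") -> dict[str, float | None]:
    """Block and inode usage for the filesystem containing path."""
    st = os.statvfs(path)
    return {
        "space_pct": usage_pct(st.f_blocks, st.f_bfree),
        "inode_pct": usage_pct(st.f_files, st.f_ffree),
    }
```

A result such as a space_pct of 40 alongside an inode_pct of 99 is exactly the situation described above.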
df -i
sudo find / -xdev -printf "%h\n" 2>/dev/null | sort | uniq -c | sort -rn | head
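The find pipeline counts entries per parent directory without crossing into other mounted filesystems (-xdev). A rough Python equivalent, useful where GNU find's -printf is unavailable, compares each subdirectory's device number with that of the starting directory and does not descend into the ones that differ. busiest_dirs is a hypothetical helper; directories it cannot read are silently skipped.

```python
from __future__ import annotations

import heapq
import os


def _on_device(path: str, dev: int) -> bool:
    """True if path lives on filesystem dev; unreadable paths count as False."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def busiest_dirs(root: str, top: int = 10) -> list[tuple[int, str]]:
    """Return the top directories under root with the most direct entries,
    as (count, path) pairs, without descending into other filesystems."""
    root_dev = os.lstat(root).st_dev
    counts: list[tuple[int, str]] = []
    # onerror swallows permission errors, similar to discarding find's stderr
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        counts.append((len(dirnames) + len(filenames), dirpath))
        # Prune mount points in place so os.walk does not enter them (like -xdev)
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), root_dev)]
    return heapq.nlargest(top, counts)
```

For example, busiest_dirs("/var", top=5) lists the five directories under /var holding the most entries, which is usually where the inodes went.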
Automated Monitoring with dargslan-disk-health
pip install dargslan-disk-health
dargslan-disk-health
dargslan-disk-health --smart
dargslan-disk-health --inodes