How to Monitor Linux RAID Arrays: Complete mdadm Health Check Guide

RAID (Redundant Array of Independent Disks) provides data redundancy and performance improvements for Linux servers. However, a degraded array keeps running with reduced or no redundancy, so failed disks and stalled rebuilds must be detected before a further failure causes data loss. This guide shows how to check RAID health with mdadm, detect problems early, and automate array health checks.

Understanding Linux Software RAID with mdadm

Linux software RAID is managed through mdadm (Multiple Device Administration), which creates and manages RAID arrays using standard block devices. Unlike hardware RAID controllers, software RAID gives you full visibility into array status through /proc/mdstat and mdadm commands.

The most common RAID levels on Linux servers are RAID 1 (mirroring), RAID 5 (striping with parity), RAID 6 (double parity), and RAID 10 (striped mirrors). Each level tolerates a different number of disk failures and has different performance characteristics, both of which affect your monitoring strategy.

Checking RAID Status with /proc/mdstat

The quickest way to check RAID health is reading /proc/mdstat:

cat /proc/mdstat
# Output shows all arrays, their state, and member disks
# [UU] means all disks up, [U_] means one disk down

The bracket notation is crucial: each U represents an active disk, and an underscore (_) represents a missing or failed disk. A healthy RAID 1 shows [UU], while a degraded one shows [U_] or [_U].
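
The bracket notation lends itself to a scripted check. The following is a minimal sketch, not a complete monitor: the function name degraded_arrays is made up for this example, and it assumes the usual /proc/mdstat layout in which each array starts with a line beginning with "md" and the member map appears on a following line.

```shell
#!/bin/sh
# Print the name of every md array whose member map contains an underscore.
# Usage: degraded_arrays [FILE]   (FILE defaults to /proc/mdstat)
degraded_arrays() {
    awk '
        /^md/ { name = $1 }                 # header line, e.g. "md0 : active raid1 ..."
        /\[[U_]*_[U_]*\]/ { print name }    # member map with at least one "_"
    ' "${1:-/proc/mdstat}"
}
```

Called without arguments it reads the live /proc/mdstat; passing a file path makes it easy to try against a saved copy. Empty output means no array reported a missing member.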

Detailed Array Information with mdadm --detail

For comprehensive array information, use mdadm --detail:

sudo mdadm --detail /dev/md0
# Shows: RAID level, array size, used devices, state
# Active Devices, Working Devices, Failed Devices, Spare Devices

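Key fields to monitor include the State (for example clean, degraded, or recovering; several values can appear together, such as "clean, degraded"), the Active/Working/Failed/Spare device counts, and the UUID, which identifies the array across reboots even if its device name changes.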
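
These fields can be pulled out of the mdadm --detail output for scripting. The sketch below assumes the "Key : value" line layout that mdadm --detail prints; the function names detail_field and array_is_healthy are invented for this example, and the health rule (no "degraded" or "FAILED" in State, zero failed devices) is one reasonable choice rather than an official definition.

```shell
#!/bin/sh
# Print the value of one "Key : value" field from mdadm --detail output on stdin.
detail_field() {
    awk -v want="$1" '
        {
            i = index($0, " : ")            # position of the first key/value separator
            if (i == 0) next                # skip lines without one, e.g. "/dev/md0:"
            key = substr($0, 1, i - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            if (key == want) {
                val = substr($0, i + 3)
                sub(/[ \t]+$/, "", val)     # drop trailing whitespace
                print val
                exit
            }
        }'
}

# Succeed only if the State has no degraded/FAILED flag and Failed Devices is 0.
array_is_healthy() {
    detail=$(cat)
    state=$(printf '%s\n' "$detail" | detail_field "State")
    failed=$(printf '%s\n' "$detail" | detail_field "Failed Devices")
    case "$state" in
        *degraded*|*FAILED*) return 1 ;;
    esac
    [ "$failed" = "0" ]
}
```

A typical call would be: sudo mdadm --detail /dev/md0 | array_is_healthy || echo "md0 needs attention"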

Monitoring Rebuild Progress

Once a replacement disk has been added to a degraded array (or a hot spare is available), the md driver starts rebuilding automatically. Monitor progress through /proc/mdstat:

watch cat /proc/mdstat
# Shows: recovery = XX.X% (YY/ZZ) finish=Xmin speed=XK/sec
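
For logging or alerting it can be handy to reduce that progress line to a few fields. The following sketch assumes the progress line format shown above; the function name rebuild_progress is made up for this example. It prints the array name, the operation, the percentage, and the estimated time remaining.

```shell
#!/bin/sh
# Print "ARRAY OPERATION PERCENT ETA" for every array with a running rebuild or resync.
# Usage: rebuild_progress [FILE]   (FILE defaults to /proc/mdstat)
rebuild_progress() {
    awk '
        /^md/ { name = $1 }
        /(recovery|resync|reshape|check) *=/ {
            op = ""; pct = ""; eta = "unknown"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(recovery|resync|reshape|check)$/) op = $i
                if ($i ~ /%$/) { pct = $i; sub(/^=/, "", pct) }
                if ($i ~ /^finish=/) eta = substr($i, 8)
            }
            # lines such as "resync=DELAYED" carry no percentage and are skipped
            if (op != "" && pct != "") print name, op, pct, eta
        }
    ' "${1:-/proc/mdstat}"
}
```

Empty output means no rebuild, resync, reshape, or check is currently running.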

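Rebuild time depends on array size and I/O load. Two kernel settings, both in KiB/s per device, control the rebuild rate: speed_limit_min is the floor the kernel tries to maintain even while other I/O is running, and speed_limit_max is the ceiling used when the array is otherwise idle. Raising the minimum shortens the rebuild at the cost of application I/O; lowering the maximum protects application performance but lengthens the rebuild. On production servers, choose values that balance the two:

# Writing to /proc/sys requires root; "sudo echo ... > file" would fail
# because the redirection runs as the calling user, so use tee instead.
echo 50000 | sudo tee /proc/sys/dev/raid/speed_limit_min
echo 200000 | sudo tee /proc/sys/dev/raid/speed_limit_max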
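
To get a feel for what these numbers mean, a back-of-the-envelope estimate divides the per-device size by the rebuild speed. This is only a sketch under the assumption that the rebuild runs at a constant speed, which real workloads rarely allow; the function name estimate_rebuild_minutes is invented for this example.

```shell
#!/bin/sh
# Rough rebuild time in whole minutes (integer division truncates).
# $1 = per-device size in KiB, $2 = sustained rebuild speed in KiB/s
estimate_rebuild_minutes() {
    echo $(( $1 / $2 / 60 ))
}
```

For example, estimate_rebuild_minutes 1000000000 50000 (roughly 1 TB at the 50000 KiB/s floor) gives 333 minutes, while the same disk at 200000 KiB/s gives 83 minutes.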

Automating RAID Monitoring

Set up automatic monitoring with mdadm daemon mode and email alerts:

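# /etc/mdadm/mdadm.conf (on some distributions: /etc/mdadm.conf)
MAILADDR admin@example.com
# Start the monitoring daemon (many distributions already run it as a service)
sudo mdadm --monitor --scan --daemonise
# Send a test alert for every array to confirm that mail delivery works
sudo mdadm --monitor --scan --oneshot --test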
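
Besides MAILADDR, mdadm.conf accepts a PROGRAM line naming a script that the monitor runs for each event, passing the event name, the array device, and sometimes a component device as arguments. The handler below is a hypothetical sketch: the function names, the severity grouping, and the raid-alert syslog tag are all choices made for this example.

```shell
#!/bin/sh
# Hypothetical alert handler for the PROGRAM directive in mdadm.conf.
# The monitor calls it as: handler EVENT ARRAY [COMPONENT_DEVICE]

# Map an mdadm event name to a coarse severity (grouping chosen for this example).
md_event_severity() {
    case "$1" in
        Fail|FailSpare|DegradedArray|DeviceDisappeared|SparesMissing) echo critical ;;
        TestMessage) echo test ;;
        *) echo info ;;
    esac
}

# Build a one-line message; the component device is appended only when given.
format_md_event() {
    printf '%s: %s on %s%s\n' "$(md_event_severity "$1")" "$1" "$2" "${3:+ (device $3)}"
}

# When run with arguments (as the monitor does), write the message to syslog.
if [ "$#" -ge 2 ]; then
    format_md_event "$@" | logger -t raid-alert
fi
```

Saved as an executable script at a path of your choosing (for example /usr/local/bin/raid-alert.sh, a made-up location), it would be enabled with a line such as "PROGRAM /usr/local/bin/raid-alert.sh" in mdadm.conf.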

For more sophisticated monitoring, our dargslan-raid-monitor tool provides comprehensive RAID health checks with JSON output for integration with monitoring stacks:

pip install dargslan-raid-monitor
dargslan-raid report    # Full health report
dargslan-raid audit     # Issues only
dargslan-raid json      # JSON for automation

Handling Failed Disks

When a disk fails, the procedure is: mark as failed, remove from array, physically replace, add new disk:

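# Mark the disk as failed (harmless if the kernel has already done so)
sudo mdadm /dev/md0 --fail /dev/sdb1
sudo mdadm /dev/md0 --remove /dev/sdb1
# Replace the physical disk and partition it to match the old one;
# /dev/sdb1 does not exist on the new disk until you do
sudo mdadm /dev/md0 --add /dev/sdb1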

Always verify the rebuild completes successfully and check SMART data on the replacement disk.
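
Both checks can be scripted. mdadm --wait blocks until resync or recovery on the given array has finished, and smartctl -H (from smartmontools) prints an overall health verdict. The helper below is a minimal sketch: the name smart_passed is made up, and it assumes the two verdict strings that smartctl commonly prints for ATA and SCSI/SAS disks.

```shell
#!/bin/sh
# Succeed if smartctl -H output on stdin reports a passing overall health verdict.
# ATA disks report "test result: PASSED"; SCSI/SAS disks report "SMART Health Status: OK".
smart_passed() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Typical sequence after adding the replacement disk (device names as in the example above):
#   sudo mdadm --wait /dev/md0
#   sudo smartctl -H /dev/sdb | smart_passed || echo "SMART check failed on /dev/sdb"
```

A passing verdict is only a coarse signal; reviewing the full attribute list with smartctl -a remains worthwhile for a new disk.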

Best Practices for RAID Monitoring

  • Check /proc/mdstat in your daily monitoring routine
  • Configure email alerts for degraded arrays
  • Monitor disk SMART data for early failure prediction
  • Keep spare disks ready for quick replacement
  • Test rebuild procedures regularly in non-production environments
  • Document your RAID layout and disk serial numbers
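
As a starting point for documenting the layout, the member devices of an array can be read from /proc/mdstat. This is a sketch only: the function name array_members is invented, and it assumes member entries of the form name[index], optionally followed by a flag such as (F) or (S).

```shell
#!/bin/sh
# Print the member devices of one array, one per line.
# Usage: array_members ARRAY [FILE]   (FILE defaults to /proc/mdstat)
array_members() {
    awk -v want="$1" '
        $1 == want && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\[[0-9]+\]/) {        # member entry such as "sdb1[1]" or "sdb2[1](F)"
                    dev = $i
                    sub(/\[.*/, "", dev)        # strip the index and any flag
                    print dev
                }
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

The resulting names can then be matched to serial numbers, for instance with lsblk -o NAME,SERIAL, and recorded alongside the physical slot of each disk.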

Dargslan Editorial Team (Dargslan)