RAID (Redundant Array of Independent Disks) provides data redundancy and performance improvements for Linux servers. However, RAID arrays require active monitoring: degraded arrays, failed disks, and stalled rebuilds must be caught early, before a second failure causes data loss. In this comprehensive guide, you will learn how to monitor RAID health using mdadm, detect problems early, and automate array health checks.
Understanding Linux Software RAID with mdadm
Linux software RAID is managed through mdadm (short for "multiple device admin"), which creates and manages RAID arrays using standard block devices. Unlike hardware RAID controllers, software RAID gives you full visibility into array status through /proc/mdstat and mdadm commands.
The most common RAID levels on Linux servers are RAID 1 (mirroring), RAID 5 (striping with parity), RAID 6 (double parity), and RAID 10 (mirrored stripes). Each level has different failure tolerance and performance characteristics that affect your monitoring strategy.
Checking RAID Status with /proc/mdstat
The quickest way to check RAID health is reading /proc/mdstat:
cat /proc/mdstat
# Output shows all arrays, their state, and member disks
# [UU] means all disks up, [U_] means one disk down
The bracket notation is crucial: each U represents an active disk, and an underscore (_) represents a missing or failed disk. A healthy RAID 1 shows [UU], while a degraded one shows [U_] or [_U].
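The bracket check is easy to script. Below is a minimal sketch (the check_mdstat function name is my own, not a standard tool) that reads mdstat-formatted text on stdin and prints the name of any array whose status brackets contain an underscore:

```shell
# check_mdstat: read /proc/mdstat-style text on stdin and report arrays
# with a missing or failed disk (function name is illustrative).
check_mdstat() {
  awk '
    /^md[0-9]/ { dev = $1 }                               # remember current array
    /\[[U_]+\]/ { if ($0 ~ /_/) print dev " DEGRADED" }   # underscore = bad disk
  '
}

# Usage on a live system:
#   check_mdstat < /proc/mdstat
```

An empty result means every array's status brackets are all-U, i.e. healthy.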
Detailed Array Information with mdadm --detail
For comprehensive array information, use mdadm --detail:
sudo mdadm --detail /dev/md0
# Shows: RAID level, array size, used devices, state
# Active Devices, Working Devices, Failed Devices, Spare Devices
Key fields to monitor include the State (clean, degraded, rebuilding), Active/Failed/Spare device counts, and the UUID for array identification across reboots.
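For scripting, the State field can be pulled straight out of the --detail output. A small sketch (array_state is an illustrative name):

```shell
# array_state: extract the "State :" value from `mdadm --detail` output
# read on stdin (function name is illustrative).
array_state() {
  awk -F' : ' '/^ *State :/ { sub(/^[ \t]+/, "", $2); print $2 }'
}

# Usage: state=$(sudo mdadm --detail /dev/md0 | array_state)
# Alert if it reports anything other than "clean" or "active".
```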
Monitoring Rebuild Progress
When you add a replacement disk to a degraded array, md starts rebuilding it automatically. Monitor progress through /proc/mdstat:
watch cat /proc/mdstat
# Shows: recovery = XX.X% (YY/ZZ) finish=Xmin speed=XK/sec
Rebuild time depends on array size and I/O load. The kernel throttles resync between two tunables (values in KiB/s): speed_limit_min is the rate md tries to sustain even under normal I/O, and speed_limit_max is the ceiling. To limit the impact on a busy production server, lower the maximum; to finish faster on a quiet system, raise the minimum:
echo 50000 > /proc/sys/dev/raid/speed_limit_max   # cap rebuild at ~50 MB/s
echo 100000 > /proc/sys/dev/raid/speed_limit_min  # or: push an idle box harder
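These echo commands do not survive a reboot. To make the limits persistent, put them in a sysctl drop-in (the filename below is an arbitrary choice; values are examples in KiB/s):

```shell
# /etc/sysctl.d/90-raid-rebuild.conf  (filename is arbitrary)
# Rebuild throttling, in KiB/s; adjust to your hardware and workload.
dev.raid.speed_limit_min = 1000
dev.raid.speed_limit_max = 50000
```

Apply without rebooting via `sudo sysctl --system`.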
Automating RAID Monitoring
Set up automatic monitoring with mdadm daemon mode and email alerts:
# /etc/mdadm/mdadm.conf
MAILADDR admin@example.com
# Start monitoring daemon
sudo mdadm --monitor --scan --daemonise
# Verify mail delivery with a one-off test alert per array
sudo mdadm --monitor --scan --oneshot --test
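Beyond email, mdadm.conf's PROGRAM directive runs a script of your choosing on every event; mdadm invokes it with the event name, the array, and (for some events) the component device as arguments. A minimal handler sketch (the path and the format_event helper are my own inventions):

```shell
# Handler referenced from mdadm.conf as, e.g.:
#   PROGRAM /usr/local/sbin/raid-event.sh      (path is an assumption)
# mdadm calls the script as: raid-event.sh <event> <md-device> [component]

format_event() {
  # Build a one-line syslog message; the component device ($3) is optional.
  printf 'mdadm event: %s on %s%s' "$1" "$2" "${3:+ (device $3)}"
}

# In the real script you would forward this to syslog:
#   format_event "$@" | logger -t raid-event
```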
For more sophisticated monitoring, our dargslan-raid-monitor tool provides comprehensive RAID health checks with JSON output for integration with monitoring stacks:
pip install dargslan-raid-monitor
dargslan-raid report # Full health report
dargslan-raid audit # Issues only
dargslan-raid json # JSON for automation
Handling Failed Disks
When a disk fails, the procedure is: mark it as failed, remove it from the array, physically replace it, partition the new disk to match, and add it back:
sudo mdadm /dev/md0 --fail /dev/sdb1
sudo mdadm /dev/md0 --remove /dev/sdb1
# Replace the physical disk, then copy the partition layout from the
# surviving disk so /dev/sdb1 exists again (sgdisk works for GPT too)
sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
sudo mdadm /dev/md0 --add /dev/sdb1
Always verify the rebuild completes successfully and check SMART data on the replacement disk.
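The SMART check is also scriptable. This sketch (smart_ok is an illustrative name) reads `smartctl -H` output on stdin and succeeds only when the drive's overall self-assessment passed:

```shell
# smart_ok: read `smartctl -H` output on stdin; exit 0 only if the drive's
# overall self-assessment is PASSED (function name is illustrative).
smart_ok() {
  grep -q 'self-assessment test result: PASSED'
}

# Usage: sudo smartctl -H /dev/sdb | smart_ok || echo "replace /dev/sdb again"
```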
Best Practices for RAID Monitoring
- Check /proc/mdstat in your daily monitoring routine
- Configure email alerts for degraded arrays
- Monitor disk SMART data for early failure prediction
- Keep spare disks ready for quick replacement
- Test rebuild procedures regularly in non-production environments
- Document your RAID layout and disk serial numbers
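The last point, recording disk serial numbers, is easy to script. The disk_serial helper below (an illustrative name) pulls the serial out of `smartctl -i` output:

```shell
# disk_serial: read `smartctl -i` output on stdin and print the serial number.
disk_serial() {
  awk -F: '/^Serial Number/ { sub(/^[ \t]+/, "", $2); print $2 }'
}

# Record layout and serials for your runbook:
#   sudo mdadm --detail --scan
#   for d in /dev/sd?; do
#     echo "$d: $(sudo smartctl -i "$d" | disk_serial)"
#   done
```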
Download our free RAID Array Monitoring Cheat Sheet for a quick reference of all essential mdadm commands. For deeper Linux storage administration knowledge, check out our Linux & DevOps eBooks.