XFS is the default filesystem on RHEL, Rocky Linux, and AlmaLinux, a common choice for large-scale storage workloads, and one of the most reliable filesystems Linux ships. But "reliable" does not mean "self-monitoring": you still need to watch fragmentation, inode usage, and the kernel logs for the unmistakable signs of upcoming trouble. This guide covers the XFS-specific health commands, the metrics worth alerting on, and the recovery workflow when something does go wrong.
Confirm what you have
df -hT # filesystem types per mount
findmnt -t xfs # all XFS mounts
sudo xfs_info /var # block size, sectsz, inode size, log size
sudo xfs_db -r -c 'sb 0' -c 'p' /dev/sda1 | head # superblock fields
Healthy xfs_info output shows the volume size in blocks, the inode size (commonly 512 bytes), the log location and size, and whether reflink and crc are enabled. crc=1 is the modern default: metadata blocks are checksummed, so silent metadata corruption is detected. File contents are not covered by these checksums.
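To check one of these fields from a script, a minimal sketch along the following lines may help. The helper name xfs_field is hypothetical, and it assumes the key=value layout that xfs_info normally prints.

```shell
# Hypothetical helper: print the value of one key=value field from
# xfs_info-style text supplied on stdin (for example crc, reflink, isize).
xfs_field() {
  local key=$1
  # -o prints only the matching text; -w stops 'size' from matching
  # inside a longer key such as 'isize' or 'agsize'.
  grep -owE "${key}=[0-9]+" | head -n 1 | cut -d= -f2
}
```

Typical use would be `sudo xfs_info /var | xfs_field crc`, which prints 1 on a filesystem with metadata checksums enabled.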
Free space and inodes
XFS allocates inodes dynamically, so you cannot exhaust them as easily as on ext4, but it is still possible:
df -h /var
df -i /var
sudo xfs_quota -x -c 'free -h' /var
sudo xfs_quota -x -c 'report -h' /var # if quotas enabled
For "no space left on device" with disk free, check inodes; for "filesystem is full" with inodes free, check space. Both can happen on XFS.
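Both conditions can be watched with one check. The sketch below assumes GNU df (for the --output option) and mount points without spaces. The function name usage_alert and the 90% default threshold are illustrative choices, not recommendations from the tools themselves.

```shell
# Hypothetical helper: read "target pcent ipcent" rows, as printed by
#   df --output=target,pcent,ipcent
# and warn when space or inode usage reaches the given percentage.
usage_alert() {
  local limit=${1:-90}   # assumed default threshold, in percent
  awk -v limit="$limit" '
    NR == 1 && $1 == "Mounted" { next }      # skip the df header row
    {
      space = $2; inodes = $3
      sub(/%/, "", space); sub(/%/, "", inodes)
      if (space + 0 >= limit)  print "WARN " $1 " space " space "%"
      if (inodes + 0 >= limit) print "WARN " $1 " inodes " inodes "%"
    }'
}
```

For example, `df -t xfs --output=target,pcent,ipcent | usage_alert 85` would list every XFS mount at or above 85% on either measure.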
Fragmentation reporting
sudo xfs_db -c 'frag -f' -r /dev/sda1
sudo xfs_db -c 'frag' -r /dev/sda1
sudo xfs_bmap -v /var/lib/postgresql/main/base/16384/12345 | head
xfs_db frag -f reports the fragmentation factor for regular files (the higher, the worse). Healthy: under 5%; concerning: above 30%; bad: above 50%. Note that XFS is generally less prone to fragmentation than ext4 on aged filesystems, so high values usually indicate either a write-heavy workload or aggressive snapshotting.
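The thresholds above can be applied automatically. This sketch assumes the usual one-line summary from xfs_db frag ("actual N, ideal N, fragmentation factor X%"). The function name frag_status and the "moderate" label for the 5-30% range are assumptions, since the thresholds above leave that range unnamed.

```shell
# Hypothetical helper: classify the fragmentation factor read from
# `xfs_db -c 'frag -f' -r DEVICE` output on stdin.
frag_status() {
  awk '
    /fragmentation factor/ {
      pct = $NF; sub(/%/, "", pct); pct += 0
      if      (pct > 50) level = "bad"
      else if (pct > 30) level = "concerning"
      else if (pct < 5)  level = "healthy"
      else               level = "moderate"   # assumed label for the 5-30% gap
      print level, pct
      exit
    }'
}
```

Usage would look like `sudo xfs_db -c 'frag -f' -r /dev/sda1 | frag_status`.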
Defragmentation
sudo xfs_fsr # online filesystem reorganiser
sudo xfs_fsr -v -t 600 /var # 10-minute time-boxed run
sudo xfs_fsr -v /var/lib/postgresql/main/base/16384/12345 # one file
xfs_fsr runs while the filesystem is mounted and writeable. With no arguments it works through every mounted XFS filesystem listed in /etc/mtab, for up to two hours by default; -t caps the runtime in seconds so it can be scheduled in maintenance windows.
Detecting filesystem errors
sudo dmesg -T | grep -iE 'XFS .*error|XFS .*shutdown|XFS .*corrupt'
sudo journalctl -k -p err -b | grep -i xfs
findmnt /var -no OPTIONS # look for 'ro' among the options; a shutdown may not change them
XFS shuts down a filesystem on unrecoverable corruption, and subsequent writes return I/O errors. The kernel log is the primary signal; alert on any line matching XFS .*shutdown, and also on "Shutting down filesystem", since the exact wording varies between kernel versions.
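A monitoring check for this can be sketched as follows. It reads kernel log text on stdin and matches both the "shutdown" and "Shutting down" wordings. The function name and the exit status 2 (a common "critical" convention in monitoring plugins) are assumptions for illustration.

```shell
# Hypothetical check: count XFS shutdown lines in kernel log text on stdin.
check_xfs_shutdown() {
  local hits
  # grep -c exits non-zero when the count is 0, hence the '|| true'.
  hits=$(grep -ciE 'XFS.*(shutdown|shutting down)' || true)
  if [ "$hits" -gt 0 ]; then
    echo "CRITICAL: $hits XFS shutdown line(s)"
    return 2   # assumed convention: 2 means critical
  fi
  echo "OK: no XFS shutdown lines"
}
```

It could be fed from `sudo dmesg -T | check_xfs_shutdown` or from `journalctl -k -b | check_xfs_shutdown`.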
Online consistency check
XFS supports an online scrub (experimental since kernel 4.15; whether it is enabled and supported depends on your distribution's kernel build, so confirm that before relying on it):
sudo xfs_scrub /var # online check
sudo xfs_scrub_all # all mounted XFS filesystems
sudo systemctl enable --now xfs_scrub_all.timer # weekly scheduled scrub
A check-only scrub (xfs_scrub -n) does not interrupt operations and emits warnings for any problems it detects. Repairing with xfs_repair requires the filesystem to be unmounted.
Repair procedure
If xfs_scrub reports problems or the kernel shut the filesystem down:
sudo umount /var
sudo xfs_repair -n /dev/sda1 # dry-run, reports what would be done
sudo xfs_repair /dev/sda1 # actual repair
sudo mount /var
sudo dmesg -T | tail
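Always run xfs_repair -n first. Never run xfs_repair on a mounted filesystem: the result is data loss. If xfs_repair reports a dirty log, first try mounting and unmounting the filesystem so the kernel can replay it. If the log itself is corrupted and cannot be replayed, xfs_repair -L zeroes it; this can lose recent writes, so use it only as a last resort.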
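The "unmount first" rule can be enforced with a small guard before any repair step. The sketch below compares the device name literally against the mount table, so it assumes the device appears there under the same name you pass in (device-mapper volumes may be listed under a different path). Both function names are hypothetical, and the optional table argument exists only to make the check testable.

```shell
# Hypothetical guard: succeed if DEVICE appears in the mount table.
device_is_mounted() {
  local dev=$1 table=${2:-/proc/mounts}
  awk -v dev="$dev" '$1 == dev { found = 1 } END { exit !found }' "$table"
}

# Refuse to go further when the device is still mounted; otherwise
# print the dry-run command to use next. Nothing is repaired here.
safe_repair_check() {
  local dev=$1 table=${2:-/proc/mounts}
  if device_is_mounted "$dev" "$table"; then
    echo "refusing: $dev is mounted" >&2
    return 1
  fi
  echo "ok: $dev is not mounted; dry run with: xfs_repair -n $dev"
}
```

A wrapper script might call `safe_repair_check /dev/sda1 || exit 1` before invoking xfs_repair.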
Backup before repair
For irreplaceable data:
sudo xfs_metadump /dev/sda1 /backup/sda1.metadump # metadata-only snapshot
sudo dd if=/dev/sda1 of=/backup/sda1.img bs=1M status=progress # full image (large)
The metadump is small (megabytes vs terabytes), captures the filesystem structure without file contents, and lets upstream XFS developers reproduce corruption you report.
Performance tuning
# /etc/fstab options that often help
UUID=... /var xfs defaults,noatime,nodiratime,inode64,allocsize=16m 0 2
sudo mount -o remount,noatime /var
sudo xfs_io -c 'extsize 16m' /var/lib/big-files
inode64 places inodes anywhere in the filesystem (default on modern XFS); allocsize=16m reduces fragmentation by allocating in 16 MB chunks for streaming writes.
Quotas
sudo umount /var && sudo mount -o uquota,gquota,pquota /var # XFS quota options take effect only on a fresh mount, not a remount
sudo xfs_quota -x -c 'limit bsoft=4g bhard=5g user1' /var
sudo xfs_quota -x -c 'report -h' /var
XFS quotas are first-class: user, group, and project quotas are separate namespaces, and project quotas can be applied per directory tree.
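For the per-directory case, project quotas are set up with the project and limit subcommands of xfs_quota. The sketch below only prints the commands for review rather than running them. The function name, the directory /var/lib/app, and project ID 42 are hypothetical, and it assumes the filesystem is already mounted with pquota.

```shell
# Hypothetical dry-run helper: print the xfs_quota commands that would
# place DIR under project ID with a hard block limit. Runs nothing.
project_quota_plan() {
  local mnt=$1 dir=$2 id=$3 hard=$4
  # 'project -s' tags the directory tree with the project ID.
  printf "xfs_quota -x -c 'project -s -p %s %s' %s\n" "$dir" "$id" "$mnt"
  # 'limit -p' sets the limit for that project.
  printf "xfs_quota -x -c 'limit -p bhard=%s %s' %s\n" "$hard" "$id" "$mnt"
  # 'report -p' shows per-project usage afterwards.
  printf "xfs_quota -x -c 'report -p -h' %s\n" "$mnt"
}
```

Running `project_quota_plan /var /var/lib/app 42 10g` prints three commands that can be inspected and then executed with sudo.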
The audit script
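#!/bin/bash
# Per-mount XFS health summary; run as root so dmesg is readable.
# Read source/target pairs line by line; a plain 'for' would split them into separate words.
findmnt -t xfs -rno SOURCE,TARGET | while read -r src tgt; do
  echo "== $tgt ($src) =="
  df -h "$tgt" | tail -1
  df -i "$tgt" | tail -1
  # Match 'ro' only as a whole mount option.
  if findmnt -no OPTIONS "$tgt" | grep -qE '(^|,)ro(,|$)'; then
    echo " WARN: read-only"
  fi
  # Kernel messages name the device as "XFS (sda1)", not /dev/sda1.
  dev=$(basename "$(readlink -f "$src")")
  err=$(dmesg -T 2>/dev/null | grep -ci "XFS ($dev).*error")
  [ "$err" -gt 0 ] && echo " WARN: $err recent error lines"
done
echo
echo "== Recent XFS shutdown events =="
sudo dmesg -T | grep -iE 'XFS.*(shutdown|shutting down)' | tail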
Common pitfalls
- Running xfs_repair on a mounted filesystem, which corrupts data. Always unmount first.
- Using xfs_repair -L as a routine fix; it discards the log and may lose seconds of writes.
- Forgetting that XFS cannot be shrunk; growing is supported via xfs_growfs, shrinking requires recreate-and-restore.
- Trusting df -i on dynamic-allocation filesystems; XFS reports current inode count, not maximum.
XFS earns its reputation as quiet and dependable, but quiet does not mean unattended. Schedule a weekly xfs_scrub, alert on shutdown events in dmesg, and keep an xfs_metadump handy for the day you need to talk to the kernel mailing list. Five minutes per host per week prevents the rare-but-catastrophic XFS incident.