Linux Disk I/O Monitoring: IOPS, Throughput, and Latency Analysis

Disk I/O is the single most common cause of "the database is slow" tickets that turn out not to be the database at all. Three numbers describe every storage device (IOPS, throughput, and latency), and the relationship between them is non-linear in ways that break intuition. This guide walks through the Linux tools that measure each, the per-device counters in /proc, and the patterns that distinguish a saturated SSD from a misbehaving controller.

The three metrics that matter

  • IOPS: operations per second. Limited by device queue depth and seek behaviour. Spinning disks: 80–200 IOPS. SATA SSD: 10k–100k. NVMe: 100k–1M+.
  • Throughput: MB/s. Limited by interface bandwidth (SATA: 600 MB/s, PCIe Gen4 x4: 8 GB/s).
  • Latency: time per operation. The metric users feel. Healthy: under 1 ms for SSD; under 10 ms for HDD; database servers want consistent low p99.

A workload doing 1k IOPS at 4 KB averages only 4 MB/s: almost no throughput, yet several times the random-access load a single spinning disk can serve.
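
The arithmetic behind that figure is just IOPS multiplied by request size. A minimal sketch, assuming 1 MB is counted as 1000 KB (the rounding the text uses); throughput_mb is a name made up for this example:

```shell
# throughput_mb IOPS BLOCK_KB: throughput in MB/s, counting 1 MB as 1000 KB.
throughput_mb() {
  awk -v iops="$1" -v kb="$2" 'BEGIN { printf "%.1f\n", iops * kb / 1000 }'
}

throughput_mb 1000 4       # the 4 KB random workload from the text
throughput_mb 1000 1024    # the same operation rate at 1 MB per request
```

The first call prints 4.0 and the second 1024.0. The same operation rate moves from negligible bandwidth to more than a SATA link can carry once the request size grows.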

iostat: the workhorse

iostat -xmz 5 6                # extended stats in MB/s, skip idle devices; six 5-second reports
iostat -xmz 5 6 sda nvme0n1    # specific devices
iostat -xt 1                   # add timestamps, 1-second reports until interrupted
iostat -xmd 5                  # device only, no CPU section

Key columns:

  • r/s, w/s: read and write IOPS.
  • rMB/s, wMB/s: throughput.
  • rareq-sz, wareq-sz: average request size.
  • aqu-sz: average queue depth. > 1 means requests are queueing.
  • r_await, w_await: average wait time per op (ms). The latency users feel.
  • %util: percent of time the device had at least one outstanding request. Misleading on multi-queue NVMe; trust latency instead.
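
These columns are tied together by Little's law: the average number of requests in flight is roughly the operation rate multiplied by the average time each request takes. The sketch below is a sanity check on that relationship, not how iostat itself computes aqu-sz; queue_depth is a hypothetical helper name.

```shell
# queue_depth IOPS AWAIT_MS: average number of requests in flight (Little's law).
queue_depth() {
  awk -v iops="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", iops * ms / 1000 }'
}

queue_depth 2000 0.5    # 2000 ops/s at 0.5 ms each
queue_depth 2000 8      # same rate, 8 ms each
```

This prints 1.00 and then 16.00. At a fixed operation rate, a rise in await shows up directly as a deeper queue, which is why the two columns tend to move together.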

iotop: which process is to blame

sudo iotop -oPa                # only active processes, accumulated, P=processes only
sudo iotop -obtqqq --iter=10   # batch mode, 10 iterations, no headers
sudo pidstat -d 5              # per-process I/O

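iotop gets its per-task numbers from the kernel's taskstats accounting interface, which is why it needs root; pidstat -d reads the per-process counters in /proc/<pid>/io. The -oPa flags reduce noise to "processes that actually did I/O, with running totals." On a noisy server, run the batch form, redirect it to a file, and review later.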
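
The per-process counters can also be read by hand. A minimal sketch, assuming a Linux kernel built with task I/O accounting; proc_io_bytes is a name invented here. read_bytes and write_bytes count bytes that actually reached the storage layer, unlike rchar and wchar, which include page-cache hits.

```shell
# proc_io_bytes FILE: print "read_bytes write_bytes" from a /proc/<pid>/io style file.
proc_io_bytes() {
  awk -F': *' '
    $1 == "read_bytes"  { r = $2 }
    $1 == "write_bytes" { w = $2 }
    END { print r + 0, w + 0 }' "$1"
}

# Example: storage I/O charged to the current shell so far.
if [ -r "/proc/$$/io" ]; then
  proc_io_bytes "/proc/$$/io"
fi
```

Sampling the same file twice and subtracting gives a per-process rate, which is in essence what the tools above report.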

Reading /proc/diskstats directly

For scripting and exporters:

awk '$3 !~ /^(loop|ram)/' /proc/diskstats   # skip loop and ram devices
column -t /proc/diskstats | head            # aligned columns for reading by eye

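Each line holds the major and minor numbers, the device name, and then at least 11 counters, including: reads completed, sectors read, time spent reading (ms), writes completed, sectors written, time spent writing (ms), I/Os in progress, time spent doing I/O (ms), and weighted time (ms). Newer kernels append discard and flush counters. Sectors are always 512 bytes here, whatever the device's real sector size. Every field except I/Os in progress is a running total, so sample twice, subtract, and divide the difference by the interval to compute rates. node_exporter and most monitoring agents do exactly this.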
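
The sample-twice approach can be sketched in a few lines of awk. This is an illustration under stated assumptions rather than what any particular exporter does: diskstats_rates is a made-up name, the two snapshots are plain copies of /proc/diskstats, and 1 MB is counted as 10^6 bytes.

```shell
# diskstats_rates BEFORE AFTER SECONDS
# Prints per device: name r/s w/s rMB/s wMB/s r_await(ms) w_await(ms)
diskstats_rates() {
  awk -v dt="$3" '
    # First file: remember the starting counters, keyed by device name ($3).
    NR == FNR { r[$3] = $4; rs[$3] = $6; rt[$3] = $7
                w[$3] = $8; ws[$3] = $10; wt[$3] = $11; next }
    # Second file: subtract, then divide by the interval (or by the op count for await).
    ($3 in r) {
      dr = $4 - r[$3]; dw = $8 - w[$3]
      printf "%s %.1f %.1f %.2f %.2f %.2f %.2f\n", $3,
        dr / dt, dw / dt,
        ($6 - rs[$3]) * 512 / 1e6 / dt,
        ($10 - ws[$3]) * 512 / 1e6 / dt,
        (dr ? ($7 - rt[$3]) / dr : 0),
        (dw ? ($11 - wt[$3]) / dw : 0)
    }' "$1" "$2"
}

# Usage on a live host (the paths are arbitrary):
#   cat /proc/diskstats > /tmp/ds.a; sleep 5; cat /proc/diskstats > /tmp/ds.b
#   diskstats_rates /tmp/ds.a /tmp/ds.b 5
```

Await is the change in time spent divided by the change in completed operations, not by the interval, which is why an idle device reports 0 rather than a division error.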

Latency distribution with bcc/bpftrace

iostat reports averages; latency outliers cause pain. Use eBPF tools:

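sudo apt install bpfcc-tools          # Debian/Ubuntu; the tools get a -bpfcc suffix
                                      # (elsewhere: biolatency, biotop, biosnoop)
sudo biolatency-bpfcc 5 6             # six latency histograms, one per 5 s
sudo biotop-bpfcc 5                   # top processes by I/O, refreshed every 5 s
sudo biosnoop-bpfcc                   # per-I/O trace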

biolatency prints a power-of-two histogram: the number of operations that completed in each latency bucket (0–1, 2–3, 4–7, 8–15, and so on), in microseconds by default or in milliseconds with -m. A bimodal distribution (most operations around 0.1 ms, plus a long tail near 100 ms) usually means a misbehaving controller or filesystem flush stalls.
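
A percentile can be estimated from such a histogram by walking the buckets until the running count reaches the target share. The sketch below assumes the usual "low -> high : count" row layout of the bcc tools; hist_percentile is a hypothetical helper and the sample numbers are invented to show a bimodal shape.

```shell
# hist_percentile PCT: read a bcc-style histogram on stdin and print the upper
# bound of the bucket in which the PCT-th percentile falls.
hist_percentile() {
  awk -v pct="$1" '
    $2 == "->" && $4 == ":" { hi[++n] = $3; cnt[n] = $5; total += $5 }
    END {
      if (total == 0) exit
      for (i = 1; i <= n; i++) {
        cum += cnt[i]
        if (cum * 100 >= total * pct) { print hi[i]; exit }
      }
    }'
}

# Invented sample (usecs): a fast mode plus a slow tail.
hist_percentile 99 <<'EOF'
     usecs               : count     distribution
        64 -> 127        : 900      |********************|
       128 -> 255        : 80       |*                   |
       256 -> 511        : 5        |                    |
     65536 -> 131071     : 15       |                    |
EOF
```

For this sample the p99 lands in the 65536 to 131071 usec bucket even though 90% of operations finished in under 128 usecs, which is exactly the pattern an average hides. The result is only as precise as the bucket width.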

Synthetic benchmarking with fio

Before you put a database on a new disk, characterise it:

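# Random 4 KB reads against the raw device; --readonly guards against accidental writes.
# --iodepth only takes effect with an asynchronous engine such as libaio.
sudo fio --name=randread --filename=/dev/nvme0n1 --rw=randread --readonly \
         --bs=4k --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
         --runtime=60 --time_based --group_reporting

# Sequential 1 MB writes to a scratch file on the filesystem under test.
sudo fio --name=seqwrite --filename=test.fio --rw=write --bs=1M \
         --ioengine=libaio --direct=1 --iodepth=4 --numjobs=1 \
         --size=1G --runtime=30 --time_based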

Run different patterns: random 4 KB read (matches database OLTP), random 4 KB write (write-heavy DB), sequential 1 MB write (backup). Compare to the vendor spec; large gaps mean a tuning problem (e.g. RAID write-back disabled, BBU expired).
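
One way to keep the three patterns comparable is to generate the commands from a single template so that only the access pattern and block size vary. This is a sketch: fio_cmd and the TARGET path are hypothetical, the queue depth and size are arbitrary choices, and the function only prints each command so it can be reviewed before anything touches a disk.

```shell
# TARGET is a hypothetical scratch file. Pointing write tests at a raw device
# destroys the data on it.
TARGET=${TARGET:-/mnt/scratch/fio.test}

# fio_cmd NAME RW BS: print (not run) a fio command for one access pattern.
fio_cmd() {
  echo fio --name="$1" --filename="$TARGET" --rw="$2" --bs="$3" \
       --ioengine=libaio --direct=1 --iodepth=32 --size=1G \
       --runtime=60 --time_based --group_reporting
}

fio_cmd oltp-read  randread  4k
fio_cmd oltp-write randwrite 4k
fio_cmd backup     write     1M
```

Piping a printed line to sh runs it. Keeping every other option fixed means differences between runs come from the pattern, not from an accidental change in queue depth or caching.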

Distinguishing read vs write saturation

Different cures for different bottlenecks:

  • Read saturated: your working set exceeds the page cache. Add RAM, or move the dataset to faster storage.
  • Write saturated: fsync latency spikes. Check write-back cache, use a separate WAL/journal device, batch commits.
  • Both saturated: the device is genuinely undersized; rightsize the storage tier.
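
As a first pass, the three cases can be told apart from the r_await and w_await columns alone. The sketch below assumes a single latency threshold is a good enough proxy for saturation, which is a simplification; classify_io is a name made up for this example.

```shell
# classify_io R_AWAIT W_AWAIT THRESH_MS: name the side whose latency is over the threshold.
classify_io() {
  awk -v r="$1" -v w="$2" -v t="$3" 'BEGIN {
    if (r > t && w > t) print "both"
    else if (r > t)     print "read"
    else if (w > t)     print "write"
    else                print "ok"
  }'
}

classify_io 0.4 25 10    # fast reads, slow writes
classify_io 14 2 10      # slow reads, fast writes
```

The two calls print write and read. In practice HDDs and SSDs need different thresholds, and a latency histogram is a better input than an average.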

A short monitoring script

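#!/bin/bash
# Warn when a device's average read or write latency exceeds THRESH_AWAIT (ms).
THRESH_AWAIT=10
# Two 5-second reports: the first is the since-boot average, so only the second is checked.
# LC_ALL=C keeps the decimal separator a dot so awk can compare the numbers.
LC_ALL=C iostat -xmd 5 2 | awk -v t="$THRESH_AWAIT" '
  # Column order differs between sysstat versions, so map names from each header row.
  $1 ~ /^Device/ { report++; for (i = 1; i <= NF; i++) col[$i] = i; next }
  report == 2 && NF > 10 && $1 !~ /^(loop|ram|sr)/ {
    rAwait = $(col["r_await"]); wAwait = $(col["w_await"])
    if (rAwait + 0 > t || wAwait + 0 > t)
      printf "WARN %-10s r=%s w=%s rMB=%s wMB=%s r_await=%s w_await=%s util=%s\n",
             $1, $(col["r/s"]), $(col["w/s"]), $(col["rMB/s"]), $(col["wMB/s"]),
             rAwait, wAwait, $(col["%util"])
  }'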

Filesystem-level effects

The same disk can show very different latency depending on filesystem:

  • Mount with noatime on read-heavy filesystems to eliminate metadata writes per read.
  • Use data=writeback on ext4 only when you can tolerate files holding stale or partly written data after a crash; it journals metadata but does not order data writes ahead of it.
  • Avoid sync mount option in production; it serialises every write through fsync.
  • Tune the I/O scheduler: none for NVMe (let hardware queue), mq-deadline for SATA SSD, bfq for desktops.
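
Before tuning the scheduler, check which one is active. The kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and marks the one in use with square brackets. A minimal sketch; active_scheduler is a hypothetical helper name:

```shell
# active_scheduler LINE: extract the bracketed entry from a scheduler line.
# A line without brackets is returned unchanged.
active_scheduler() {
  printf '%s\n' "$1" | sed 's/.*\[\(.*\)\].*/\1/'
}

# Print "device scheduler" for every block device that exposes the file.
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  dev=${f#/sys/block/}
  dev=${dev%%/*}
  printf '%s %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

Writing a scheduler name into the same file as root switches it at runtime; the change does not survive a reboot unless a udev rule or boot parameter sets it.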

Common pitfalls

  • Trusting %util on multi-queue NVMe; it reads 100% whenever at least one request was in flight for the whole interval, even though the device can serve many more in parallel.
  • Benchmarking with the OS page cache enabled and concluding the disk is fast; use --direct=1 in fio.
  • Reading the first iostat report; it shows averages since boot, not current rates. Always discard the first iteration, or pass -y to omit it.
  • Forgetting that LVM and dm-crypt add their own block devices; iostat shows latency at every layer.

Disk I/O monitoring is the most quantitative observability you have on a Linux host, and the most often misread. Keep iostat in muscle memory, baseline with fio, alert on latency rather than utilisation, and use bpftrace when an average hides a long tail.

Dargslan Editorial Team (Dargslan)