Network interfaces fail in subtle ways: a single dropped packet per million, a slow trickle of CRC errors, an interface that auto-negotiated to 100 Mbit instead of 1 Gbit. None of these trip the "interface down" alert, but all of them degrade application latency and throughput. This guide walks through the per-interface counters and link-state checks that catch hardware and configuration problems early.
The single command that replaces ifconfig
ip is the modern Linux network-tooling entry point; ifconfig is deprecated and can lie about VLANs and multi-address interfaces:
ip -s link show # all interfaces with stats
ip -c addr # color-highlighted addresses
ip -j link show eth0 | jq # JSON for scripting
ip route get 8.8.8.8 # which interface egresses?
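For scripting, the egress interface can be pulled out of the `ip route get` output without jq. The following is a minimal sketch; the function name `egress_dev` is a hypothetical helper, and it assumes the usual output shape in which the interface name follows the word `dev`.

```shell
# egress_dev: read `ip route get ADDR` output on stdin and print the
# interface name, i.e. the word that follows "dev" (hypothetical helper).
egress_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on a live host (commented out so the block is safe to source):
# ip route get 8.8.8.8 | egress_dev
```

Reading from stdin keeps the parser separate from the command that produces the data, so it can be checked against saved output from another host.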
Link state and negotiation
An interface that comes up at the wrong speed silently caps your throughput: a 1 Gbit port that negotiates 100 Mbit loses 90% of its capacity. Use ethtool:
sudo ethtool eth0 | grep -E 'Speed|Duplex|Link detected|Auto-negotiation'
sudo ethtool -i eth0 # driver, firmware version, bus
sudo ethtool -S eth0 | grep -i error # per-driver detailed counters
Sample healthy output: Speed: 1000Mb/s, Duplex: Full, Link detected: yes. If you see Speed: 100Mb/s on a 1 Gbit port, suspect cable, switch port config, or auto-neg mismatch.
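The speed and duplex check above can be automated. This sketch assumes the `Speed:`, `Duplex:`, and `Link detected:` lines that ethtool prints for a physical port; the function name `link_ok` and its message strings are hypothetical.

```shell
# link_ok EXPECTED_SPEED: read `ethtool IFACE` output on stdin.
# Prints "ok" and exits 0 when the link is detected, the speed equals
# EXPECTED_SPEED (for example 1000Mb/s), and the duplex is Full.
# Otherwise prints the first failed check and exits 1.
link_ok() {
    awk -v want="$1" '
        /Speed:/         { speed = $2 }
        /Duplex:/        { duplex = $2 }
        /Link detected:/ { link = $3 }
        END {
            if (link != "yes")    { print "no link";        exit 1 }
            if (speed != want)    { print "speed " speed;   exit 1 }
            if (duplex != "Full") { print "duplex " duplex; exit 1 }
            print "ok"
        }'
}

# Usage on a live host (commented out so the block is safe to source):
# sudo ethtool eth0 | link_ok 1000Mb/s || echo "eth0 negotiated badly"
```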
Error counters from /proc and /sys
Every interface exposes per-statistic files in sysfs:
ls /sys/class/net/eth0/statistics/
cat /sys/class/net/eth0/statistics/rx_errors
cat /sys/class/net/eth0/statistics/tx_dropped
cat /sys/class/net/eth0/statistics/rx_crc_errors
Healthy interfaces have these counters at zero or growing very slowly. A non-zero CRC error count almost always means a hardware-layer problem (cable, transceiver, port).
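Because the counters are cumulative since boot, what matters is how fast they grow rather than their absolute value. One way to measure growth is to take two snapshots and compare them. This is a minimal sketch; `snapshot_counters` and `counter_growth` are hypothetical helper names, and the `/tmp` paths in the usage comment are placeholders.

```shell
# snapshot_counters DIR: print "name value" for every counter file in DIR.
snapshot_counters() {
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            printf '%s %s\n' "${f##*/}" "$(cat "$f")"
        fi
    done
}

# counter_growth BEFORE AFTER: compare two snapshot files and print
# "name delta" for each error or drop counter that increased.
counter_growth() {
    awk 'NR == FNR { before[$1] = $2; next }
         $1 ~ /errors|dropped/ && ($1 in before) && $2 + 0 > before[$1] + 0 {
             print $1, $2 - before[$1]
         }' "$1" "$2"
}

# Usage on a live host (commented out so the block is safe to source):
# snapshot_counters /sys/class/net/eth0/statistics > /tmp/before
# sleep 60
# snapshot_counters /sys/class/net/eth0/statistics > /tmp/after
# counter_growth /tmp/before /tmp/after
```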
Real-time traffic monitoring
sar -n DEV 5 6 # bytes/packets per interface, 6 samples
ifstat -i eth0 1 # human readable, one second granularity
nload eth0 # ncurses dashboard
sudo iftop -i eth0 # top-talkers in real time
For a fleet-wide view, pipe sar output into your time-series database and graph rxkB/s and txkB/s (from sar -n DEV) alongside rxerr/s and txerr/s (from sar -n EDEV) per interface.
Detecting silent packet loss
Some loss never triggers protocol error counters. Use ping at high frequency to detect it:
sudo ping -c 1000 -i 0.01 -q 192.0.2.1 # 1000 packets, 100/sec; very short intervals may require root
sudo mtr -rwbzc 100 8.8.8.8 # report mode, 100 packets per hop
ss -ti # per-socket retransmit/loss stats
Acceptable loss to a server one hop away: 0%. To a server one continent away: under 1%. Any loss within a single data center should trigger an investigation.
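These thresholds can be turned into an automated check by parsing the summary line that ping prints. The sketch below assumes the iputils summary format (`... N% packet loss ...`); `loss_pct` and `loss_exceeds` are hypothetical helper names, and input without a summary line is treated as 0% loss, which a production check would want to handle separately.

```shell
# loss_pct: read ping output on stdin and print the loss percentage
# (number only) taken from the "N% packet loss" summary line.
loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# loss_exceeds THRESHOLD: exit 0 when the loss read from stdin is
# strictly greater than THRESHOLD percent, exit 1 otherwise.
loss_exceeds() {
    loss=$(loss_pct)
    awk -v l="$loss" -v t="$1" 'BEGIN { exit !(l + 0 > t + 0) }'
}

# Usage on a live host (commented out so the block is safe to source):
# ping -c 100 -q 192.0.2.1 | loss_exceeds 0 && echo "loss inside the data center"
```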
NIC offload settings
Modern NICs handle TSO and checksum offload in hardware, with the kernel's software GSO and GRO layered on top. Misconfigured offloads can drop packets:
sudo ethtool -k eth0 | head -20
sudo ethtool -K eth0 tso off gso off # disable to debug
sudo ethtool -g eth0 # show current and maximum ring-buffer sizes
sudo ethtool -G eth0 rx 4096 tx 4096 # raise toward the maximum that -g reports
Increase ring buffers if rx_dropped grows under bursty traffic. Disable offloads only if a packet capture shows obviously malformed frames; otherwise keep them on.
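Before raising ring buffers it helps to know how much headroom the driver offers. The sketch below parses the two sections that `ethtool -g` prints ("Pre-set maximums" and "Current hardware settings"); the function name `ring_headroom` and its output format are hypothetical.

```shell
# ring_headroom: read `ethtool -g IFACE` output on stdin and print the
# current and maximum ring size for RX and TX, one line each.
ring_headroom() {
    awk '
        /^Pre-set maximums:/          { section = "max" }
        /^Current hardware settings:/ { section = "cur" }
        # Match only the plain "RX:" and "TX:" rows, not "RX Mini:" etc.
        $1 == "RX:" || $1 == "TX:"    { val[section, tolower(substr($1, 1, 2))] = $2 }
        END {
            n = split("rx tx", ring, " ")
            for (i = 1; i <= n; i++)
                printf "%s current=%s max=%s\n", ring[i], val["cur", ring[i]], val["max", ring[i]]
        }'
}

# Usage on a live host (commented out so the block is safe to source):
# sudo ethtool -g eth0 | ring_headroom
```

If current already equals max, larger rings are not available and the drops have to be addressed elsewhere, for example by spreading interrupts across more cores.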
Bonding and team interfaces
cat /proc/net/bonding/bond0 # member status, MII
ip link show master bond0 # member interfaces
sudo ethtool -i bond0
journalctl -k | grep -E 'bond|link'
Active-backup bonds fail over within roughly one monitoring interval of a member failure, but only if link monitoring (miimon or ARP probes) is configured. Setting miimon=100 (poll every 100 ms) in /etc/network/interfaces or the NetworkManager profile is a sensible baseline.
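Member status and the monitoring interval can both be read from the bonding status file. This is a sketch that assumes the usual /proc/net/bonding layout, where each "Slave Interface" line is followed by that member's "MII Status" line; the name `bond_report` is hypothetical. It only checks miimon, so a bond that relies on ARP monitoring would also trigger the warning.

```shell
# bond_report: read /proc/net/bonding/BOND on stdin. Prints "member status"
# for each member interface, plus a warning when MII polling is disabled.
bond_report() {
    awk -F': ' '
        /^MII Polling Interval/ { if ($2 + 0 == 0) print "WARNING miimon=0" }
        /^Slave Interface/      { member = $2 }
        # The first MII Status line belongs to the bond itself; skip it.
        /^MII Status/ && member != "" { print member, $2; member = "" }
    '
}

# Usage on a live host (commented out so the block is safe to source):
# bond_report < /proc/net/bonding/bond0
```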
Per-CPU NIC interrupt distribution
One CPU pinned at 100% under network load usually means all NIC interrupts land on a single core. Check:
grep eth0 /proc/interrupts
sudo ethtool -L eth0 combined 8 # set channels
sudo ethtool -X eth0 equal 8 # spread RSS across queues
Pair with the irqbalance daemon or pin manually via /proc/irq/N/smp_affinity. Verify load distribution with mpstat -P ALL 5.
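To see at a glance whether interrupts are concentrated on one core, the per-queue rows of /proc/interrupts can be summed per CPU. This is a minimal sketch; `irq_per_cpu` is a hypothetical name, and it assumes the NIC's IRQ rows contain the interface name, which depends on the driver.

```shell
# irq_per_cpu PATTERN: read /proc/interrupts on stdin and print, for each
# CPU column, the total interrupt count over all rows matching PATTERN.
irq_per_cpu() {
    awk -v pat="$1" '
        # The header row holds one label per CPU column.
        NR == 1 { ncpu = NF; for (c = 1; c <= NF; c++) name[c] = $c; next }
        # In data rows, field 1 is the IRQ number and fields 2.. are per-CPU counts.
        $0 ~ pat { for (c = 1; c <= ncpu; c++) sum[c] += $(c + 1) }
        END { for (c = 1; c <= ncpu; c++) printf "%s %d\n", name[c], sum[c] }'
}

# Usage on a live host (commented out so the block is safe to source):
# irq_per_cpu eth0 < /proc/interrupts
```

A heavily skewed result, with one CPU carrying nearly all of the counts, is the pattern the paragraph above describes.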
One-shot health script
The following short shell script prints a one-line summary per interface. Schedule it with cron or a systemd timer to get a reading every minute:
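#!/bin/sh
# Print one summary line per network interface, skipping loopback.
for path in /sys/class/net/*; do
    i=${path##*/}
    # Skip non-directory entries (such as bonding_masters) and the loopback device.
    [ -d "$path" ] && [ "$i" != lo ] || continue
    ops=$(cat "$path/operstate")
    err=$(cat "$path/statistics/rx_errors")
    drp=$(cat "$path/statistics/rx_dropped")
    # speed cannot be read on down or virtual interfaces; report 0 instead.
    spd=$(cat "$path/speed" 2>/dev/null || echo 0)
    printf "%-10s state=%-4s speed=%-5s errors=%-6d dropped=%-6d\n" \
        "$i" "$ops" "$spd" "$err" "$drp"
done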
For fleet-wide monitoring, convert the output to metric lines for a Prometheus textfile collector or send it to your log shipper.
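A textfile collector expects metrics in the Prometheus text exposition format rather than free-form lines. The sketch below emits two counters in that format; the function name `nic_metrics` and the metric names `nic_rx_errors_total` and `nic_rx_dropped_total` are assumptions chosen for illustration, not an established naming scheme.

```shell
# nic_metrics [ROOT]: print rx_errors and rx_dropped for every interface
# under ROOT (default /sys/class/net) in Prometheus text format.
nic_metrics() {
    root=${1:-/sys/class/net}
    for stat in rx_errors rx_dropped; do
        echo "# TYPE nic_${stat}_total counter"
        for path in "$root"/*; do
            f="$path/statistics/$stat"
            if [ -r "$f" ]; then
                printf 'nic_%s_total{device="%s"} %s\n' \
                    "$stat" "${path##*/}" "$(cat "$f")"
            fi
        done
    done
}

# Usage (commented out; TEXTFILE_DIR stands for whatever directory your
# collector is configured to read). Writing to a temporary file and renaming
# it avoids the collector reading a half-written file:
# nic_metrics > "$TEXTFILE_DIR/nic.prom.tmp" && mv "$TEXTFILE_DIR/nic.prom.tmp" "$TEXTFILE_DIR/nic.prom"
```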
Common pitfalls
- Running ifconfig on a server with VLAN sub-interfaces; the output omits them.
- Watching only operstate and missing CRC errors that indicate failing optics.
- Ignoring tx_carrier_errors growth; it usually points to a flapping switch port.
- Setting autoneg off on one side of a link and forgetting the other side; the result is a half-duplex link with very poor throughput.
Network interfaces are the silent failure plane of every distributed system. The commands in this guide, run as a scheduled health check, surface most NIC, cable, and switch problems before users feel them.