Monitoring your Linux servers is not optional - it is the difference between knowing about a problem before users do and scrambling to fix a crisis. The Linux ecosystem offers an impressive range of monitoring tools, from simple command-line utilities to enterprise-grade platforms. This guide compares the best options and helps you choose the right stack for your needs.
Monitoring Categories
- CLI tools - Quick terminal-based monitoring (htop, glances, iotop)
- Metrics collection - Prometheus, Telegraf, collectd
- Visualization - Grafana, Kibana
- Full platforms - Zabbix, Nagios, Netdata
- Log management - ELK Stack, Loki, Graylog
1. Command-Line Monitoring Tools
Every Linux admin should know these CLI tools. They are available on virtually every system and provide instant visibility into system health.
# htop - Interactive process viewer (best replacement for top)
sudo apt install htop
htop
# Features: color-coded CPU/memory bars, sort by any column,
# search processes, tree view, kill processes directly
# glances - Complete system overview dashboard
sudo apt install glances
glances
# Shows: CPU, memory, disk I/O, network, processes, sensors, Docker containers
# iotop - Disk I/O monitoring per process
sudo apt install iotop
sudo iotop
# Shows which processes are reading/writing to disk
# nethogs - Network bandwidth per process
sudo apt install nethogs
sudo nethogs eth0 # replace eth0 with your interface name (list them with: ip -br link)
# Shows which processes are using network bandwidth
# dstat - Versatile resource statistics tool
sudo apt install dstat
dstat -cdngy 5 # CPU, disk, network, paging, system every 5 seconds
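# sar - System Activity Reporter (historical data!)
sudo apt install sysstat
# On Debian/Ubuntu, set ENABLED="true" in /etc/default/sysstat so history is actually collected
sudo systemctl enable --now sysstat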
sar -u 5 3 # CPU usage: 3 reports, 5 seconds apart
sar -r 5 3 # Memory usage
sar -d 5 3 # Disk activity
sar -n DEV 5 3 # Network statistics
# Quick one-liners for common checks
uptime # Load averages
free -h # Memory usage
df -h # Disk space
ss -tuln # Listening ports
vmstat 5 # Virtual memory statistics
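The one-liners above are easy to wrap in small checks. As a minimal sketch, the helper below reads `df -P` output and prints every filesystem at or above a usage threshold. The function name `over_threshold` and the default of 80 are illustrative choices, and it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Print "<mount point> <usage>%" for every filesystem at or above a threshold.
# Reads `df -P` output on stdin so it can also be fed canned input.
# `over_threshold` is an illustrative helper name, not a standard tool.
over_threshold() {
    threshold="${1:-80}"
    awk -v t="$threshold" '
        NR > 1 {                      # skip the df header line
            pct = $5                  # POSIX df: column 5 is Capacity, e.g. "91%"
            sub(/%/, "", pct)
            if (pct + 0 >= t + 0) print $6 " " pct "%"
        }'
}

# Usage:
#   df -P | over_threshold 80
```

Dropping a call like this into cron gives you a crude disk alert long before a full monitoring stack is in place.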
2. Prometheus + Grafana (Industry Standard Stack)
Best for: Cloud-native environments, Kubernetes, microservices, DevOps teams
Prometheus is a pull-based metrics collection and alerting system. Grafana provides beautiful dashboards for visualization. Together, they are the de facto standard for modern infrastructure monitoring.
Architecture
- Prometheus Server - Scrapes metrics from targets at configured intervals, stores time-series data
- Exporters - Agents that expose metrics in Prometheus format (Node Exporter for servers, various app exporters)
- Alertmanager - Handles alerts from Prometheus, routes to email, Slack, PagerDuty
- Grafana - Visualization layer, creates dashboards from Prometheus data
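To see how these pieces connect, here is a minimal sketch that writes a starter `prometheus.yml`. The output directory variable `PROM_DEMO_DIR` and the target host `server` are placeholders; the ports are the defaults for Prometheus (9090), Node Exporter (9100), and Alertmanager (9093).

```shell
#!/bin/sh
# Write a minimal Prometheus config: scrape Prometheus itself and one
# Node Exporter, and send alerts to a local Alertmanager.
# PROM_DEMO_DIR and the host "server" are placeholders for illustration.
PROM_DEMO_DIR="${PROM_DEMO_DIR:-${TMPDIR:-/tmp}/prometheus-demo}"
mkdir -p "$PROM_DEMO_DIR"

cat > "$PROM_DEMO_DIR/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s                  # how often each target is pulled

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # Alertmanager default port

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']     # Prometheus scraping itself
  - job_name: node
    static_configs:
      - targets: ['server:9100']        # Node Exporter on a monitored host
EOF

echo "wrote $PROM_DEMO_DIR/prometheus.yml"
```

Start Prometheus with `--config.file` pointing at the generated file, then add one line under the `node` job for each additional server.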
# Install Node Exporter on every server you want to monitor
# Exposes system metrics: CPU, memory, disk, network
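# Release archives carry the version in their filename (1.8.2 is an example; check the releases page)
VERSION=1.8.2
wget https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xvf node_exporter-${VERSION}.linux-amd64.tar.gz
./node_exporter-${VERSION}.linux-amd64/node_exporter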
# Metrics available at http://server:9100/metrics
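The metrics endpoint returns plain text, one sample per line, so you can inspect it without Prometheus. The sketch below pulls the value of a single metric out of that text. `metric_value` is a hypothetical helper name, and it assumes samples carry no trailing timestamp (Node Exporter does not emit one).

```shell
#!/bin/sh
# Print the value of every sample whose metric name matches exactly.
# Reads Prometheus text-format metrics on stdin.
# `metric_value` is an illustrative helper, not part of Node Exporter.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                      # skip HELP and TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)       # drop the {label="..."} set, if any
            if (metric == name) print $NF  # last field is the sample value
        }'
}

# Usage (assumes Node Exporter is listening on localhost:9100):
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```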
# Key Prometheus queries (PromQL):
# CPU usage percentage per host, averaged across all cores
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage percentage
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Disk usage percentage
100 - (node_filesystem_avail_bytes / node_filesystem_size_bytes * 100)
# Network traffic rate (bytes per second)
rate(node_network_receive_bytes_total[5m])
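You can also run these queries from the terminal through the Prometheus HTTP API, which is handy for scripts and quick checks. This is a minimal sketch: `PROM_URL` is an assumed environment variable, and `prom_query` is an illustrative function name.

```shell
#!/bin/sh
# Run an instant PromQL query against the Prometheus HTTP API.
# PROM_URL is an assumed variable; 9090 is the Prometheus default port.
PROM_URL="${PROM_URL:-http://localhost:9090}"

prom_query() {
    # -G turns the url-encoded data into a GET query string
    curl -sS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Usage:
#   prom_query '(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100'
```

The response is JSON; pipe it through a JSON tool such as `jq` if you only need the values.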
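# Install Grafana (not in Ubuntu's default repositories, so add Grafana's APT repo first)
sudo apt install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install grafana
sudo systemctl enable --now grafana-server
# Access at http://server:3000 (default login: admin/admin; change the password on first login)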
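Once Grafana is running, it needs a data source. You can add one in the UI, or provision it from a file so the setup is repeatable. The sketch below generates such a file. It assumes Prometheus runs on the same host at port 9090, and `GRAFANA_DEMO_DIR` is a placeholder variable.

```shell
#!/bin/sh
# Generate a Grafana data source provisioning file that points at Prometheus.
# GRAFANA_DEMO_DIR is a placeholder; the URL assumes a local Prometheus.
GRAFANA_DEMO_DIR="${GRAFANA_DEMO_DIR:-${TMPDIR:-/tmp}/grafana-demo}"
mkdir -p "$GRAFANA_DEMO_DIR"

cat > "$GRAFANA_DEMO_DIR/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy               # Grafana's backend makes the requests
    url: http://localhost:9090
    isDefault: true
EOF

echo "Copy $GRAFANA_DEMO_DIR/prometheus.yaml to /etc/grafana/provisioning/datasources/"
echo "then run: sudo systemctl restart grafana-server"
```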
Why Choose Prometheus + Grafana?
- Industry standard, massive community and ecosystem
- Excellent Kubernetes and Docker integration
- Thousands of pre-built Grafana dashboards
- Powerful query language (PromQL)
- Scales to millions of time series
- Free and open source
3. Zabbix (Enterprise Monitoring Platform)
Best for: Large infrastructure, traditional IT environments, compliance requirements, SNMP network devices
Zabbix is a mature, full-featured enterprise monitoring platform with agent-based and agentless monitoring, auto-discovery, sophisticated alerting with escalation, and built-in visualization.
Key Features
- Agent-based AND agentless monitoring (SNMP, IPMI, JMX)
- Automatic host and service discovery
- Template-based configuration (thousands of community templates)
- Sophisticated alerting with escalation chains
- Network device monitoring via SNMP
- Built-in web-based dashboard
- Distributed monitoring with Zabbix Proxies
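# Install Zabbix server on Ubuntu 22.04+
# The release package filename depends on the Zabbix and Ubuntu versions; confirm the exact URL on the Zabbix download page
wget https://repo.zabbix.com/zabbix/7.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_latest_all.deb
sudo dpkg -i zabbix-release_latest_all.deb
sudo apt update
# zabbix-sql-scripts provides the schema to import into PostgreSQL before the server will start
sudo apt install zabbix-server-pgsql zabbix-frontend-php zabbix-apache-conf zabbix-sql-scripts zabbix-agent2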
# Install Zabbix Agent on monitored servers
sudo apt install zabbix-agent2
sudo systemctl enable --now zabbix-agent2
# Configure agent to report to Zabbix server
sudo nano /etc/zabbix/zabbix_agent2.conf
# Set: Server=zabbix-server-ip
# Set: ServerActive=zabbix-server-ip
# Set: Hostname=myserver01
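If you manage more than a handful of hosts, editing the file by hand gets old quickly. Here is a minimal sketch that sets the same three keys with `sed`. It assumes GNU `sed` (for `-i`) and that each key already appears uncommented in the file; `configure_agent` is an illustrative name.

```shell
#!/bin/sh
# Set Server, ServerActive and Hostname in a Zabbix agent 2 config file.
# Assumes GNU sed and that each key already has an uncommented line.
# `configure_agent` and its arguments are illustrative.
configure_agent() {
    conf="$1"; server="$2"; host="$3"
    sed -i \
        -e "s|^Server=.*|Server=$server|" \
        -e "s|^ServerActive=.*|ServerActive=$server|" \
        -e "s|^Hostname=.*|Hostname=$host|" \
        "$conf"
}

# Usage (as root):
#   configure_agent /etc/zabbix/zabbix_agent2.conf 10.0.0.5 myserver01
#   systemctl restart zabbix-agent2
```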
4. Netdata (Real-Time Monitoring)
Best for: Real-time visibility, single servers, quick setup, home labs, developers
Netdata provides stunning real-time dashboards with 1-second granularity and zero configuration. It is the fastest way to get comprehensive monitoring running.
# Install Netdata with a single command
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && sh /tmp/netdata-kickstart.sh
# That is it! Access dashboards at http://server:19999
# What Netdata monitors automatically (zero config):
# - CPU per core, processes, interrupts
# - Memory, swap, page faults
# - Disk I/O per device
# - Network traffic per interface
# - System services (systemd)
# - Docker containers
# - Web servers (Nginx, Apache)
# - Databases (PostgreSQL, MySQL, Redis)
# - 2000+ integrations out of the box
# Resource usage: approximately 5% CPU, 100MB RAM (varies with the number of enabled collectors)
# Includes built-in anomaly detection with machine learning
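Netdata also exposes its data over HTTP, so the same numbers you see in the dashboard can feed scripts. The sketch below asks for the average of a chart over the last minute. It assumes the v1 data endpoint and its `chart`, `after`, `points`, `group`, and `format` parameters are available in your Netdata version; `NETDATA_URL` and `netdata_chart` are illustrative names.

```shell
#!/bin/sh
# Fetch the average of one Netdata chart over the last N seconds as JSON.
# Assumes the agent's v1 REST API; NETDATA_URL is an assumed variable.
NETDATA_URL="${NETDATA_URL:-http://localhost:19999}"

netdata_chart() {
    chart="$1"; seconds="${2:-60}"
    curl -sS "$NETDATA_URL/api/v1/data?chart=$chart&after=-$seconds&points=1&group=average&format=json"
}

# Usage:
#   netdata_chart system.cpu 60
```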
5. Nagios (The Original)
Best for: Legacy environments, organizations already using it, compliance requirements
Nagios has been a standard since 1999 with an enormous plugin ecosystem. However, for new deployments, consider modern alternatives like Icinga 2 (the successor to a Nagios fork, rewritten with a better API and still compatible with Nagios plugins) or Checkmk.
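Part of the reason the plugin ecosystem is so large is that a plugin is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). A minimal sketch of that convention follows; `check_threshold` and its arguments are illustrative, and it handles integer percentages only.

```shell
#!/bin/sh
# Minimal Nagios-style check: compare a value with warning/critical limits.
# Exit codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
# `check_threshold` is an illustrative name; values must be whole numbers.
check_threshold() {
    label="$1"; value="$2"; warn="$3"; crit="$4"
    case "$value" in
        ''|*[!0-9]*)
            echo "UNKNOWN - $label value '$value' is not an integer"
            return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - $label is ${value}%"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - $label is ${value}%"; return 1
    fi
    echo "OK - $label is ${value}%"; return 0
}

# Usage as a plugin script (root filesystem usage):
#   used=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
#   check_threshold "disk /" "$used" 80 90; exit $?
```

Because Icinga 2 and Checkmk can run plugins that follow the same convention, a check like this carries over if you migrate later.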
6. Loki + Grafana (Log Monitoring)
Best for: Complementing Prometheus with log aggregation
While Prometheus handles metrics, Loki handles logs. Together with Grafana, you get a complete observability stack. Loki is designed to be cost-effective by indexing only labels, not full log content.
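The label-only index shapes how you query Loki: a label selector narrows the search to a set of log streams, then a line filter scans their content. Here is a minimal sketch using the Loki HTTP API. `LOKI_URL`, `loki_search`, and the `job="varlogs"` label are assumptions for illustration; 3100 is Loki's default port.

```shell
#!/bin/sh
# Search logs in Loki: label selector first (indexed), then a line filter
# (scanned at query time). LOKI_URL is an assumed variable.
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

loki_search() {
    selector="$1"; needle="$2"
    # LogQL: {labels} |= "text" keeps only lines containing the text
    curl -sS -G "$LOKI_URL/loki/api/v1/query_range" \
        --data-urlencode "query=$selector |= \"$needle\"" \
        --data-urlencode "limit=20"
}

# Usage:
#   loki_search '{job="varlogs"}' 'error'
```

Keeping label sets small and stable is what keeps this cheap; high-cardinality values such as request IDs belong in the log line, not in a label.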
Comparison Matrix
| Tool | Setup | Scale | Best For | Cost |
|---|---|---|---|---|
| Prometheus+Grafana | Medium | Excellent | Cloud-native, K8s | Free |
| Zabbix | Complex | Excellent | Enterprise, SNMP | Free |
| Netdata | Very Easy | Good | Real-time, single servers | Free/Paid |
| Nagios/Icinga | Complex | Good | Legacy, compliance | Free |
| htop/glances | Instant | Single host | Quick debugging | Free |
Recommended Stack by Use Case
- Solo developer / 1-5 servers: Netdata (one-command install, zero config, beautiful UI)
- Growing startup / cloud infrastructure: Prometheus + Grafana (industry standard, scales well)
- Enterprise / 100+ servers: Zabbix OR Prometheus + Grafana + Loki + Alertmanager
- Kubernetes-heavy: Prometheus + Grafana (native K8s integration)
- Home lab: Netdata for real-time + sar for historical data
Essential Metrics to Monitor
| Category | Metrics | Alert Threshold |
|---|---|---|
| CPU | Usage %, load average, steal time | > 80% sustained |
| Memory | Used %, swap usage, OOM events | > 85% used |
| Disk | Usage %, I/O wait, inode usage | > 80% space |
| Network | Bandwidth, errors, dropped packets | Error rate > 0.1% |
| Services | HTTP status, response time, uptime | Response > 2s |
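If you run Prometheus, thresholds like these translate directly into alerting rules. The sketch below writes a rules file for the CPU, memory, and disk rows. The alert names, the `10m` duration standing in for "sustained", and `RULES_DEMO_DIR` are all assumptions to adjust for your environment.

```shell
#!/bin/sh
# Write Prometheus alerting rules for the CPU, memory and disk thresholds.
# Alert names, the 10m "for" duration and RULES_DEMO_DIR are illustrative.
RULES_DEMO_DIR="${RULES_DEMO_DIR:-${TMPDIR:-/tmp}/prometheus-rules-demo}"
mkdir -p "$RULES_DEMO_DIR"

# The quoted 'EOF' stops the shell from expanding $labels in the template.
cat > "$RULES_DEMO_DIR/host-alerts.yml" <<'EOF'
groups:
  - name: host-alerts
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU above 80% on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory above 85% on {{ $labels.instance }}"
      - alert: DiskSpaceLow
        expr: 100 - (node_filesystem_avail_bytes / node_filesystem_size_bytes * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk above 80% on {{ $labels.instance }} ({{ $labels.mountpoint }})"
EOF

echo "wrote $RULES_DEMO_DIR/host-alerts.yml"
```

Validate the file with `promtool check rules` before listing it under `rule_files` in `prometheus.yml`.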