Linux System Uptime Monitoring: Availability Tracking and Crash Analysis

System uptime monitoring is fundamental to maintaining service reliability and meeting SLA commitments. Beyond the simple uptime command, Linux provides rich data about boot history, crash events, OOM kills, and system stability that helps you identify reliability issues before they impact users. This guide covers comprehensive uptime and availability monitoring techniques.

Basic Uptime Information

The uptime command provides a quick overview:

uptime
# 14:23:15 up 45 days, 3:42, 2 users, load average: 0.15, 0.10, 0.08

# Machine-readable from /proc
cat /proc/uptime
# 3901320.45 7654321.89 (uptime_seconds idle_seconds; idle time is summed across all CPUs)
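
The first field of /proc/uptime is easier to use in scripts than the uptime output. A minimal sketch, assuming a POSIX shell; fmt_uptime is a hypothetical helper name used for illustration, not a standard tool:

```shell
# Convert a seconds value (fractional part allowed) into days, hours, minutes.
fmt_uptime() {
    up_secs=${1%.*}   # drop the fractional part, if any
    printf '%dd %dh %dm\n' \
        $((up_secs / 86400)) \
        $((up_secs % 86400 / 3600)) \
        $((up_secs % 3600 / 60))
}

# The first field of /proc/uptime is seconds since boot (Linux only).
if [ -r /proc/uptime ]; then
    read -r up_now _ < /proc/uptime
    fmt_uptime "$up_now"
fi
```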

For SLA tracking, you need more than current uptime: you need historical reboot data and crash analysis.

Reboot History with last

The last command reads /var/log/wtmp and shows reboot and shutdown history, going back only as far as the current wtmp file:

# Reboot history
last reboot

# Shutdown history
last -x shutdown

# With full timestamps
last reboot -F

Frequent unplanned reboots may indicate kernel panics, hardware faults, or out-of-memory conditions on systems configured to panic and reboot on OOM (vm.panic_on_oom together with kernel.panic).
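
To compare reboot frequency against a threshold, the history can be reduced to a count. A rough sketch, assuming the usual last output where each boot record starts with the pseudo-user reboot; count_reboots is a hypothetical helper, and the count only covers the period held in the current wtmp file:

```shell
# Count boot records read from standard input.
# Lines such as the trailing "wtmp begins ..." footer are ignored.
count_reboots() {
    awk '$1 == "reboot" { n++ } END { print n + 0 }'
}

# Example: last reboot | count_reboots
```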

Boot Analysis with systemd-analyze

Systemd provides detailed boot analysis:

# Boot time summary
systemd-analyze

# List all boots with journalctl
journalctl --list-boots

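# Check the previous boot for kernel emergency messages such as panics
# (-k alone implies -b, the current boot only; earlier boots need a persistent journal)
journalctl -k -b -1 -p 0 --no-pager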

# Find OOM kill events
journalctl -g "Out of memory" --no-pager

Calculating Availability Percentage

SLA availability is typically expressed as a percentage over a 30-day period (43,200 minutes):

  • 99.9% (three nines) = max 43.2 minutes downtime/month
  • 99.95% = max 21.6 minutes downtime/month
  • 99.99% (four nines) = max 4.3 minutes downtime/month
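
These budgets follow from multiplying the period length by the allowed failure fraction. A small sketch of that arithmetic, assuming awk is available; downtime_budget is a hypothetical helper name, and the 30-day default mirrors the period used above:

```shell
# Allowed downtime in minutes for an availability target.
# $1 = availability percent, $2 = period in days (default 30)
downtime_budget() {
    awk -v pct="$1" -v days="${2:-30}" \
        'BEGIN { printf "%.2f\n", days * 24 * 60 * (100 - pct) / 100 }'
}

for target in 99.9 99.95 99.99; do
    printf '%s%% -> %s min\n' "$target" "$(downtime_budget "$target")"
done
```

Passing a second argument changes the period, for example downtime_budget 99.9 365 for a yearly budget.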

Automate availability tracking with our tool:

pip install dargslan-uptime-report
dargslan-uptime report    # Full availability report
dargslan-uptime reboots   # Reboot history
dargslan-uptime crashes   # Crash event analysis
dargslan-uptime load      # Current load average

Detecting OOM Kills

Out-of-Memory kills are a common cause of service disruption:

# Find OOM events
dmesg | grep -i "out of memory"
journalctl -g "oom-kill" --no-pager

# Check which process was killed
dmesg | grep -i "killed process"

If OOM kills are frequent, add RAM, lower the OOM score adjustment of critical processes (via /proc/<pid>/oom_score_adj or systemd's OOMScoreAdjust= setting) so the kernel picks other victims first, or fix memory leaks in your application.
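
To see which services are hit most often, the kill lines can be reduced to a PID and process name. A sketch, assuming the kernel log contains lines of the form "Killed process <pid> (<name>)"; the exact wording varies between kernel versions, and oom_victims is a hypothetical helper name:

```shell
# Read kernel log lines on standard input and print "PID NAME"
# for each OOM kill record; other lines are skipped.
oom_victims() {
    sed -n 's/.*[Kk]illed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Example: dmesg | oom_victims
# Example: journalctl -k --no-pager | oom_victims | awk '{ print $2 }' | sort | uniq -c
```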

Proactive Stability Monitoring

  • Monitor load average vs CPU count (as a rule of thumb, sustained load > 2x CPUs indicates overload; on Linux, load also counts tasks blocked on I/O)
  • Track reboot frequency; more than 2 unplanned reboots/month needs investigation
  • Set up alerting for kernel panics and OOM kills
  • Monitor swap usage trends (steadily increasing swap use can mean the system is approaching OOM)
  • Use watchdog timers for automatic crash recovery
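
The load check from the list above can be scripted against /proc/loadavg. A minimal sketch, assuming nproc from coreutils is installed; load_status is a hypothetical helper, and the 2x threshold is the rule of thumb from the list, not a hard limit:

```shell
# Compare a load average with the CPU count.
# $1 = load average, $2 = number of CPUs
load_status() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (load > 2 * cpus) print "overloaded"
        else print "ok"
    }'
}

# The first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 _ < /proc/loadavg
    load_status "$load1" "$(nproc)"
fi
```

A single reading says little; alerting on the 5- or 15-minute average (the second and third fields) is less noisy.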

Uptime Monitoring Best Practices

  • Track uptime history, not just current uptime
  • Calculate and report monthly availability percentages
  • Investigate all unplanned reboots within 24 hours
  • Configure OOM score adjustments for critical services
  • Set up external uptime monitoring (the server cannot report its own downtime)
  • Document all planned maintenance windows
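
For the monthly report, availability follows from the downtime measured over the period. A sketch of the calculation, assuming the downtime figure comes from an external monitor; availability_pct is a hypothetical helper name:

```shell
# Availability percentage for a reporting period.
# $1 = total downtime in minutes, $2 = period in days (default 30)
availability_pct() {
    awk -v down="$1" -v days="${2:-30}" \
        'BEGIN { printf "%.3f\n", 100 * (1 - down / (days * 24 * 60)) }'
}

availability_pct 60   # one hour of downtime in a 30-day month
```

Whether planned maintenance windows count toward the downtime figure depends on the SLA wording, so the input should be filtered accordingly before it is passed in.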

Dargslan Editorial Team (Dargslan)
About the Author

Collective of Software Developers, System Administrators, DevOps Engineers, and IT Authors

Dargslan is an independent technology publishing collective formed by experienced software developers, system administrators, and IT specialists.

The Dargslan editorial team works collaboratively to create practical, hands-on technology books focused on real-world use cases. Each publication is developed, reviewed, and...
