Linux Disaster Recovery: Build a Plan That Actually Works

Your server just crashed. The database is corrupted. The RAID array is degraded. What do you do?

If you don't have a tested disaster recovery (DR) plan, the answer is: panic. And panic costs money: a widely cited Gartner estimate puts the average cost of IT downtime at about $5,600 per minute. For small businesses, a single unplanned outage can mean days of lost productivity.

In this guide, we'll build a complete Linux disaster recovery plan from scratch, one that's actually tested and ready when disaster strikes.

The 3-2-1 Backup Rule (And Why It's Not Enough)

You've probably heard the 3-2-1 rule: 3 copies, 2 different media, 1 offsite. It's a good start, but modern disaster recovery goes much further:

  • 3-2-1-1-0 Rule: Add 1 immutable (or air-gapped) copy and 0 errors, meaning no backup goes unverified
  • RPO (Recovery Point Objective): How much data can you afford to lose? 1 hour? 5 minutes?
  • RTO (Recovery Time Objective): How fast must you be back online? 4 hours? 15 minutes?

Real-world lesson: A backup that hasn't been tested is not a backup; it's a hope. Schedule monthly recovery drills.
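
An RPO only means something if it is checked against the age of the newest backup. The sketch below is one way to do that, assuming GNU `stat` and a single backup file whose modification time reflects when the backup finished. The function name `backup_within_rpo` is hypothetical and not part of any backup tool.

```shell
# Sketch: check whether the newest backup is recent enough to meet an RPO.
# backup_within_rpo is a hypothetical helper, not part of any backup tool.

# Usage: backup_within_rpo <backup_file> <rpo_seconds>
# Returns 0 if the file was modified within the RPO window, 1 otherwise.
backup_within_rpo() {
    local file=$1 rpo_seconds=$2
    local now mtime age

    [ -e "$file" ] || { echo "missing backup: $file" >&2; return 1; }

    now=$(date +%s)
    mtime=$(stat -c %Y "$file")   # GNU stat: modification time in epoch seconds
    age=$(( now - mtime ))

    if [ "$age" -le "$rpo_seconds" ]; then
        echo "OK: backup is ${age}s old (RPO ${rpo_seconds}s)"
    else
        echo "VIOLATION: backup is ${age}s old (RPO ${rpo_seconds}s)"
        return 1
    fi
}
```

For a one-hour RPO, a monitoring job could call `backup_within_rpo /backup/db/latest.dump 3600` (the path is illustrative) and alert on a non-zero exit status.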

Step 1: Inventory Your Critical Systems

Before building your DR plan, document everything that matters:

#!/bin/bash
# System inventory script: records what a rebuild would need to reproduce
echo "=== SYSTEM INVENTORY ==="
echo "Hostname: $(hostname)"
# os-release uses shell variable syntax, so it can be sourced directly
echo "OS: $(. /etc/os-release && echo "$PRETTY_NAME")"
echo "Kernel: $(uname -r)"
echo "Disk Layout:"
lsblk -f
echo "Important Services:"
systemctl list-units --type=service --state=running | grep -E "nginx|apache|mysql|postgres|docker"
echo "Crontab (current user only):"
crontab -l 2>/dev/null
echo "Network:"
ip addr show | grep "inet " | grep -v "127.0.0.1"

Step 2: Choose Your Backup Strategy

Strategy              RPO             RTO              Best For
rsync + cron          1–24 hours      1–4 hours        Small servers, file-based
Borg Backup           15 min–1 hour   30 min–2 hours   Dedup, encrypted backups
ZFS snapshots         5–15 minutes    Minutes          High-value data, instant rollback
Database replication  Near-zero       Minutes          PostgreSQL/MySQL HA
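
To make the table easier to apply, here is a rough sketch that maps an RPO target in minutes to the strategy whose RPO range covers it. The helper name `suggest_strategy` is hypothetical, and the cut-offs are simply read off the RPO column above. It ignores RTO, data type, and budget, so treat the result as a starting point rather than a recommendation.

```shell
# Sketch: map an RPO target (in minutes) to a strategy from the table.
# suggest_strategy is a hypothetical helper; the thresholds mirror the
# table's RPO column and are not derived from any benchmark.
suggest_strategy() {
    local rpo_minutes=$1

    if [ "$rpo_minutes" -lt 5 ]; then
        echo "Database replication"
    elif [ "$rpo_minutes" -lt 15 ]; then
        echo "ZFS snapshots"
    elif [ "$rpo_minutes" -lt 60 ]; then
        echo "Borg Backup"
    else
        echo "rsync + cron"
    fi
}
```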

📘 Deep Dive: Linux Backup Strategies

For a comprehensive guide covering rsync, Borg, ZFS snapshots, and cloud backup automation, check out Linux Backup Strategies (€19.90). Covers 15+ backup scenarios with production-ready scripts.

Step 3: Automate Everything

Manual backups fail because humans forget. Automate your entire backup pipeline:

# /etc/cron.d/disaster-recovery
# Database backup every 6 hours
0 */6 * * * postgres pg_dump -Fc mydb > /backup/db/mydb_$(date +\%Y\%m\%d_\%H).dump

# File backup with Borg every hour
0 * * * * root borg create --compression zstd /backup/borg::hourly-{now} /var/www /etc

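# Verify backup integrity daily and report the result either way
0 3 * * * root if borg check /backup/borg; then echo "OK" | mail -s "Backup OK" admin@example.com; else echo "borg check failed" | mail -s "Backup FAILED" admin@example.com; fi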

# Offsite sync every 4 hours
0 */4 * * * root rsync -avz --delete /backup/ offsite:/backup/server1/
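
Automated jobs also need to fail loudly. One possible approach is to run each backup command through a small wrapper that records the outcome and passes the exit status on, so that cron or a monitoring tool can react. The names `run_backup_job` and `BACKUP_LOG`, the log path, and the log format are assumptions made for illustration.

```shell
# Sketch: wrap each backup command so that failures are logged and reported.
# run_backup_job, BACKUP_LOG and the log format are assumptions for illustration.
BACKUP_LOG=${BACKUP_LOG:-/var/log/backup-jobs.log}

# Usage: run_backup_job <job_name> <command> [args...]
# Runs the command, appends one status line to the log, returns its exit status.
run_backup_job() {
    local name=$1
    shift
    local status=0

    "$@" || status=$?

    if [ "$status" -eq 0 ]; then
        echo "$(date '+%F %T') $name OK" >> "$BACKUP_LOG"
    else
        echo "$(date '+%F %T') $name FAILED (exit $status)" >> "$BACKUP_LOG"
    fi
    return "$status"
}
```

A script called from cron could then run, for example, `run_backup_job offsite-sync rsync -avz /backup/ offsite:/backup/server1/` and send an alert whenever the wrapper returns non-zero.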

🔧 Hands-On: rsync & Borg Automation

Learn production-ready backup automation scripts with Linux Backup Automation with rsync & Borg (€14.90). Includes remote backup, encryption, retention policies, and monitoring.

Step 4: Test Your Recovery

This is the most critical step, and the one teams most often skip. Create a recovery runbook and test it quarterly:

  1. Spin up a test VM: Use a clean server to simulate recovery
  2. Restore from backup: Follow your documented steps exactly
  3. Verify data integrity: Check database records, file checksums, application functionality
  4. Measure RTO: Time how long recovery actually takes
  5. Document gaps: Fix anything that failed or took too long
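
The checksum part of step 3 can be sketched as follows, assuming GNU coreutils (`sha256sum`, `realpath`) and findutils. The helper names `make_manifest` and `verify_restore` are hypothetical. The manifest is created on the source system before disaster strikes and stored alongside the backup, outside the directory it describes.

```shell
# Sketch: verify restored files against checksums taken from the source.
# make_manifest and verify_restore are hypothetical helper names.

# Usage: make_manifest <source_dir> <manifest_file>
# Records a SHA-256 checksum per file, with paths relative to source_dir.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Usage: verify_restore <manifest_file> <restored_dir>
# Returns 0 only if every file listed in the manifest exists and matches.
verify_restore() {
    local manifest
    manifest=$(realpath "$1")
    ( cd "$2" && sha256sum --check --quiet "$manifest" )
}

# Timing a drill for step 4 (restore_everything stands in for your runbook):
#   start=$(date +%s)
#   restore_everything && verify_restore /backup/manifest.sha256 /var/www
#   echo "Measured RTO: $(( $(date +%s) - start ))s"
```

Note that this check only covers files listed in the manifest. It does not flag extra files in the restored directory, and it says nothing about database consistency, which needs its own checks.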

Step 5: Database-Specific Recovery

Databases need special attention. For PostgreSQL:

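# Point-in-time recovery with WAL archiving
# postgresql.conf (on the running server)
wal_level = replica
archive_mode = on
# Refuse to overwrite a WAL segment that is already archived
archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'

# Restore to a specific timestamp (PostgreSQL 12+).
# Note: pg_restore only loads pg_dump archives; it cannot replay WAL.
# 1. Stop PostgreSQL and restore a base backup (e.g. from pg_basebackup) into the data directory
# 2. Add to postgresql.conf:
restore_command = 'cp /backup/wal/%f %p'
recovery_target_time = '2026-02-28 14:30:00'
# 3. Create an empty recovery.signal file in the data directory, then start PostgreSQL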
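
Point-in-time recovery replays archived WAL on top of a physical base backup, so a base backup has to exist before any timestamp can be targeted. The sketch below takes one with `pg_basebackup` and keeps a `latest` symlink, matching the `/backup/base/latest` path used above. The function name `take_base_backup` and the dated directory layout are assumptions. It also assumes the invoking user can connect with replication privileges.

```shell
# Sketch: take a dated base backup and point a "latest" symlink at it.
# take_base_backup and the directory layout are assumptions; pg_basebackup
# must be able to connect as a role with replication privileges.

# Usage: take_base_backup <backup_root>
take_base_backup() {
    local root=$1
    local target
    target="$root/$(date +%Y%m%d_%H%M%S)"

    # -D: output directory, -Ft: tar format, -z: gzip-compress the tar files
    pg_basebackup -D "$target" -Ft -z || return 1

    # Repoint "latest" only after the backup finished successfully
    ln -sfn "$target" "$root/latest"
}
```

Run as the `postgres` user, `take_base_backup /backup/base` would fit a weekly schedule, with WAL archiving covering the time between base backups.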

🗄️ PostgreSQL HA & Recovery

Master point-in-time recovery, streaming replication, and automatic failover with PostgreSQL Backup, Replication & High Availability (€12.90).

Disaster Recovery Checklist

  • ☐ All critical systems documented and inventoried
  • ☐ RPO and RTO defined for each system
  • ☐ Automated backups running on schedule
  • ☐ Offsite/immutable backup copy exists
  • ☐ Database WAL archiving enabled
  • ☐ Recovery runbook documented step-by-step
  • ☐ Monthly/quarterly recovery drills scheduled
  • ☐ Monitoring alerts for backup failures
  • ☐ Team trained on recovery procedures

Frequently Asked Questions

How often should I test my backups?

Run full disaster recovery tests at least quarterly. Test individual service recovery (database, files) monthly, and verify backup integrity (checksums, file counts) weekly.

What's the cheapest offsite backup solution?

For small setups: rsync to a second VPS (€3–5/month). For larger setups: Borg + Backblaze B2 (€0.005/GB/month). Both are covered in our Linux Backup Automation guide.

Should I use LVM snapshots or ZFS snapshots?

ZFS snapshots are superior: they're instant, space-efficient, and support incremental send/receive. LVM snapshots degrade write performance while they exist and aren't designed for long-term retention. Learn more in LVM & ZFS: Linux Storage Management.
