Your server just crashed. The database is corrupted. The RAID array is degraded. What do you do?
If you don't have a tested disaster recovery (DR) plan, the answer is: panic. And panic costs money: the average cost of IT downtime is €5,600 per minute according to Gartner. For small businesses, a single unplanned outage can mean days of lost productivity.
In this guide, we'll build a complete Linux disaster recovery plan from scratch, one that's actually tested and ready when disaster strikes.
The 3-2-1 Backup Rule (And Why It's Not Enough)
You've probably heard the 3-2-1 rule: 3 copies, 2 different media, 1 offsite. It's a good start, but modern disaster recovery goes much further:
- 3-2-1-1-0 Rule: Add 1 immutable copy and 0 untested backups
- RPO (Recovery Point Objective): How much data can you afford to lose? 1 hour? 5 minutes?
- RTO (Recovery Time Objective): How fast must you be back online? 4 hours? 15 minutes?
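An RPO is only useful if something checks it. The sketch below assumes GNU find and backups stored as regular files under one directory; it reports the age of the newest backup and compares it with an RPO given in minutes. The function names `backup_age_minutes` and `rpo_met` are illustrative and not part of any existing tool.

```bash
#!/usr/bin/env bash
# Sketch: is the newest backup in a directory still within the RPO?
# backup_age_minutes and rpo_met are hypothetical helper names.

# Print the age, in whole minutes, of the most recently modified file under $1.
backup_age_minutes() {
    local dir=$1 newest now
    # GNU find prints each file's mtime as epoch seconds; keep the largest.
    newest=$(find "$dir" -type f -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)
    [ -n "$newest" ] || return 1          # no backup files found
    now=$(date +%s)
    echo $(( (now - ${newest%.*}) / 60 ))
}

# Succeed if the newest backup under $1 is at most $2 minutes old.
rpo_met() {
    local age
    age=$(backup_age_minutes "$1") || return 1
    [ "$age" -le "$2" ]
}
```

For example, `rpo_met /backup/db 360 || echo "RPO exceeded"` would flag a dump directory whose newest file is more than six hours old.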
Step 1: Inventory Your Critical Systems
Before building your DR plan, document everything that matters:
```bash
#!/bin/bash
# System inventory script
echo "=== SYSTEM INVENTORY ==="
echo "Hostname: $(hostname)"
echo "OS: $(grep PRETTY_NAME /etc/os-release)"
echo "Kernel: $(uname -r)"
echo "Disk Layout:"
lsblk -f
echo "Important Services:"
systemctl list-units --type=service --state=running | grep -E "nginx|apache|mysql|postgres|docker"
# crontab -l only lists the crontab of the user running this script
echo "Crontab:"
crontab -l 2>/dev/null
echo "Network:"
ip addr show | grep -E "inet " | grep -v 127.0.0.1
```
Step 2: Choose Your Backup Strategy
| Strategy | RPO | RTO | Best For |
|---|---|---|---|
| rsync + cron | 1-24 hours | 1-4 hours | Small servers, file-based |
| Borg Backup | 15 min-1 hour | 30 min-2 hours | Dedup, encrypted backups |
| ZFS snapshots | 5-15 minutes | Minutes | High-value data, instant rollback |
| Database replication | Near-zero | Minutes | PostgreSQL/MySQL HA |
Step 3: Automate Everything
Manual backups fail because humans forget. Automate your entire backup pipeline:
```cron
# /etc/cron.d/disaster-recovery
# Database backup every 6 hours (/backup/db must be writable by the postgres user)
0 */6 * * * postgres pg_dump -Fc mydb > /backup/db/mydb_$(date +\%Y\%m\%d_\%H).dump
# File backup with Borg every hour
0 * * * * root borg create --compression zstd /backup/borg::hourly-{now} /var/www /etc
# Verify backup integrity daily and report success or failure
0 3 * * * root if borg check /backup/borg; then echo "OK" | mail -s "Backup OK" admin@example.com; else echo "borg check failed" | mail -s "Backup FAILED" admin@example.com; fi
# Offsite sync every 4 hours (--delete also mirrors deletions, so this copy is not immutable)
0 */4 * * * root rsync -avz --delete /backup/ offsite:/backup/server1/
```
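Cron on its own says little about whether a job succeeded. One possible approach is a small wrapper that logs every run and raises an alert only on failure. This is a minimal sketch: `run_job`, `send_alert`, `LOG_FILE`, and `ALERT_TO` are hypothetical names, and it assumes a working `mail` command on the host.

```bash
#!/usr/bin/env bash
# Sketch of a cron wrapper: log every job and alert only when one fails.
# run_job, send_alert, LOG_FILE and ALERT_TO are hypothetical names.

: "${LOG_FILE:=/var/log/dr-backup.log}"

# Reads the alert text on stdin; replace with your own notifier if needed.
send_alert() {
    mail -s "Backup FAILED on $(hostname)" "${ALERT_TO:-admin@example.com}"
}

# Usage: run_job <job-name> <command> [args...]
run_job() {
    local name=$1 status=0
    shift
    # Capture the job's output in the log and remember its exit status.
    "$@" >>"$LOG_FILE" 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "$(date '+%F %T') OK $name" >>"$LOG_FILE"
    else
        echo "$(date '+%F %T') FAIL $name (exit $status)" >>"$LOG_FILE"
        echo "Job '$name' exited with status $status" | send_alert
    fi
    return "$status"
}
```

A script that sources these functions could then call `run_job borg-check borg check /backup/borg`, so a failed integrity check produces both a log line and an email.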
Step 4: Test Your Recovery
This is the most critical step, and the one teams most often skip. Create a recovery runbook and test it quarterly:
- Spin up a test VM: Use a clean server to simulate recovery
- Restore from backup: Follow your documented steps exactly
- Verify data integrity: Check database records, file checksums, application functionality
- Measure RTO: Time how long recovery actually takes
- Document gaps: Fix anything that failed or took too long
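The "Measure RTO" step is easy to automate during a drill. The following sketch times whatever restore command your runbook uses and compares the result with an RTO given in seconds. `timed_restore` is a hypothetical helper name, and the exit codes are an arbitrary convention chosen for this example.

```bash
#!/usr/bin/env bash
# Sketch: time a restore command and compare it with the RTO target.
# timed_restore is a hypothetical helper, not part of any backup tool.

# Usage: timed_restore <rto-seconds> <restore-command> [args...]
# Returns 0 if the restore succeeded within the RTO,
#         1 if the restore command failed,
#         2 if it succeeded but took longer than the RTO.
timed_restore() {
    local rto=$1 start elapsed
    shift
    start=$(date +%s)
    "$@" || { echo "restore command failed" >&2; return 1; }
    elapsed=$(( $(date +%s) - start ))
    echo "restore took ${elapsed}s (RTO ${rto}s)"
    [ "$elapsed" -le "$rto" ] || return 2
}
```

On the test VM this might be run as `timed_restore 14400 borg extract /backup/borg::ARCHIVE_NAME`, where `ARCHIVE_NAME` is a placeholder for one of your archives and 14400 seconds corresponds to a 4-hour RTO.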
Step 5: Database-Specific Recovery
Databases need special attention. For PostgreSQL:
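```conf
# postgresql.conf: enable WAL archiving (changing archive_mode or wal_level requires a restart)
wal_level = replica
archive_mode = on
# Refuse to overwrite an existing segment, as the PostgreSQL documentation recommends
archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'

# Restore to a specific timestamp (PostgreSQL 12+). pg_restore cannot replay WAL, so
# point-in-time recovery starts from a physical base backup (for example one taken with
# pg_basebackup) restored into an empty data directory. Then add to postgresql.conf:
restore_command = 'cp /backup/wal/%f %p'
recovery_target_time = '2026-02-28 14:30:00'
# Finally, create an empty recovery.signal file in the data directory and start PostgreSQL.
```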
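As a rough illustration of the recovery side, the sketch below prepares an already restored base backup for point-in-time recovery on PostgreSQL 12 or later. It assumes `postgresql.conf` lives inside the data directory (on Debian and Ubuntu it normally lives under `/etc/postgresql` instead) and that no conflicting setting exists in `postgresql.auto.conf`. `prepare_pitr` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: configure a restored base backup for point-in-time recovery (PostgreSQL 12+).
# prepare_pitr is a hypothetical helper; all paths are examples.

# Usage: prepare_pitr <data-dir> <wal-archive-dir> <target-time>
prepare_pitr() {
    local pgdata=$1 wal_dir=$2 target=$3
    [ -f "$pgdata/PG_VERSION" ] || { echo "$pgdata is not a data directory" >&2; return 1; }
    # When a parameter appears more than once in postgresql.conf, the last entry wins.
    cat >>"$pgdata/postgresql.conf" <<EOF
restore_command = 'cp $wal_dir/%f %p'
recovery_target_time = '$target'
recovery_target_action = 'promote'
EOF
    # This file tells PostgreSQL 12+ to start in targeted recovery mode.
    touch "$pgdata/recovery.signal"
}
```

After something like `prepare_pitr /var/lib/postgresql/data /backup/wal '2026-02-28 14:30:00'`, starting the server replays archived WAL up to the target time and then promotes the instance.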
Disaster Recovery Checklist
- All critical systems documented and inventoried
- RPO and RTO defined for each system
- Automated backups running on schedule
- Offsite/immutable backup copy exists
- Database WAL archiving enabled
- Recovery runbook documented step-by-step
- Monthly/quarterly recovery drills scheduled
- Monitoring alerts for backup failures
- Team trained on recovery procedures
Frequently Asked Questions
How often should I test my backups?
At minimum, run full disaster recovery tests quarterly, test individual service recovery (database, files) monthly, and verify backup integrity (checksums, file counts) weekly.
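The weekly integrity check can be as simple as a checksum manifest. The sketch below assumes GNU coreutils and findutils and a manifest stored outside the directory it describes; `make_manifest` and `verify_manifest` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: record SHA-256 checksums for a tree and verify them later.
# make_manifest and verify_manifest are hypothetical helper names.

# Usage: make_manifest <dir> <manifest-file>
# Keep the manifest outside <dir> so it does not list itself.
make_manifest() {
    local dir=$1 manifest=$2
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) >"$manifest"
}

# Usage: verify_manifest <dir> <manifest-file>
# Succeeds only if every listed file exists under <dir> with the same checksum.
verify_manifest() {
    local dir=$1 manifest
    manifest=$(realpath "$2") || return 1
    ( cd "$dir" && sha256sum --check --quiet "$manifest" )
}
```

Running `make_manifest` on the source tree and `verify_manifest` on the restored copy gives a quick signal that a restore returned the same bytes, and `wc -l` on the manifest doubles as a file count.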
What's the cheapest offsite backup solution?
For small setups: rsync to a second VPS (€3-5/month). For larger setups: Borg + Backblaze B2 (€0.005/GB/month). Both are covered in our Linux Backup Automation guide.
Should I use LVM snapshots or ZFS snapshots?
ZFS snapshots are superior: they're instant, space-efficient, and support incremental send/receive. LVM snapshots degrade performance and aren't designed for long-term retention. Learn more in LVM & ZFS: Linux Storage Management.
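To make "incremental send/receive" concrete, here is a minimal sketch that takes a new snapshot and ships only the changes since a previous one to another host. It assumes the previous snapshot already exists on both sides and that SSH access to the target is set up. `replicate_snapshot`, the `dr-` snapshot prefix, and the dataset and host names in the usage line are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: snapshot a ZFS dataset and send the increment to a remote pool.
# replicate_snapshot and the "dr-" snapshot prefix are hypothetical choices.

# Usage: replicate_snapshot <dataset> <previous-snapshot-name> <remote-host> <remote-dataset>
# Prints the name of the snapshot it created.
replicate_snapshot() {
    local dataset=$1 prev=$2 host=$3 target=$4 new
    new="dr-$(date +%Y%m%d-%H%M%S)"
    zfs snapshot "$dataset@$new" || return 1
    # -i sends only the blocks that changed between the two snapshots.
    # Without pipefail, the pipeline status reflects the receiving side only.
    zfs send -i "$dataset@$prev" "$dataset@$new" | ssh "$host" zfs receive "$target" || return 1
    echo "$new"
}
```

A call such as `replicate_snapshot tank/data dr-20260228-020000 backuphost backup/data` returns the new snapshot name, which can be stored and passed as the previous snapshot on the next run.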