Creating Incremental Backups with rsync
Table of Contents
- [Introduction to rsync](#introduction-to-rsync)
- [Understanding Incremental Backups](#understanding-incremental-backups)
- [Basic rsync Syntax](#basic-rsync-syntax)
- [Command Options and Flags](#command-options-and-flags)
- [Setting Up Incremental Backups](#setting-up-incremental-backups)
- [Advanced Techniques](#advanced-techniques)
- [Practical Examples](#practical-examples)
- [Automation and Scheduling](#automation-and-scheduling)
- [Monitoring and Verification](#monitoring-and-verification)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)

Introduction to rsync
rsync (remote synchronization) is a powerful command-line utility for efficiently transferring and synchronizing files across computer systems. It is particularly well-suited for creating incremental backups due to its ability to transfer only the differences between source and destination files, significantly reducing bandwidth usage and backup time.
Key Features of rsync
| Feature | Description |
|---------|-------------|
| Delta Transfer | Only transfers changed portions of files |
| Compression | Built-in compression to reduce transfer size |
| Preservation | Maintains file permissions, timestamps, and ownership |
| Remote Sync | Works over SSH for secure remote backups |
| Exclusion Patterns | Flexible file and directory exclusion rules |
| Dry Run Mode | Test operations without making changes |
Why Use rsync for Incremental Backups
Incremental backups with rsync offer several advantages over traditional full backup methods:
- Efficiency: Only modified files are transferred
- Speed: Faster backup operations after initial sync
- Bandwidth Conservation: Minimal network usage for remote backups
- Storage Optimization: Reduced storage requirements
- Flexibility: Customizable backup strategies
Understanding Incremental Backups
Incremental backups are a backup strategy where only files that have changed since the last backup are copied to the backup destination. This approach contrasts with full backups, which copy all files regardless of whether they have changed.
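The notion of "changed since the last backup" can be illustrated without rsync. The sketch below is only an illustration of the concept, not how rsync works internally: it assumes a hypothetical marker file whose modification time records the previous run, and the function name `changed_since` is made up for this example.

```bash
#!/bin/bash
# List regular files under a directory that were modified after a marker file.
# The marker file and function name are illustrative, not part of rsync.
changed_since() {
    local source_dir=$1 marker=$2
    find "$source_dir" -type f -newer "$marker"
}

# Demo in a throwaway directory
work=$(mktemp -d)
mkdir "$work/data"
echo "old" > "$work/data/unchanged.txt"
touch -t 202001010000 "$work/data/unchanged.txt"   # pretend it is years old
touch -t 202101010000 "$work/last-backup.marker"   # time of the previous backup
echo "new" > "$work/data/edited.txt"               # modified after the marker

changed_since "$work/data" "$work/last-backup.marker"
rm -rf "$work"
```

Only `edited.txt` is newer than the marker, so it is the only candidate for an incremental copy.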
Backup Types Comparison
| Backup Type | Description | Pros | Cons |
|-------------|-------------|------|------|
| Full Backup | Complete copy of all data | Simple restore process | Time-consuming, storage-intensive |
| Incremental | Only changed files since last backup | Fast, efficient storage | Complex restore chain |
| Differential | Changed files since last full backup | Simpler restore than incremental | Grows larger over time |
How rsync Implements Incremental Backups
rsync achieves incremental functionality through several mechanisms:
1. File Modification Time: Compares timestamps between source and destination
2. File Size Comparison: Checks if file sizes differ
3. Checksum Verification: Optional deep comparison using checksums
4. Hard Link Creation: Creates space-efficient snapshots using hard links
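The hard link mechanism in the last item is what keeps snapshots cheap. As a minimal sketch (the directory names and the helper `is_hard_linked` are hypothetical), the following shows that a hard link is a second name for the same stored data rather than a copy:

```bash
#!/bin/bash
# True when both paths refer to the same inode on the same device.
is_hard_linked() {
    [ "$1" -ef "$2" ]
}

work=$(mktemp -d)
mkdir "$work/snapshot-1" "$work/snapshot-2"

echo "report contents" > "$work/snapshot-1/report.txt"
# Roughly what --link-dest does for an unchanged file: link instead of copy
ln "$work/snapshot-1/report.txt" "$work/snapshot-2/report.txt"

if is_hard_linked "$work/snapshot-1/report.txt" "$work/snapshot-2/report.txt"; then
    echo "same inode: the file is stored once"
fi

# Removing one name leaves the data reachable through the other
rm "$work/snapshot-1/report.txt"
cat "$work/snapshot-2/report.txt"

rm -rf "$work"
```

This is why an old snapshot can be deleted without damaging newer ones that share its files.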
Basic rsync Syntax
The fundamental syntax for rsync follows this pattern:
```bash
rsync [OPTIONS] SOURCE DESTINATION
```
Basic Command Structure
```bash
# Local to local backup
rsync -av /source/directory/ /backup/destination/

# Local to remote backup
rsync -av /source/directory/ user@remote-host:/backup/destination/

# Remote to local backup
rsync -av user@remote-host:/source/directory/ /local/backup/
```

Important Syntax Notes
- Trailing Slash Significance: The presence or absence of a trailing slash on the source directory affects behavior
  - `rsync -av /source/dir/ /dest/` copies the contents of `dir` into `dest`
  - `rsync -av /source/dir /dest/` copies `dir` itself into `dest`
- Path Specifications: Always use absolute paths for consistency
- Quoting: Use quotes around paths containing spaces or special characters
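The trailing-slash rule is easiest to see side by side. The sketch below assumes rsync is installed; the function name and directory names are made up for the demonstration.

```bash
#!/bin/bash
# Copy the same source directory twice, once with and once without a
# trailing slash, into two fresh destinations under a work directory.
trailing_slash_demo() {
    local work=$1
    mkdir -p "$work/source/dir" "$work/with-slash" "$work/without-slash"
    echo "hello" > "$work/source/dir/file.txt"

    rsync -a "$work/source/dir/" "$work/with-slash/"     # contents of dir
    rsync -a "$work/source/dir"  "$work/without-slash/"  # dir itself
}
```

After running `trailing_slash_demo "$(mktemp -d)"`, the first destination holds `with-slash/file.txt` while the second holds `without-slash/dir/file.txt`.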
Command Options and Flags
Understanding rsync options is crucial for creating effective incremental backup strategies.
Essential Options
| Option | Long Form | Description |
|--------|-----------|-------------|
| `-a` | `--archive` | Archive mode (preserves permissions, times, etc.) |
| `-v` | `--verbose` | Verbose output |
| `-r` | `--recursive` | Recurse into directories |
| `-u` | `--update` | Skip files newer on destination |
| `-n` | `--dry-run` | Show what would be transferred without doing it |
| `-z` | `--compress` | Compress file data during transfer |
| `-h` | `--human-readable` | Output numbers in human-readable format |
Advanced Options for Incremental Backups
| Option | Description |
|--------|-------------|
| `--delete` | Delete extraneous files from destination |
| `--delete-excluded` | Delete excluded files from destination |
| `--backup` | Make backups of existing files |
| `--backup-dir=DIR` | Store backups in specified directory |
| `--suffix=SUFFIX` | Backup suffix (default `~`) |
| `--link-dest=DIR` | Hardlink to files in DIR when unchanged |
| `--exclude=PATTERN` | Exclude files matching pattern |
| `--exclude-from=FILE` | Read exclude patterns from file |
| `--include=PATTERN` | Include files matching pattern |
| `--files-from=FILE` | Read file list from file |
Progress and Logging Options
| Option | Description |
|--------|-------------|
| `--progress` | Show progress during transfer |
| `--stats` | Give file transfer stats |
| `--log-file=FILE` | Log what rsync is doing to file |
| `--itemize-changes` | Output change summary for all updates |
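Combining `--dry-run` with `--itemize-changes` gives a preview of a mirror run before it touches anything. A minimal sketch, assuming rsync is installed (the function name `preview_changes` is illustrative):

```bash
#!/bin/bash
# Print what a mirror run would change, without touching the destination.
# -n (dry run) and --itemize-changes are standard rsync options.
preview_changes() {
    local source_dir=$1 dest_dir=$2
    rsync -an --delete --itemize-changes "$source_dir/" "$dest_dir/"
}
```

For example, `preview_changes /home/user/documents /backup/documents` lists files that would be transferred on lines beginning with `>f`, and files that `--delete` would remove on lines beginning with `*deleting`.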
Setting Up Incremental Backups
Method 1: Simple Incremental Backup
The most straightforward approach uses rsync's built-in incremental capabilities:
```bash
#!/bin/bash
# Simple incremental backup script

SOURCE="/home/user/documents"
DESTINATION="/backup/documents"
LOGFILE="/var/log/backup.log"

rsync -avh --delete --stats --log-file="$LOGFILE" "$SOURCE/" "$DESTINATION/"
```
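Because `--delete` mirrors the source exactly, an empty or unmounted source directory would empty the backup as well. One possible safeguard, sketched here with a hypothetical helper `source_is_populated`, is to check the source before invoking rsync:

```bash
#!/bin/bash
# Return success only if the directory exists and contains at least one entry.
# Guards against mirroring an unmounted or empty source over a good backup.
source_is_populated() {
    local dir=$1
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

SOURCE="/home/user/documents"
if source_is_populated "$SOURCE"; then
    echo "Source looks sane, safe to run rsync --delete"
else
    echo "Refusing to back up: $SOURCE is missing or empty" >&2
fi
```

In a real script the rsync command would go in the first branch and the second branch would exit with a non-zero status.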
Method 2: Snapshot-Style Backups with Hard Links
This method creates multiple backup snapshots while using hard links to save space:
```bash
#!/bin/bash
# Snapshot-style incremental backup

SOURCE="/home/user/data"
BACKUP_ROOT="/backup/snapshots"
DATE=$(date +%Y-%m-%d_%H-%M-%S)
CURRENT_BACKUP="$BACKUP_ROOT/backup-$DATE"
LATEST_LINK="$BACKUP_ROOT/latest"

# Create backup directory
mkdir -p "$CURRENT_BACKUP"

# Perform backup with hard links to previous backup
if [ -d "$LATEST_LINK" ]; then
    rsync -av --delete --link-dest="$LATEST_LINK" "$SOURCE/" "$CURRENT_BACKUP/"
else
    rsync -av --delete "$SOURCE/" "$CURRENT_BACKUP/"
fi

# Update latest link
rm -f "$LATEST_LINK"
ln -s "$CURRENT_BACKUP" "$LATEST_LINK"
```

Method 3: Incremental with Exclusions
Create backups while excluding unnecessary files:
```bash
#!/bin/bash
# Incremental backup with exclusions

SOURCE="/home/user"
DESTINATION="/backup/user-data"
EXCLUDE_FILE="/etc/backup-exclude.txt"

# Create exclude file if it doesn't exist
if [ ! -f "$EXCLUDE_FILE" ]; then
    cat > "$EXCLUDE_FILE" << 'EOF'
*.tmp
*.cache
.git/
node_modules/
__pycache__/
*.log
Trash/
.thumbnails/
EOF
fi

rsync -avh --delete --exclude-from="$EXCLUDE_FILE" "$SOURCE/" "$DESTINATION/"
```
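Exclude patterns are easy to get subtly wrong, so it can help to check which paths survive them before running a real backup. The sketch below assumes rsync is installed; `list_included` is a hypothetical helper that performs a dry run against an empty scratch directory and prints the names rsync would transfer.

```bash
#!/bin/bash
# Show which paths survive an exclude file, using a dry run against an
# empty scratch destination so nothing is copied.
list_included() {
    local source_dir=$1 exclude_file=$2
    local scratch
    scratch=$(mktemp -d)
    rsync -an --exclude-from="$exclude_file" --out-format='%n' \
        "$source_dir/" "$scratch/"
    rmdir "$scratch"
}
```

Calling `list_included /home/user /etc/backup-exclude.txt` prints one path per line; anything matched by a pattern such as `*.tmp` or `node_modules/` should be absent from the output.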
Advanced Techniques
Using rsync with SSH for Remote Backups
For secure remote backups, rsync can tunnel through SSH:
```bash
# Remote backup over SSH
rsync -avz -e ssh /local/data/ user@backup-server:/remote/backup/

# Using specific SSH key
rsync -avz -e "ssh -i /path/to/private/key" /local/data/ user@backup-server:/remote/backup/

# Custom SSH port
rsync -avz -e "ssh -p 2222" /local/data/ user@backup-server:/remote/backup/
```

Bandwidth Limiting
Control bandwidth usage during backups:
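```bash
# Limit bandwidth to 1000 KB/s
rsync -av --bwlimit=1000 /source/ /destination/

# Limit to roughly 50% of the link speed of eth0
# /sys/class/net/eth0/speed reports the negotiated speed in Mbit/s (Linux only;
# virtual interfaces may report -1), converted here to the KB/s that --bwlimit expects
LINK_MBITS=$(cat /sys/class/net/eth0/speed)
rsync -av --bwlimit=$(( LINK_MBITS * 1000000 / 8 / 1024 / 2 )) /source/ /destination/
```

Multi-Destination Backups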
Create backups to multiple destinations:
```bash
#!/bin/bash
# Multi-destination backup script

SOURCE="/important/data"
DESTINATIONS=(
    "/local/backup"
    "user@server1:/remote/backup"
    "user@server2:/offsite/backup"
)

for dest in "${DESTINATIONS[@]}"; do
    echo "Backing up to $dest"
    rsync -avz --delete "$SOURCE/" "$dest/" || echo "Backup to $dest failed"
done
```
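The loop above reports failures but still exits with status 0, so a scheduler cannot tell that a destination was missed. One way to surface that, sketched with a hypothetical `sync_to_all` function and a `SYNC_CMD` array introduced only for this example, is to count failures and return non-zero if there were any:

```bash
#!/bin/bash
# Command used for each destination; kept in an array so it can be swapped out
SYNC_CMD=(rsync -az --delete)

# Sync one source to every destination and return non-zero if any
# destination failed.
sync_to_all() {
    local source_dir=$1
    shift
    local dest failed=0
    for dest in "$@"; do
        if "${SYNC_CMD[@]}" "$source_dir/" "$dest/"; then
            echo "OK:     $dest"
        else
            echo "FAILED: $dest" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

A call such as `sync_to_all /important/data /local/backup user@server1:/remote/backup || exit 1` then lets cron or systemd record the run as failed.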
Practical Examples
Example 1: Daily Incremental Home Directory Backup
```bash
#!/bin/bash
# Daily home directory backup script
# File: /usr/local/bin/daily-backup.sh

# Configuration (BACKUP_USER avoids clobbering the shell's own USER variable)
BACKUP_USER="john"
SOURCE="/home/$BACKUP_USER"
BACKUP_ROOT="/backup/daily"
DATE=$(date +%Y-%m-%d)
BACKUP_DIR="$BACKUP_ROOT/$DATE"
LATEST_LINK="$BACKUP_ROOT/latest"
LOG_FILE="/var/log/daily-backup.log"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Exclusion patterns
EXCLUDE_PATTERNS=(
    "*.tmp"
    "*.cache"
    ".cache/"
    ".local/share/Trash/"
    "Downloads/temp/"
    ".mozilla/firefox/*/Cache/"
    ".thunderbird/*/ImapMail/"
)

# Build exclude arguments as an array so the shell never glob-expands a pattern
EXCLUDE_ARGS=()
for pattern in "${EXCLUDE_PATTERNS[@]}"; do
    EXCLUDE_ARGS+=("--exclude=$pattern")
done

# Hard-link unchanged files against the previous snapshot, if there is one
LINK_ARGS=()
if [ -d "$LATEST_LINK" ]; then
    LINK_ARGS=("--link-dest=$LATEST_LINK")
fi

# Log backup start
echo "$(date): Starting backup of $SOURCE" >> "$LOG_FILE"

# Perform backup and capture rsync's exit code before anything overwrites $?
rsync -av --delete "${LINK_ARGS[@]}" "${EXCLUDE_ARGS[@]}" \
    "$SOURCE/" "$BACKUP_DIR/" >> "$LOG_FILE" 2>&1
STATUS=$?

# Check if backup was successful
if [ "$STATUS" -eq 0 ]; then
    # Update latest link
    rm -f "$LATEST_LINK"
    ln -s "$BACKUP_DIR" "$LATEST_LINK"
    echo "$(date): Backup completed successfully" >> "$LOG_FILE"
else
    echo "$(date): Backup failed with exit code $STATUS" >> "$LOG_FILE"
    exit 1
fi

# Clean up old backups (keep 30 days)
# Note: -a copies the source directory's mtime onto each snapshot directory,
# so -mtime may not match the date the snapshot was taken
find "$BACKUP_ROOT" -maxdepth 1 -type d -name "20*" -mtime +30 -exec rm -rf {} \;
```

Example 2: Database Backup with rsync
```bash
#!/bin/bash
# Database backup with rsync
# File: /usr/local/bin/db-backup.sh

DB_NAME="production_db"
DB_USER="backup_user"
DUMP_DIR="/tmp/db_dumps"
BACKUP_DIR="/backup/database"
REMOTE_BACKUP="backup-server:/backup/db"

# Create dump directory
mkdir -p "$DUMP_DIR"

# Create database dump
# A bare -p prompts for a password and blocks unattended runs, so it is omitted;
# this assumes the password comes from a client option file such as ~/.my.cnf
mysqldump -u "$DB_USER" "$DB_NAME" > "$DUMP_DIR/${DB_NAME}_$(date +%Y%m%d_%H%M%S).sql"

# Compress old dumps
find "$DUMP_DIR" -name "*.sql" -mtime +1 -exec gzip {} \;

# Sync to local backup
rsync -av --delete "$DUMP_DIR/" "$BACKUP_DIR/"

# Sync to remote backup
rsync -avz --delete "$DUMP_DIR/" "$REMOTE_BACKUP/"

# Clean up old dumps (keep 7 days)
find "$DUMP_DIR" -name "*.sql.gz" -mtime +7 -delete
```

Example 3: Website Backup Script
```bash
#!/bin/bash
# Website incremental backup script
# File: /usr/local/bin/website-backup.sh

WEBSITE_ROOT="/var/www/html"
BACKUP_ROOT="/backup/website"
REMOTE_SERVER="backup.example.com"
REMOTE_USER="backup"
REMOTE_PATH="/backup/websites/$(hostname)"

# Local backup first
DATE=$(date +%Y-%m-%d_%H-%M-%S)
LOCAL_BACKUP="$BACKUP_ROOT/$DATE"
LATEST_LOCAL="$BACKUP_ROOT/latest"

mkdir -p "$LOCAL_BACKUP"

# Perform local incremental backup
if [ -d "$LATEST_LOCAL" ]; then
    rsync -av --delete --link-dest="$LATEST_LOCAL" \
        --exclude="*.log" \
        --exclude="cache/" \
        --exclude="tmp/" \
        "$WEBSITE_ROOT/" "$LOCAL_BACKUP/"
else
    rsync -av --delete \
        --exclude="*.log" \
        --exclude="cache/" \
        --exclude="tmp/" \
        "$WEBSITE_ROOT/" "$LOCAL_BACKUP/"
fi

# Update latest link
rm -f "$LATEST_LOCAL"
ln -s "$LOCAL_BACKUP" "$LATEST_LOCAL"

# Sync to remote server
rsync -avz --delete -e ssh \
    "$LOCAL_BACKUP/" \
    "$REMOTE_USER@$REMOTE_SERVER:$REMOTE_PATH/"

# Cleanup old local backups (keep 14 days)
find "$BACKUP_ROOT" -maxdepth 1 -type d -name "20*" -mtime +14 -exec rm -rf {} \;
```

Automation and Scheduling
Using Cron for Scheduled Backups
Create automated backup schedules using cron:
```bash
# Edit crontab
crontab -e

# Add backup schedules
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/daily-backup.sh

# Weekly backup every Sunday at 3 AM
0 3 * * 0 /usr/local/bin/weekly-backup.sh

# Monthly backup on first day of month at 4 AM
0 4 1 * * /usr/local/bin/monthly-backup.sh
```

Systemd Timer Alternative
Create a systemd service and timer:
```ini
# /etc/systemd/system/incremental-backup.service
[Unit]
Description=Incremental Backup Service
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/incremental-backup.sh
User=backup
Group=backup
```
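```ini
# /etc/systemd/system/incremental-backup.timer
# No Requires= on the service here: it would start the backup every time
# the timer unit itself is started, not only when the timer fires
[Unit]
Description=Run incremental backup daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```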
Enable and start the timer:
```bash
systemctl daemon-reload
systemctl enable incremental-backup.timer
systemctl start incremental-backup.timer
```
Monitoring and Verification
Backup Verification Script
```bash
#!/bin/bash
# Backup verification script
# File: /usr/local/bin/verify-backup.sh

SOURCE="/home/user/documents"
BACKUP="/backup/documents"
LOG_FILE="/var/log/backup-verification.log"

echo "$(date): Starting backup verification" >> "$LOG_FILE"

# Check if backup directory exists
if [ ! -d "$BACKUP" ]; then
    echo "$(date): ERROR - Backup directory does not exist" >> "$LOG_FILE"
    exit 1
fi

# Compare file counts
SOURCE_COUNT=$(find "$SOURCE" -type f | wc -l)
BACKUP_COUNT=$(find "$BACKUP" -type f | wc -l)

echo "$(date): Source files: $SOURCE_COUNT, Backup files: $BACKUP_COUNT" >> "$LOG_FILE"

# Verify checksums of critical files
CRITICAL_FILES=(
    "important_document.pdf"
    "database_dump.sql"
    "configuration.conf"
)

for file in "${CRITICAL_FILES[@]}"; do
    if [ -f "$SOURCE/$file" ] && [ -f "$BACKUP/$file" ]; then
        SOURCE_MD5=$(md5sum "$SOURCE/$file" | cut -d' ' -f1)
        BACKUP_MD5=$(md5sum "$BACKUP/$file" | cut -d' ' -f1)
        if [ "$SOURCE_MD5" = "$BACKUP_MD5" ]; then
            echo "$(date): VERIFIED - $file" >> "$LOG_FILE"
        else
            echo "$(date): ERROR - $file checksum mismatch" >> "$LOG_FILE"
        fi
    fi
done

echo "$(date): Backup verification completed" >> "$LOG_FILE"
```
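Checking a short list of critical files leaves the rest of the tree unverified. As a complementary sketch, rsync itself can compare every file by content when given `-c` together with a dry run. The function name `content_differences` is hypothetical, and the approach assumes rsync is installed and that reading every file on both sides is affordable.

```bash
#!/bin/bash
# List files whose content differs between source and backup, or that are
# missing from the backup. -c forces checksum comparison; -n keeps it read-only.
content_differences() {
    local source_dir=$1 backup_dir=$2
    rsync -rnc --out-format='%n' "$source_dir/" "$backup_dir/"
}
```

Running `content_differences /home/user/documents /backup/documents` right after a backup should print nothing for files that were copied correctly; any file name it does print deserves a closer look.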
Monitoring Backup Size and Growth
```bash
#!/bin/bash
# Monitor backup size and growth
# File: /usr/local/bin/monitor-backup-size.sh
# Requires GNU du (-b) and GNU date (-d)

BACKUP_DIR="/backup"
LOG_FILE="/var/log/backup-size.log"

# Get current backup size
CURRENT_SIZE=$(du -sh "$BACKUP_DIR" | cut -f1)
CURRENT_SIZE_BYTES=$(du -sb "$BACKUP_DIR" | cut -f1)

# Log current size, keyed by an ISO date so later runs can look it up
echo "$(date +%Y-%m-%d),$CURRENT_SIZE,$CURRENT_SIZE_BYTES" >> "$LOG_FILE"

# Check growth rate (compare with yesterday)
YESTERDAY=$(date -d "1 day ago" +%Y-%m-%d)
YESTERDAY_SIZE=$(grep "^$YESTERDAY," "$LOG_FILE" | tail -1 | cut -d',' -f3)

if [ -n "$YESTERDAY_SIZE" ]; then
    GROWTH=$((CURRENT_SIZE_BYTES - YESTERDAY_SIZE))
    GROWTH_HUMAN=$(echo "$GROWTH" | awk '{printf "%.2f GB", $1/1024/1024/1024}')
    echo "$(date): Backup growth: $GROWTH_HUMAN" >> "$LOG_FILE"
fi
```
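With hard-linked snapshots, the total size of the backup root says little about what each snapshot costs. Within a single invocation, `du` counts a hard-linked file only the first time it encounters it, so listing all snapshots in one call shows roughly how much space each one adds. A minimal sketch, where `snapshot_usage` and the one-directory-per-snapshot layout are assumptions:

```bash
#!/bin/bash
# Report per-snapshot disk usage (in KB) in one du invocation, so files
# shared through hard links are charged only to the first snapshot listed.
snapshot_usage() {
    local backup_root=$1
    local dir
    local snapshots=()
    for dir in "$backup_root"/*/; do
        dir=${dir%/}
        [ -d "$dir" ] || continue
        # Skip symlinks such as "latest" so each snapshot is listed once
        [ -L "$dir" ] && continue
        snapshots+=("$dir")
    done
    [ "${#snapshots[@]}" -gt 0 ] || return 0
    du -sk "${snapshots[@]}"
}
```

With date-named directories the glob sorts oldest first, so the first line approximates a full copy and each later line approximates that day's changes.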
Troubleshooting
Common rsync Issues and Solutions
| Issue | Symptoms | Solution |
|-------|----------|----------|
| Permission Denied | `rsync: recv_generator: mkdir failed` | Check destination permissions, use sudo if needed |
| SSH Connection Failed | `ssh: connect to host port 22: Connection refused` | Verify SSH service, firewall, and credentials |
| Vanished Files | `file has vanished` | A source file was deleted during the run (exit code 24); usually harmless, so exclude volatile paths or treat code 24 as a warning |
| Bandwidth Issues | Saturated link or slow transfer speeds | Use `--bwlimit` to cap usage; add `-z` for compressible data on slow links |
| Partial Transfers | `rsync error: some files could not be transferred` (exit code 23) | Check disk space and file permissions |
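Several of these symptoms map to specific rsync exit codes, which a wrapper script can act on. The sketch below uses codes listed in the EXIT VALUES section of the rsync man page; the function name and the grouping into success, warning, and error are illustrative choices rather than anything rsync prescribes.

```bash
#!/bin/bash
# Map an rsync exit status to a coarse outcome.
classify_rsync_exit() {
    case "$1" in
        0)     echo "success" ;;
        24)    echo "warning: source files vanished during transfer" ;;
        23)    echo "error: partial transfer, some files were not copied" ;;
        30|35) echo "error: timeout" ;;
        *)     echo "error: rsync failed with code $1" ;;
    esac
}

classify_rsync_exit 0
classify_rsync_exit 24
classify_rsync_exit 12
```

In a backup script this would typically follow the rsync call, for example `rsync ...; classify_rsync_exit $? >> "$LOG_FILE"`.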
Debugging rsync Operations
Enable verbose output and debugging:
```bash
# Maximum verbosity
rsync -avvv --debug=ALL /source/ /destination/

# Dry run with detailed output
rsync -avvn --itemize-changes /source/ /destination/

# Log all operations
rsync -av --log-file=/tmp/rsync.log /source/ /destination/
```

Recovery Procedures
```bash
#!/bin/bash
# Backup recovery script
# File: /usr/local/bin/recover-backup.sh

BACKUP_DIR="/backup/documents/latest"
RECOVERY_DIR="/recovery/documents"
LOG_FILE="/var/log/recovery.log"

echo "$(date): Starting recovery from $BACKUP_DIR to $RECOVERY_DIR" >> "$LOG_FILE"

# Create recovery directory
mkdir -p "$RECOVERY_DIR"

# Restore files
rsync -av --progress "$BACKUP_DIR/" "$RECOVERY_DIR/" >> "$LOG_FILE" 2>&1

if [ $? -eq 0 ]; then
    echo "$(date): Recovery completed successfully" >> "$LOG_FILE"
else
    echo "$(date): Recovery failed" >> "$LOG_FILE"
    exit 1
fi
```
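Recovery often means finding an older version of a single file rather than restoring the latest snapshot wholesale. Assuming a snapshot layout with one date-named directory per run (as in the snapshot-style method), a sketch like the following lists every snapshot that contains a given path. The function name `list_versions` is hypothetical.

```bash
#!/bin/bash
# Print every snapshot directory under a backup root that contains the given
# relative path, in glob order (oldest first when names are ISO dates).
list_versions() {
    local backup_root=$1 rel_path=$2
    local snapshot
    for snapshot in "$backup_root"/*/; do
        snapshot=${snapshot%/}
        # Skip symlinks such as "latest" to avoid listing a snapshot twice
        if [ -L "$snapshot" ]; then
            continue
        fi
        if [ -e "$snapshot/$rel_path" ]; then
            echo "$snapshot"
        fi
    done
    return 0
}
```

For instance, `list_versions /backup/snapshots notes/todo.txt` prints the candidate snapshots, and the chosen version can then be copied out with `rsync -av <snapshot>/notes/todo.txt /recovery/`.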
Best Practices
Security Considerations
1. SSH Key Authentication: Use SSH keys instead of passwords for remote backups
2. Backup Encryption: Encrypt backup destinations, especially for sensitive data
3. Access Control: Limit backup script permissions and user access
4. Network Security: Use VPN or secure networks for backup transfers
Performance Optimization
| Technique | Description | Command Example |
|-----------|-------------|-----------------|
| Compression | Enable compression for remote transfers | rsync -avz |
| Bandwidth Limiting | Control network usage | rsync -av --bwlimit=1000 |
| Parallel Processing | Run multiple rsync processes | Use xargs or GNU parallel |
| Exclude Unnecessary Files | Skip temporary and cache files | rsync -av --exclude="*.tmp" |
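The parallel processing row can be made concrete with `find` and `xargs`. This is a sketch under several assumptions: rsync is installed, `xargs` supports `-P` and `-r` (GNU and BSD versions do, but neither flag is in POSIX), and the source splits naturally into top-level subdirectories. The function name `parallel_sync` is made up for the example.

```bash
#!/bin/bash
# Copy each top-level subdirectory of a source with its own rsync process,
# running up to $3 transfers at once (default 4). Files that sit directly in
# the source root are not covered and would need a separate pass.
parallel_sync() {
    local source_dir=$1 dest_dir=$2 jobs=${3:-4}
    mkdir -p "$dest_dir"
    # xargs appends one directory per invocation, so inside sh:
    #   $1 = destination, $2 = the subdirectory to copy
    find "$source_dir" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -r -n 1 -P "$jobs" sh -c 'rsync -a "$2" "$1/"' _ "$dest_dir"
}
```

Parallel transfers tend to help when many small files or network latency dominate; on a single spinning disk they may slow things down instead.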
Maintenance Tasks
1. Regular Testing: Periodically test backup restoration procedures
2. Log Rotation: Implement log rotation to prevent log files from growing too large
3. Storage Monitoring: Monitor backup storage usage and plan for capacity
4. Documentation: Maintain documentation of backup procedures and schedules
Backup Strategy Recommendations
| Backup Type | Frequency | Retention | Purpose |
|-------------|-----------|-----------|---------|
| Incremental | Daily | 30 days | Regular data protection |
| Weekly Full | Weekly | 12 weeks | Medium-term recovery |
| Monthly Archive | Monthly | 12 months | Long-term retention |
| Yearly Archive | Yearly | 7 years | Compliance and historical data |
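A retention period like those in the table can also be enforced by snapshot count instead of file timestamps. The sketch below assumes snapshot directories are named with a leading ISO date (`YYYY-MM-DD`), so that alphabetical order equals chronological order; `prune_snapshots` is a hypothetical helper.

```bash
#!/bin/bash
# Keep only the newest $2 date-named snapshot directories under $1.
prune_snapshots() {
    local root=$1 keep=$2
    local dir i
    local snapshots=()
    # Globs expand in sorted order, so the oldest snapshot comes first
    for dir in "$root"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*/; do
        if [ -d "$dir" ]; then
            snapshots+=("${dir%/}")
        fi
    done
    local excess=$(( ${#snapshots[@]} - keep ))
    for (( i = 0; i < excess; i++ )); do
        echo "Removing ${snapshots[i]}"
        rm -rf -- "${snapshots[i]}"
    done
    return 0
}

# Demo: five daily snapshots, keep the newest three
work=$(mktemp -d)
mkdir "$work/2024-01-01" "$work/2024-01-02" "$work/2024-01-03" \
      "$work/2024-01-04" "$work/2024-01-05"
prune_snapshots "$work" 3
ls "$work"
rm -rf "$work"
```

Because later snapshots hold their own hard links to shared files, removing the oldest directories does not affect the data in the ones that remain.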
Script Template for Production Use
```bash
#!/bin/bash
# Production-ready incremental backup script
# File: /usr/local/bin/production-backup.sh

set -euo pipefail  # Exit on error, undefined variables, pipe failures

# Configuration
readonly SCRIPT_NAME=$(basename "$0")
readonly LOCK_FILE="/var/run/${SCRIPT_NAME}.lock"
readonly LOG_FILE="/var/log/${SCRIPT_NAME}.log"
readonly CONFIG_FILE="/etc/backup.conf"

# Source configuration (expected to define SOURCE_DIR, BACKUP_DIR, EXCLUDE_FILE)
if [ -f "$CONFIG_FILE" ]; then
    source "$CONFIG_FILE"
else
    echo "Configuration file $CONFIG_FILE not found" >&2
    exit 1
fi

# Functions
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$SCRIPT_NAME] $1" >> "$LOG_FILE"
}

cleanup() {
    rm -f "$LOCK_FILE"
    log_message "Backup process finished"
}

error_exit() {
    # Once the EXIT trap below is installed, it runs cleanup on exit
    log_message "ERROR: $1"
    exit 1
}

# Check for lock file. The trap is not installed yet, so a lock held by
# another process is left in place.
if [ -f "$LOCK_FILE" ]; then
    error_exit "Another backup process is already running"
fi

# Create lock file containing this process's PID
echo $$ > "$LOCK_FILE"
trap cleanup EXIT

# Start backup process
log_message "Starting backup process"

# Validate source directory
if [ ! -d "$SOURCE_DIR" ]; then
    error_exit "Source directory $SOURCE_DIR does not exist"
fi

# Create backup directory
mkdir -p "$BACKUP_DIR" || error_exit "Failed to create backup directory"

# Perform backup
log_message "Backing up $SOURCE_DIR to $BACKUP_DIR"
rsync -av --delete --stats \
    --exclude-from="$EXCLUDE_FILE" \
    "$SOURCE_DIR/" "$BACKUP_DIR/" >> "$LOG_FILE" 2>&1 || error_exit "Backup failed"

log_message "Backup completed successfully"
```
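Checking for a lock file and then creating it are two separate steps, so two runs starting at the same moment can both pass the check. One common alternative, sketched here with hypothetical helper names, is to use `mkdir` as the lock, since it creates the directory or fails in a single operation:

```bash
#!/bin/bash
# Acquire a lock by creating a directory: mkdir either creates it or fails,
# in one step, so two processes cannot both succeed.
acquire_lock() {
    local lock_dir=$1
    if mkdir "$lock_dir" 2>/dev/null; then
        echo $$ > "$lock_dir/pid"   # record the owner for troubleshooting
        return 0
    fi
    return 1
}

release_lock() {
    rm -rf -- "$1"
}

# Demo with a throwaway lock path
demo_root=$(mktemp -d)
LOCK_DIR="$demo_root/backup.lock"
if acquire_lock "$LOCK_DIR"; then
    echo "Lock acquired, the backup would run here"
    release_lock "$LOCK_DIR"
else
    echo "Another backup appears to be running" >&2
fi
rmdir "$demo_root"
```

In a long-running script, `release_lock` would normally be registered with `trap ... EXIT` right after the lock is acquired, so the lock is released even when the backup fails.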
This comprehensive guide provides the foundation for implementing robust incremental backup solutions using rsync. The techniques and examples presented can be adapted to various environments and requirements, ensuring data protection through efficient and reliable backup strategies.