Swap is misunderstood by half the systems engineers who configure it. "Disable swap entirely" and "make swap as large as RAM" are both folklore. The truth: a small, fast, properly tuned swap area delays OOM kills, lets the kernel evict cold anonymous pages, and keeps applications responsive under memory pressure. This guide explains how Linux actually uses swap, how the swappiness tunable works, and where modern alternatives like zram and zswap fit.
What swap actually does
Anonymous pages (heap, stack, and other memory not backed by a file) need a destination when the kernel reclaims them. With swap, cold pages move to disk and the system continues. Without swap, anonymous pages cannot be reclaimed at all, so once the page cache has been squeezed the only way to free memory is to kill a process. Swap does not exist to "extend RAM"; it exists to give the kernel an alternative to the OOM killer.
free -h # RAM and swap totals, used and free
swapon --show # active swap devices
cat /proc/swaps # raw view
vmstat 5 # si/so columns: memory swapped in/out per second (KiB/s by default)
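To see the point about anonymous pages in numbers, the sketch below compares how much anonymous memory exists with how much swap is free to receive it. It is a minimal helper, not a standard tool: the function name anon_vs_swap is made up for illustration, and it assumes meminfo-format text (as in /proc/meminfo, values in kB) on stdin.

```shell
# Report anonymous memory and free swap in MiB from meminfo-format text.
# anon_vs_swap is a hypothetical helper name; input is read from stdin.
anon_vs_swap() {
    awk '
        $1 == "AnonPages:" { anon = $2 }   # anonymous pages, in kB
        $1 == "SwapFree:"  { free = $2 }   # unused swap, in kB
        END { printf "anon_mib=%d swap_free_mib=%d\n", anon / 1024, free / 1024 }
    '
}

# Usage on a live system:
#   anon_vs_swap < /proc/meminfo
```

If swap_free_mib is 0, none of that anonymous memory has anywhere to go under pressure.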
Provisioning swap
Swap can be a partition (legacy) or a file on an existing filesystem (modern, easier to resize). To create a 4 GB swap file:
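sudo fallocate -l 4G /swapfile # if swapon later rejects the file, recreate it with: sudo dd if=/dev/zero of=/swapfile bs=1M count=4096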
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
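The tee -a line appends unconditionally, so running the setup twice leaves a duplicate fstab entry. A minimal sketch of an idempotent variant follows. The function name ensure_swap_fstab and its optional second argument (the fstab file to edit, useful for dry runs against a copy) are assumptions for illustration.

```shell
# Append a swap entry to an fstab-format file only if that path has no
# entry yet. ensure_swap_fstab is a hypothetical helper name.
#   $1: swap file path   $2: fstab file (defaults to /etc/fstab)
ensure_swap_fstab() {
    swap_path=$1
    fstab=${2:-/etc/fstab}
    # A line counts as an existing entry when its first field is the path;
    # commented-out lines start with "#" and therefore never match.
    if awk -v p="$swap_path" '$1 == p { found = 1 } END { exit !found }' "$fstab"; then
        return 0
    fi
    printf '%s none swap sw 0 0\n' "$swap_path" >> "$fstab"
}

# Usage (as root, because /etc/fstab is root-owned):
#   ensure_swap_fstab /swapfile
```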
For btrfs, the swap file must be preallocated and exempt from copy-on-write: use btrfs filesystem mkswapfile (btrfs-progs 6.1+) or follow the documented truncate, chattr +C, fallocate sequence. Never put swap on a subvolume that gets snapshotted.
How big should swap be?
Modern guidance, ignoring hibernate:
- ≤ 2 GB RAM: 2× RAM (uncommon today).
- 2–8 GB RAM: 1× RAM.
- 8–64 GB RAM: 4 GB swap, max.
- > 64 GB RAM: 4 GB swap, and only if you actually want to delay OOM kills.
If you need hibernate-to-disk, swap must be ≥ RAM. On Kubernetes nodes, swap is supported (beta since 1.28) but still discouraged for performance-critical workloads; give those pods the Guaranteed QoS class.
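The sizing tiers and the hibernate rule can be sketched as a small function. The name recommend_swap_gb is hypothetical, RAM is taken in whole GB, and one boundary is a labeled assumption: the tiers overlap at exactly 8 GB, and this sketch puts that case in the 1× tier.

```shell
# Suggest a swap size in GB for a given amount of RAM in whole GB.
# recommend_swap_gb is a hypothetical helper name.
#   $1: RAM in GB   $2: "hibernate" to size for hibernate-to-disk (optional)
recommend_swap_gb() {
    ram_gb=$1
    mode=${2:-normal}
    if [ "$mode" = "hibernate" ]; then
        echo "$ram_gb"            # minimum for hibernate: swap >= RAM
    elif [ "$ram_gb" -le 2 ]; then
        echo $((ram_gb * 2))      # up to 2 GB RAM: 2x RAM
    elif [ "$ram_gb" -le 8 ]; then
        echo "$ram_gb"            # assumption: exactly 8 GB lands in the 1x tier
    else
        echo 4                    # above 8 GB: flat 4 GB
    fi
}

# Usage:
#   recommend_swap_gb 16            # prints 4
#   recommend_swap_gb 16 hibernate  # prints 16
```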
swappiness explained correctly
vm.swappiness (0–200 since kernel 5.8) sets the kernel's relative preference for swapping out anonymous pages versus dropping page cache. The default of 60 is fine for desktops with mixed workloads. For database servers where the page cache is gold:
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swap.conf
sudo sysctl --system
A value of 1 (not 0) means "swap only when absolutely necessary." Setting 0 historically meant "never swap" and triggered OOM kills earlier than expected; modern kernels behave more sanely, but stick to 1 or 10 for clarity.
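Before writing a value into a sysctl drop-in from a script, it is cheap to check that it sits in the accepted range. A minimal sketch, assuming the 0 to 200 range of kernel 5.8 and later; valid_swappiness is a made-up helper name.

```shell
# Succeed if the argument is a whole number in the 0-200 range accepted
# by kernels 5.8 and later. valid_swappiness is a hypothetical helper name.
valid_swappiness() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, negative, or non-numeric
    esac
    [ "$1" -le 200 ]
}

# Usage: confirm what the running kernel currently has (read-only, no root):
#   valid_swappiness "$(cat /proc/sys/vm/swappiness)" && echo ok
```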
vfs_cache_pressure pairs with swappiness
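Lowering vm.vfs_cache_pressure below its default of 100 makes the kernel retain directory and inode caches longer. A typical server profile pairs it with low swappiness and tighter dirty-page limits: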
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
Together these four knobs reduce I/O latency variance under memory pressure on database, web, and reverse-proxy servers.
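Settings in a drop-in only matter if the running kernel actually has them, and a later file or a manual sysctl -w can silently override them. The sketch below compares "key = value" lines against the live values. The function name check_sysctl_drift is hypothetical, and the optional directory argument exists only so the check can be pointed at a test tree instead of /proc/sys.

```shell
# Compare desired "key = value" lines (sysctl.conf format, on stdin) with
# the values under a /proc/sys-style directory. Prints one line per
# mismatch and returns non-zero if anything differs.
# check_sysctl_drift is a hypothetical helper name.
check_sysctl_drift() {
    root=${1:-/proc/sys}
    drift=0
    while IFS='=' read -r key want; do
        key=$(printf '%s' "$key" | tr -d ' \t')
        want=$(printf '%s' "$want" | tr -d ' \t')
        case $key in ''|'#'*) continue ;; esac        # skip blanks and comments
        path=$root/$(printf '%s' "$key" | tr . /)     # vm.swappiness -> vm/swappiness
        have=$(cat "$path" 2>/dev/null | tr -d ' \t\n')
        if [ "$have" != "$want" ]; then
            echo "$key: want $want, have ${have:-unreadable}"
            drift=1
        fi
    done
    return $drift
}

# Usage:
#   check_sysctl_drift < /etc/sysctl.d/99-swap.conf
```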
zram and zswap: compressed memory
If your bottleneck is disk I/O, compress instead of swapping to disk. zram creates a compressed RAM-backed swap device: fast, but only useful if you have spare CPU and your data compresses well. zswap is a compressed write-back cache that sits between anonymous pages and your real swap device.
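sudo modprobe zram # creates /dev/zram0 by default
echo lz4 | sudo tee /sys/block/zram0/comp_algorithm # must be set before disksize
echo 4G | sudo tee /sys/block/zram0/disksize # uncompressed capacity, not RAM consumed
sudo mkswap /dev/zram0
sudo swapon -p 100 /dev/zram0 # higher priority than disk swap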
Fedora and several other distributions now ship with zram-generator enabled by default. On a server with 8 GB RAM, 1 GB of RAM spent on compressed zram typically holds 2–3 GB of pages, so that much more memory is usable before disk swap is needed.
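Whether zram pays off depends on how well your data compresses, and the device reports that itself. The first two fields of /sys/block/zram0/mm_stat are the original and compressed data sizes in bytes. A minimal sketch of a ratio calculation, with zram_ratio as a made-up helper name:

```shell
# Print the compression ratio (original bytes / compressed bytes) from an
# mm_stat line on stdin. Fields 1 and 2 are orig_data_size and
# compr_data_size. zram_ratio is a hypothetical helper name.
zram_ratio() {
    awk 'NR == 1 {
        if ($2 == 0) { print "n/a"; exit }   # nothing stored yet
        printf "%.2f\n", $1 / $2
    }'
}

# Usage:
#   zram_ratio < /sys/block/zram0/mm_stat
```

A ratio near 1.0 means the pages barely compress and the device is mostly burning CPU.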
Monitoring swap pressure
Total swap usage is a misleading metric: long-cold pages parked in swap are healthy. The real signal is swap activity:
vmstat 5 6 # si/so columns
sar -W 5 6 # pswpin/s, pswpout/s
grep VmSwap "/proc/$(pgrep -of myapp)/status" # -o picks the oldest match; plain pgrep -f can return several PIDs
cat /proc/pressure/memory # PSI: full vs some, useful in containers
Alert when the swap-out rate (pswpout/s in sar) stays above 1000 pages/sec for several minutes; that means active thrashing, not benign cold-page eviction.
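Where sar is not installed, the same page counter is available as pswpout in /proc/vmstat. The sketch below derives a pages-per-second rate from two snapshots. The function name swapout_rate and the snapshot file paths in the usage comment are illustrative assumptions.

```shell
# Compute pages swapped out per second from two /proc/vmstat snapshots.
# swapout_rate is a hypothetical helper name.
#   $1: earlier snapshot file   $2: later snapshot file   $3: seconds between them
swapout_rate() {
    before=$(awk '$1 == "pswpout" { print $2 }' "$1")
    after=$(awk '$1 == "pswpout" { print $2 }' "$2")
    echo $(( (after - before) / $3 ))
}

# Usage (snapshot paths are arbitrary):
#   cat /proc/vmstat > /tmp/vmstat.a; sleep 5; cat /proc/vmstat > /tmp/vmstat.b
#   swapout_rate /tmp/vmstat.a /tmp/vmstat.b 5
```

One sample above the threshold proves little; the alert condition is the rate staying high across many consecutive samples.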
Container considerations
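cgroup v2 exposes memory.swap.max per cgroup, which is how swap is capped per container. The kernel default is max (no limit), but runtimes and orchestrators frequently set it to 0 (no swap allowed), so check the effective value. Memory pressure inside the container is reported via memory.pressure in the cgroup directory. Check PSI before scaling: many "OOM" incidents are actually long PSI stalls.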
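PSI files share one format, whether read from /proc/pressure/memory or from a cgroup's memory.pressure: a "some" line and a "full" line, each with avg10, avg60, avg300, and total fields. A minimal sketch for pulling a single figure out for an alert or autoscaler check follows. The name psi_value is hypothetical, and <group> in the usage comment is a placeholder for your cgroup path.

```shell
# Extract one average from PSI-format text on stdin.
# psi_value is a hypothetical helper name.
#   $1: "some" or "full"   $2: avg10, avg60, or avg300
psi_value() {
    awk -v kind="$1" -v field="$2" '
        $1 == kind {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")            # "avg10=0.25" -> kv[1], kv[2]
                if (kv[1] == field) print kv[2]
            }
        }
    '
}

# Usage:
#   psi_value full avg60 < /proc/pressure/memory
#   psi_value some avg10 < /sys/fs/cgroup/<group>/memory.pressure
```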
When to disable swap
Two valid cases: a node where the kubelet pre-1.28 refuses to start, and an extreme low-latency host where swap-induced jitter is unacceptable. In every other case, a small swap with swappiness=10 is a free reliability win.
Quick checklist
- Swap exists, on its own partition or in a file with 600 permissions.
- swappiness=10 for servers, default for desktops.
- Monitor si/so, not raw swap usage.
- zram enabled on memory-constrained hosts.
- fstab line present so swap survives reboot.
Treat swap like an insurance policy: cheap, mostly invisible, and the difference between an annoying page and a 03:00 incident the day a customer happens to send a 4 GB file.