Most production Linux services run with far more privilege than they need. Nginx can read every file its service user can read. PostgreSQL can open arbitrary network sockets. A Node.js worker process can write anywhere on the filesystem its user has access to. None of this is necessary, and none of it is hard to fix - systemd has had per-unit security directives for the better part of a decade. The reason almost nobody uses them is documentation overload (there are 60+ directives, scattered across the systemd.exec, systemd.resource-control and systemd.unit man pages) and the very real fear of breaking the service. This guide cuts through the catalogue and shows the directive set that actually matters, with concrete unit-file examples and a verification workflow that catches breakage before users do. It ships with a free PDF cheat sheet of every directive worth caring about.
Why systemd hardening is the highest-leverage Linux work
SELinux and AppArmor are powerful, but both require a real authoring investment, both add operational mystery during outages, and both are easy to disable when something breaks. systemd security directives are different: they live next to the unit file, they are declarative one-liners, and a misconfiguration usually surfaces as an ordinary error from the service itself (read-only file system, permission denied) in journalctl rather than as an entry in a separate audit log. They cover most of what an attacker would want to do after compromising a service - escalate privileges, write to the filesystem outside the working directory, open unrelated network sockets, exec arbitrary binaries - and they cost almost nothing to deploy.
The leverage is high because most fleets have hundreds of systemd units, and a small set of directives applied across all of them produces a meaningful reduction in blast radius. A compromised webserver that cannot write to /etc, cannot read /home, cannot open a raw socket and cannot call execve on anything outside /usr/bin/nginx is a much less useful foothold than one running with the historical Linux trust model.
Audit what you have today
Before changing anything, get a baseline. systemd ships an analyser that scores every running unit on a 10-point scale. The results are sobering on a stock distribution:
systemd-analyze security # one row per unit, with its exposure score
systemd-analyze security nginx.service # detailed report for one unit
systemd-analyze security --no-pager | head -30 # first rows only, no pager
Expect most third-party packages to land between 6.5 (mediocre) and 9.0 (very exposed); lower is better, and anything below 5.0 is in good shape. The per-unit report lists every setting the analyser checks, marks the ones that are missing, and shows how much each one contributes to the exposure score - it is the single most useful Linux security tool that almost nobody runs.
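To turn the table into a worklist, the rows can be filtered by score. The function below is a minimal sketch, not part of systemd: the name exposed_units is made up for illustration, and it assumes the exposure score is the second whitespace-separated column of the analyser's output.

```shell
#!/bin/sh
# exposed_units THRESHOLD
# Reads `systemd-analyze security --no-pager` output on stdin and prints
# "unit score" for every unit whose exposure is at or above THRESHOLD.
# Assumption: the score is the second whitespace-separated column.
exposed_units() {
    awk -v min="$1" '
        # Rows whose second field is not a number (the header) never match.
        $2 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 + 0 >= min + 0 { print $1, $2 }
    '
}

# Usage (on a systemd host):
#   systemd-analyze security --no-pager | exposed_units 9.0
```

Starting with the units at the top of that list usually gives the largest drop in exposure per drop-in written.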
The core eight directives
Out of the 60+ available, the eight directives below produce roughly 80% of the security gain and rarely break a working service once its writable paths are declared. Add these to every unit you maintain:
[Service]
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
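NoNewPrivileges stops setuid/setgid escalation - neither the service nor its children can gain privileges through execve. ProtectSystem=strict mounts the entire filesystem hierarchy read-only for the unit, except /dev, /proc and /sys; the service can write only to paths you open up explicitly, such as /var/lib/<name> via StateDirectory= or anything listed in ReadWritePaths=. (The weaker ProtectSystem=full makes only /usr, the boot loader directories and /etc read-only.) ProtectHome=true makes /home, /root and /run/user inaccessible - very few daemons have a legitimate reason to look there. PrivateTmp gives the unit its own /tmp and /var/tmp, and PrivateDevices gives it a minimal /dev with pseudo-devices such as /dev/null but no physical devices. The last three directives make /proc/sys and /sys read-only, block kernel module loading, and make the cgroup hierarchy read-only - all of which are common privilege-escalation paths.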
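The directives above can be staged as a drop-in file before they go anywhere near a live host. This is a sketch under stated assumptions: write_core_dropin is a hypothetical helper, and the optional second argument is a scratch root directory so the result can be inspected without touching /etc.

```shell
#!/bin/sh
# write_core_dropin UNIT [ROOT]
# Writes ROOT/etc/systemd/system/UNIT.d/hardening.conf containing the core
# directives and prints the path of the file it wrote. ROOT defaults to
# empty (the live system); pass a scratch directory to preview the result.
write_core_dropin() {
    unit=$1
    root=${2:-}
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/hardening.conf" <<'EOF'
# Core sandboxing directives
[Service]
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
EOF
    printf '%s\n' "$dir/hardening.conf"
}

# Usage: preview first, then write for real as root.
#   write_core_dropin nginx.service /tmp/preview
#   write_core_dropin nginx.service && systemctl daemon-reload
```

Writing to a scratch root first makes it easy to review the file in a pull request before it reaches a host.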
Filesystem isolation
For services that need to write to specific paths, use StateDirectory=, CacheDirectory=, LogsDirectory= and RuntimeDirectory=. Each creates a directory owned by the service user under /var (or under /run for RuntimeDirectory=) with the right permissions, and that directory stays writable even under ProtectSystem=strict. Combine with ReadWritePaths= for legitimate exceptions, and with ReadOnlyPaths= or InaccessiblePaths= to lock down anything else:
[Service]
# Comments go on their own line; systemd does not strip trailing ones.
# /var/lib/myapp (mode 0755 unless StateDirectoryMode= is set)
StateDirectory=myapp
# /var/cache/myapp
CacheDirectory=myapp
# /var/log/myapp
LogsDirectory=myapp
# /run/myapp, removed when the unit stops
RuntimeDirectory=myapp
# explicit exception
ReadWritePaths=/srv/myapp/uploads
# belt and braces
InaccessiblePaths=/home /root /boot
The InaccessiblePaths directive generalises what ProtectHome does for home directories: the listed path and everything below it becomes inaccessible to the unit. It is useful for paths that ProtectSystem and ProtectHome do not cover (custom mount points, NFS shares, container volumes).
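When auditing a service, the usual question runs the other way: given a path the daemon writes to, which directive should cover it? The helper below is an illustrative sketch (suggest_directive is not a systemd tool) that applies the path mapping for system units described above and falls back to ReadWritePaths= for everything else.

```shell
#!/bin/sh
# suggest_directive PATH
# Prints the directive that would make PATH writable for a system unit.
# Paths under the four managed locations map to the matching *Directory=
# setting; anything else becomes an explicit ReadWritePaths= exception.
suggest_directive() {
    case $1 in
        /var/lib/*)   prefix=/var/lib/   name=StateDirectory ;;
        /var/cache/*) prefix=/var/cache/ name=CacheDirectory ;;
        /var/log/*)   prefix=/var/log/   name=LogsDirectory ;;
        /run/*)       prefix=/run/       name=RuntimeDirectory ;;
        *) printf 'ReadWritePaths=%s\n' "$1"; return 0 ;;
    esac
    rest=${1#"$prefix"}                       # e.g. myapp/db
    printf '%s=%s\n' "$name" "${rest%%/*}"    # keep the first component only
}

# Usage:
#   suggest_directive /var/lib/myapp/db      # prints StateDirectory=myapp
#   suggest_directive /srv/myapp/uploads     # prints ReadWritePaths=/srv/myapp/uploads
```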
Network restrictions
Most daemons need to listen on a small set of TCP/UDP ports and connect outbound to a small set of destinations. RestrictAddressFamilies says which protocol families are allowed; IPAddressDeny and IPAddressAllow filter at the BPF level by IP and CIDR:
[Service]
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
IPAddressDeny=any
# internal subnet + a known peer
IPAddressAllow=10.0.0.0/8 192.0.2.5
# set to true if the unit needs no network at all
PrivateNetwork=false
For a daemon that has no business on the network at all - a log rotator, a media converter, a backup script - PrivateNetwork=true drops it into a network namespace with only loopback. That single line eliminates an enormous attack surface and never breaks a service that does not need the network.
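The two cases, no network at all versus a short allow-list, can be captured in one small generator. This is a sketch with a hypothetical function name (network_section); the addresses in the usage line are the example values from this guide, not a recommendation.

```shell
#!/bin/sh
# network_section [ADDRESS_OR_CIDR...]
# Prints a [Service] fragment for a drop-in.
#   no arguments -> the unit gets a private namespace with loopback only
#   arguments    -> deny everything, then allow only the listed peers
network_section() {
    echo '[Service]'
    if [ "$#" -eq 0 ]; then
        echo 'PrivateNetwork=true'
        return 0
    fi
    echo 'RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6'
    echo 'IPAddressDeny=any'
    echo "IPAddressAllow=$*"
}

# Usage:
#   network_section                        # for a log rotator or converter
#   network_section 10.0.0.0/8 192.0.2.5   # for a daemon with known peers
```

Making full isolation the zero-argument default means a service only gets network access when someone writes down which peers it needs.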
System call filtering
SystemCallFilter applies a seccomp filter, allowing only the syscalls the daemon legitimately uses. systemd ships pre-built syscall sets that cover the common cases - @system-service is the right starting point for almost any daemon:
[Service]
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM
SystemCallArchitectures=native
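The @system-service set includes file IO, networking, signals and process management while leaving out the dangerous groups: @debug (ptrace, perf_event_open), @reboot (kexec_load), @module, @mount, @swap and @raw-io, among others. Running systemd-analyze syscall-filter @system-service prints its exact contents. Extend it with further named groups or specific syscalls, or subtract from it with a second line that starts with ~ (for example SystemCallFilter=~@privileged @resources, once you know the service does not need them). Setting SystemCallErrorNumber=EPERM makes denied calls return a permission error instead of killing the process - friendlier debugging during rollout.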
SystemCallArchitectures=native blocks the legacy 32-bit syscall ABI on 64-bit hosts. A surprising number of historical kernel exploits relied on the 32-bit ABI; this directive closes that window with no service-side cost.
Capabilities
Linux capabilities split root's powers into ~40 distinct privileges. Most daemons need a tiny subset; many need none at all. Drop everything and add back only what is required:
[Service]
# drop all, no caps allowed
CapabilityBoundingSet=
AmbientCapabilities=
# For a daemon that needs to bind to port 80/443:
# CapabilityBoundingSet=CAP_NET_BIND_SERVICE
# AmbientCapabilities=CAP_NET_BIND_SERVICE
If the daemon currently runs as root only to bind a privileged port, this is the right place to fix it - drop to a regular user with User= and add CAP_NET_BIND_SERVICE ambient. A web server bound to port 443 with no other root privileges is a much smaller blast radius than one running as root for the whole lifetime.
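Whether the capabilities were really dropped can be read from the running process. The sketch below assumes the usual CapEff: and CapBnd: lines of /proc/PID/status; cap_masks is a name chosen for illustration.

```shell
#!/bin/sh
# cap_masks
# Reads the text of /proc/PID/status on stdin and prints the effective and
# bounding capability masks as "CapEff=<hex> CapBnd=<hex>".
cap_masks() {
    awk '
        $1 == "CapEff:" { eff = $2 }
        $1 == "CapBnd:" { bnd = $2 }
        END { printf "CapEff=%s CapBnd=%s\n", eff, bnd }
    '
}

# Usage (on a systemd host):
#   pid=$(systemctl show -p MainPID --value nginx.service)
#   cap_masks < "/proc/$pid/status"
#
# Reading the masks:
#   0000000000000000  no capabilities at all
#   0000000000000400  CAP_NET_BIND_SERVICE only (capability number 10)
```

A mask with many bits set on a service that was supposed to be locked down usually means the drop-in was not picked up, for example because daemon-reload was skipped.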
Verifying the lockdown
The verification loop is short and powerful. After each change, restart the unit, check that it works, then re-run the analyser:
systemctl daemon-reload
systemctl restart nginx
systemctl status nginx # service still healthy?
systemd-analyze security nginx.service # exposure score moved?
journalctl -u nginx -n 50 --no-pager # any sandbox denials?
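If something broke, expect a little detective work: systemd does not log which directive caused a denial. Filesystem restrictions show up as the service's own errors (Read-only file system, Permission denied, No such file or directory), dropped capabilities as Operation not permitted, and a filtered syscall either as the errno set with SystemCallErrorNumber= or, without it, as a main process killed with status=31/SYS. Map the failing path or syscall back to the responsible directive, adjust it (loosen by one notch, not many) and retest. Most services lock down to a 3.0-4.5 exposure score with the core eight directives plus a sensible filesystem and capability set.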
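Scanning the journal for the usual symptoms of an over-tight sandbox is easy to script. The pattern list below is an assumption, not an exhaustive catalogue: it matches the standard error strings for EROFS, EACCES and EPERM plus the status systemd reports when a process is killed by SIGSYS. The function name sandbox_symptoms is hypothetical.

```shell
#!/bin/sh
# sandbox_symptoms
# Filters journal text on stdin down to lines that typically indicate a
# sandboxing side effect. Exit status is that of grep: 0 if anything
# matched, 1 if the log looks clean.
sandbox_symptoms() {
    grep -E 'Read-only file system|Permission denied|Operation not permitted|status=31/SYS'
}

# Usage (on a systemd host):
#   journalctl -u nginx -n 200 --no-pager | sandbox_symptoms
```

Because the exit status reflects whether anything matched, the same function can gate a deployment step or feed an alert.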
Production rollout pattern
The discipline that works on a real fleet has four steps. One, baseline every unit with systemd-analyze security --no-pager > baseline.txt in your config-management repo. Two, work in a drop-in directory rather than editing the vendor unit: /etc/systemd/system/nginx.service.d/hardening.conf survives package upgrades. Three, roll out per OU/role - one unit on one host first, then the host class, then the fleet. Four, alert on sandbox denials in the SIEM and treat them as ops bugs to be fixed by adjusting the directive (or, very rarely, by widening the exception with a justification in the commit message).
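The baseline from step one pays off when it is compared against a later run. A minimal sketch, assuming both files hold `systemd-analyze security --no-pager` output with the unit name in column one and the score in column two; score_changes and the file names are placeholders.

```shell
#!/bin/sh
# score_changes BASELINE CURRENT
# Prints "unit old -> new" for every unit present in both files whose
# exposure score differs.
score_changes() {
    awk '
        $2 !~ /^[0-9]+(\.[0-9]+)?$/ { next }     # skip header rows
        NR == FNR { before[$1] = $2; next }      # first file: remember scores
        ($1 in before) && before[$1] != $2 { print $1, before[$1], "->", $2 }
    ' "$1" "$2"
}

# Usage:
#   systemd-analyze security --no-pager > current.txt
#   score_changes baseline.txt current.txt
```

Units that appear in only one of the two files are ignored here; a stricter version might report them as added or removed.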
For Ansible-managed fleets the simplest pattern is a role that drops the hardening conf and reloads systemd; for k8s-adjacent infrastructure the same conf can be templated per service via Jinja. The point is the same: the hardening lives in source control next to the rest of the configuration, not as a manual edit to a single host.
Common pitfalls
- Editing the vendor unit directly. Use a drop-in (/etc/systemd/system/<unit>.d/hardening.conf) so package upgrades do not silently undo the lockdown.
- Setting ProtectSystem=strict and then writing logs to /var/log/myapp. ProtectSystem=strict makes /var read-only too unless you opt in via LogsDirectory=. Use the directive, not ReadWritePaths=/var/log/myapp.
- Picking SystemCallFilter without @system-service. Hand-rolling the syscall list is a ticket to mysterious failures. Start with the named set, add specifics only as needed.
- Forgetting CAP_NET_BIND_SERVICE for daemons on privileged ports. Depending on when the daemon binds, the unit either fails at restart or passes restart and then fails under traffic when a worker spawns and tries to bind. Check the binding requirement before dropping all caps.
- Not running systemd-analyze security after every change. The score is the only objective measure of the lockdown's effect. Use it.
Audit checklist
- Baseline exposure score recorded in config repo (1 pt)
- Core eight directives applied to all custom units (1 pt)
- SystemCallFilter=@system-service on every long-running daemon (1 pt)
- Capabilities dropped to the minimum required set (1 pt)
- Sandbox denials forwarded to the SIEM and treated as bugs (1 pt)
5/5 = PASS, 3-4 = WARN, <3 = FAIL.
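For fleets that track the checklist automatically, the grading rule is a three-way comparison. A small sketch with a hypothetical function name (audit_grade); it takes the number of points earned out of five.

```shell
#!/bin/sh
# audit_grade POINTS
# Maps a checklist score (0-5) to the grade used above:
# 5 = PASS, 3-4 = WARN, below 3 = FAIL.
audit_grade() {
    if [ "$1" -ge 5 ]; then
        echo PASS
    elif [ "$1" -ge 3 ]; then
        echo WARN
    else
        echo FAIL
    fi
}

# Usage:
#   audit_grade 4    # prints WARN
```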
FAQ
Will this break SELinux or AppArmor?
No. The systemd directives complement MAC policies; they do not collide. Both layers can deny an action, and the most restrictive wins.
Does the lockdown survive a package upgrade?
Yes if you used a drop-in under /etc/systemd/system/<unit>.d/. No if you edited the vendor unit in /lib/systemd/system/.
What about containers?
Containers have their own sandboxing layer; systemd hardening is for host-level services. If you run systemd inside a container (rare), the directives still work but you need ProtectKernelTunables=false in some setups.
How do I know what the daemon actually needs?
Strace the working process, watch journalctl for sandbox denials during a test workload, and when in doubt loosen one directive at a time. The analyser's per-directive suggestions are a good starting point.
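An strace log can be reduced to the list of syscalls the daemon actually made, which is the list to compare against the filter. The sketch below assumes the log format written by strace -f -o FILE, where each line starts with a PID followed by the call; syscalls_used is a made-up name.

```shell
#!/bin/sh
# syscalls_used
# Reads an strace log on stdin and prints each distinct syscall name once,
# sorted. Lines that do not start a call (signals, exit markers,
# "<... resumed>" continuations) are ignored.
syscalls_used() {
    sed -nE 's/^([0-9]+ +)?([a-z_][a-z0-9_]*)\(.*/\2/p' | sort -u
}

# Usage (attach to the running daemon during a test workload):
#   strace -f -o trace.log -p "$(systemctl show -p MainPID --value nginx.service)"
#   syscalls_used < trace.log
```

Any name in the result that is missing from the configured syscall filter is a candidate for an explicit addition, provided the workload that triggered it is legitimate.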
Can I roll back fast if something breaks?
rm /etc/systemd/system/<unit>.d/hardening.conf && systemctl daemon-reload && systemctl restart <unit>. Three commands, no package surgery.