SSH Certificate Authority: Short-Lived Certs and Zero-Trust SSH (2026)


Look at authorized_keys on any production Linux fleet older than two years. You will see twenty keys: half belong to people who have left, several were uploaded for a one-day debugging session, and nobody can say with authority which is which. The host-key story is the same in reverse: every fresh server prompts the operator with "Are you sure you want to continue connecting?" and the operator types yes without checking. Both problems have the same fix - SSH certificates - and OpenSSH has supported them since 5.4 (2010). The reason almost nobody uses them is the perceived complexity of standing up the certificate authority. In 2026 that complexity is gone: HashiCorp Vault, smallstep step-ca, Teleport, and even a five-line bash script can act as an SSH CA. This guide walks the practical path from authorized_keys sprawl to short-lived, IdP-bound user certificates and signed host keys, and ships a free PDF cheat sheet of the exact commands.

Why authorized_keys must go

The authorized_keys model has three problems that no amount of discipline solves at scale. One, the keys are static: the same key may sit on a host for years, while the corresponding private key has been copied to a developer laptop that has been lost twice. Two, there is no central audit: to know who has access to which host, you have to enumerate every host's authorized_keys and reverse-map the keys to humans. Three, revocation is per-host, performed manually, and almost never complete: when an employee leaves, their public key tends to stay on at least one server for months. Internal incident reports often name "stale SSH key on a forgotten host" as a contributing factor; the only defence is to stop having stale SSH keys.

SSH certificates eliminate all three. A user certificate is a public key plus a signed assertion (key ID, serial, principals, validity window, and optional restrictions such as source addresses and a forced command) issued by a CA the host already trusts. The host accepts the certificate without ever having seen the user's public key. The cert lives for an hour, a day or a week - the operator's choice - and then it expires. Revocation is a single-line update on the CA, pushed to hosts as a key revocation list. The audit trail is one log file at the CA, not a fleet-wide grep.
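To make the anatomy of a user certificate concrete, here is a minimal sketch using plain ssh-keygen. It assumes a throwaway CA and user key created in a temporary directory; the key ID, serial, principal and source range are illustrative values, not taken from a real deployment.

```shell
#!/bin/sh
# Sketch only: throwaway CA and user key in a temp dir (all names are illustrative).
set -eu
work=$(mktemp -d)

# 1. A CA is just a key pair: the private half signs, hosts trust the public half.
ssh-keygen -q -t ed25519 -N '' -C 'demo-user-ca' -f "$work/user_ca"

# 2. The user's own key pair (normally generated on the operator's laptop).
ssh-keygen -q -t ed25519 -N '' -C 'alice@laptop' -f "$work/id_alice"

# 3. Sign the user's public key: key ID, serial, principal, 16-hour validity,
#    and a source-address restriction.
ssh-keygen -q -s "$work/user_ca" -I 'alice@example.com' -z 1001 \
    -n alice -V +16h -O source-address=10.0.0.0/8 "$work/id_alice.pub"

# 4. Inspect the result; ssh-keygen writes it next to the key as id_alice-cert.pub.
cert_info=$(ssh-keygen -L -f "$work/id_alice-cert.pub")
printf '%s\n' "$cert_info"
```

The `ssh-keygen -L` output lists the certificate type, signing CA fingerprint, key ID, serial, validity window, principals and critical options, which are the fields a host evaluates at login.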

What an SSH CA actually does

An SSH CA is a long-lived signing key plus an issuance policy. It signs two kinds of certificates:

  • User certificates sign a user's public key with the CA, asserting that the user is allowed to log in as one of the listed principals (e.g. alice, ubuntu, ec2-user) on hosts that trust this CA.
  • Host certificates sign a host's public key with the CA, asserting that the host's identity is its FQDN. Clients that trust the CA accept these without prompting.

The host's sshd_config needs only one new line for user authentication (TrustedUserCAKeys with the path to the CA's public key) and a pair of lines for presenting its host cert (HostCertificate + HostKey). The client side needs one line in ~/.ssh/known_hosts per CA-trusted domain. That is the whole protocol surface.

Choosing the CA backend

The choice of CA backend is operational, not cryptographic. Four options cover the field:

  • HashiCorp Vault SSH secrets engine - the most common pick for teams already running Vault. The user authenticates to Vault with their normal IdP token, requests a cert with a role that pins their principals and TTL, and gets back a short-lived signed cert. Excellent audit trail, easy revocation, scales to thousands of users. Operationally heavier if you do not already run Vault.
  • smallstep step-ca - a single Go binary, JWT/OIDC provisioner support, lighter to operate than Vault, ideal for teams of 5-200 users. Very good documentation, sensible defaults, deploys in an afternoon.
  • Teleport - a managed control plane that issues certs, also handles session recording, RBAC, audit. Heavier on cost and lock-in, lighter on operator effort. Pick if SSH is one of many access control problems you want centralised.
  • Self-hosted scripts (ssh-keygen + a wrapping API) - viable for very small fleets or for the initial pilot. Fine for <50 users; struggles past that on revocation and audit.

For most Dargslan-scale setups (10-200 hosts, 5-50 operators) step-ca or Vault are the right answers. The rest of this guide uses step-ca for examples; the concepts translate one-to-one to Vault.

Host certificates - end the prompt forever

Sign every host's public key with the CA. The host config gets two lines:

# On the CA admin host: --sign signs the existing public key instead of generating a new pair
step ssh certificate web01.dargslan.com /etc/ssh/ssh_host_ed25519_key.pub \
    --host --sign --principal web01.dargslan.com --principal 10.0.1.42 \
    --provisioner ansible

# Copy the resulting *-cert.pub to /etc/ssh/ on the host, then in sshd_config:
HostKey         /etc/ssh/ssh_host_ed25519_key
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

Clients trust the host CA via a single @cert-authority line in ~/.ssh/known_hosts (or system-wide):

@cert-authority *.dargslan.com,10.0.* ssh-ed25519 AAAA...the host CA pub key...

That one line replaces every per-host TOFU prompt across the fleet. Reprovisioned hosts no longer trigger the "REMOTE HOST IDENTIFICATION HAS CHANGED" warning, because the cert vouches for the new key. Operators stop training themselves to ignore that warning.
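For readers piloting without step-ca, the same host-side flow can be sketched with plain ssh-keygen. This is an illustration under stated assumptions: it uses a throwaway CA and a generated stand-in host key in a temporary directory rather than /etc/ssh, and the 52-week validity is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: throwaway host CA and stand-in host key (not the real /etc/ssh files).
set -eu
work=$(mktemp -d)

ssh-keygen -q -t ed25519 -N '' -C 'demo-host-ca' -f "$work/host_ca"
ssh-keygen -q -t ed25519 -N '' -C 'web01' -f "$work/ssh_host_ed25519_key"

# -h marks this as a host certificate; principals are the names clients will dial.
ssh-keygen -q -s "$work/host_ca" -I 'web01.dargslan.com' -h \
    -n web01.dargslan.com,10.0.1.42 -V +52w "$work/ssh_host_ed25519_key.pub"

# Build the client-side trust line: marker, scoped host patterns, CA key type and blob.
ca_type=$(cut -d' ' -f1 "$work/host_ca.pub")
ca_blob=$(cut -d' ' -f2 "$work/host_ca.pub")
known_hosts_line="@cert-authority *.dargslan.com,10.0.* $ca_type $ca_blob"
printf '%s\n' "$known_hosts_line"

# Confirm the certificate type and principals before shipping it to the host.
host_cert_info=$(ssh-keygen -L -f "$work/ssh_host_ed25519_key-cert.pub")
printf '%s\n' "$host_cert_info"
```

Note that the trust line carries the CA's public key, not the host's, which is why one line covers every host the CA has signed.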

User certificates - short-lived and bound to identity

The user side is the high-value half. With step-ca and an OIDC provisioner pointing at your IdP (Entra ID, Okta, Google), the operator's flow looks like:

step ssh login alice@dargslan.com --provisioner okta
# Browser opens, MFA prompt, certificate and key are added to the running ssh-agent
ssh web01                    # uses the freshly minted cert from the agent, no key file
ssh-add -l                   # lists the agent's identities, including the cert
ssh -v ec2-user@web01        # -v shows the cert being offered

The cert has a short TTL (16 hours is a common choice - covers a working day, expires overnight), a list of principals (the Linux users the cert allows), an optional force-command, and an optional source-address CIDR. When the cert expires, the operator runs step ssh login again - which means a fresh MFA prompt. There is nothing for the operator to "lose" beyond a cert that stops working within 16 hours of issuance.

Bind the principal to the IdP attribute, not to a hand-maintained list. The step-ca OIDC provisioner can map the IdP's preferred_username claim into the cert principal automatically; Vault can do the same with a templated role. The result is that group membership in the IdP - "alice is in the linux-admins group" - is the source of truth for SSH access. Removing alice from the group at offboarding stops new certs from being issued to her without touching a single host; any cert she still holds expires within the TTL.

Production rollout pattern

The four-week rollout that minimises risk and maximises adoption:

  1. Week 1 - stand up the CA, issue host certs. Pick a backend, deploy it, sign every host. Distribute the host CA public key via @cert-authority in operators' known_hosts. authorized_keys still in use; nothing breaks.
  2. Week 2 - parallel user certs. Add TrustedUserCAKeys /etc/ssh/users-ca.pub to every host. Operators can now log in with either the old key or a fresh cert. Sponsor a couple of early adopters, smooth out their workflow.
  3. Week 3 - default to certs. Document step ssh login in the runbook, point new operators at it, leave key access in place for emergencies.
  4. Week 4 - retire keys. Audit each authorized_keys file, remove all but break-glass keys (kept under a different process, in a vault). Set AuthorizedKeysFile /dev/null on hosts that should be cert-only.

The pattern is staged, reversible, and gives operators time to internalise the new flow. Skip the parallel period and you will get a backlash.
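The sshd_config delta per stage is small. The sketch below assumes a hypothetical helper named render_stage that prints the directives a host carries in a given week. The paths come from the examples above; week 4 uses `AuthorizedKeysFile none`, which sshd_config documents as the way to skip key files, where the text uses /dev/null.

```shell
#!/bin/sh
# render_stage is a hypothetical helper: it prints the sshd_config lines for a rollout week.
render_stage() {
    case "$1" in
        1)  # Host certs only; user authentication is unchanged.
            printf '%s\n' 'HostKey /etc/ssh/ssh_host_ed25519_key' \
                          'HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub' ;;
        2|3)  # Old keys and new certs are both accepted.
            render_stage 1
            printf '%s\n' 'TrustedUserCAKeys /etc/ssh/users-ca.pub' ;;
        4)  # Cert-only hosts: stop reading authorized_keys files.
            render_stage 2
            printf '%s\n' 'AuthorizedKeysFile none' ;;
        *)  echo "usage: render_stage 1|2|3|4" >&2
            return 2 ;;
    esac
}

render_stage 4
```

Because each stage only adds lines to the previous one, rolling back is a matter of re-rendering the earlier week. Running `sshd -t` after each change checks the file for errors before a reload.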

Rotation and revocation

Rotation has two timescales. The user-cert TTL is short by design - 16 hours is typical, 1 hour is fine for high-sensitivity hosts. The CA key itself rotates much less often (annually, or after a suspected compromise). Plan a CA-rotation drill twice a year: a second CA key is added to TrustedUserCAKeys, certs issued by both CAs work, the old CA is decommissioned after all in-flight certs have expired. This is mostly a config-management exercise.
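The overlap period of a CA-rotation drill can be rehearsed offline. This sketch assumes two throwaway CA keys and a test user key in a temporary directory; it builds a two-line trust file and checks that a cert issued by the new CA names a signer that appears in it.

```shell
#!/bin/sh
# Sketch only: throwaway old/new CAs to rehearse the overlap period.
set -eu
work=$(mktemp -d)

ssh-keygen -q -t ed25519 -N '' -C 'user-ca-old' -f "$work/ca_old"
ssh-keygen -q -t ed25519 -N '' -C 'user-ca-new' -f "$work/ca_new"

# During the overlap, TrustedUserCAKeys points at a file with one CA key per line.
cat "$work/ca_old.pub" "$work/ca_new.pub" > "$work/users-ca.pub"

# Issue a test cert from the new CA.
ssh-keygen -q -t ed25519 -N '' -C 'alice' -f "$work/id_alice"
ssh-keygen -q -s "$work/ca_new" -I 'alice-rotation-test' -n alice -V +1h "$work/id_alice.pub"

# The cert's signing CA fingerprint should appear among the trusted fingerprints.
signer_fp=$(ssh-keygen -L -f "$work/id_alice-cert.pub" | awk '/Signing CA:/ {print $4}')
trusted_fps=$(ssh-keygen -l -f "$work/users-ca.pub" | awk '{print $2}')

if printf '%s\n' "$trusted_fps" | grep -qxF "$signer_fp"; then
    echo "signing CA is trusted"
fi
```

Decommissioning the old CA is then the reverse step: remove its line from the trust file once every cert it issued has passed its expiry.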

Revocation is a single step on the CA - mark the cert serial revoked, and (with a KRL file referenced by RevokedKeys in sshd_config) hosts refuse it on the next connection. Keep the KRL distribution under config management; a failed KRL push that allows a revoked cert to keep working is an accident waiting to happen.
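A revocation drill can also be rehearsed with plain ssh-keygen before wiring it into a CA backend. The sketch below assumes a throwaway CA and an illustrative serial (4242); it revokes that serial in a KRL and asks ssh-keygen whether the cert is still acceptable.

```shell
#!/bin/sh
# Sketch only: throwaway CA, illustrative serial number.
set -eu
work=$(mktemp -d)

ssh-keygen -q -t ed25519 -N '' -C 'demo-user-ca' -f "$work/user_ca"
ssh-keygen -q -t ed25519 -N '' -C 'mallory' -f "$work/id_mallory"
ssh-keygen -q -s "$work/user_ca" -I 'mallory@example.com' -z 4242 \
    -n mallory -V +16h "$work/id_mallory.pub"

# KRL spec: revoke by certificate serial, scoped to the CA public key given with -s.
printf 'serial: 4242\n' > "$work/revoked.spec"
ssh-keygen -k -f "$work/revoked.krl" -s "$work/user_ca.pub" "$work/revoked.spec"

# -Q exits non-zero when a key or cert on the command line is revoked (or on error).
if ssh-keygen -Q -f "$work/revoked.krl" "$work/id_mallory-cert.pub" >/dev/null 2>&1; then
    krl_status=ok
else
    krl_status=revoked
fi
echo "cert serial 4242: $krl_status"
```

On the host, the resulting file is what the RevokedKeys directive in sshd_config points at, so the config-management job only has to ship one small binary file.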

Auditing and incident response

Two log streams matter. The CA logs every issuance with the requester, the principal set, the TTL and the serial - this is the canonical access record for the fleet. The host's auth.log records every accepted cert with its serial, key ID and source IP. Joining them at the SIEM gives a complete picture: who, from where, to which host, with which cert, and for how long. That join is the audit answer that authorized_keys never made possible.
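The join itself is simple once both streams share the serial. The sketch below uses made-up sample data: the sshd line follows the usual OpenSSH "Accepted publickey ... ID <key id> (serial N)" shape with a traditional syslog prefix, and the CA issuance log is a hypothetical CSV of serial, requester, principals and TTL. Real CA backends log in their own formats.

```shell
#!/bin/sh
# Sketch only: both inputs are invented samples; the CSV layout is hypothetical.
work=$(mktemp -d)
cat > "$work/auth.log" <<'EOF'
Jan 12 09:14:03 web01 sshd[2211]: Accepted publickey for alice from 10.0.4.17 port 51522 ssh2: ED25519-CERT SHA256:Zm9v ID alice@dargslan.com (serial 1001) CA ED25519 SHA256:YmFy
EOF
cat > "$work/ca-issuance.csv" <<'EOF'
1001,alice@dargslan.com,alice,16h
1002,bob@dargslan.com,bob,16h
EOF

# Pass 1 loads the CA log keyed by serial; pass 2 annotates each accepted login.
# Fields $4, $9 and $11 assume the syslog prefix shown in the sample line.
joined=$(awk '
    NR == FNR { split($0, f, ","); requester[f[1]] = f[2]; ttl[f[1]] = f[4]; next }
    /Accepted publickey/ {
        serial = ""
        for (i = 1; i < NF; i++)
            if ($i == "(serial") { serial = $(i + 1); sub(/\)$/, "", serial) }
        if (serial != "")
            printf "host=%s user=%s src=%s serial=%s requester=%s ttl=%s\n",
                   $4, $9, $11, serial, requester[serial], ttl[serial]
    }
' "$work/ca-issuance.csv" "$work/auth.log")
printf '%s\n' "$joined"
```

A login whose serial has no matching issuance record would print an empty requester, which is itself a useful alert condition.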

For incident response, a single revocation entry plus the next config-management run cuts an attacker off from the fleet. Compare that to the authorized_keys world, where you would have to track down every host where the compromised key landed, including the ones you forgot.

Common pitfalls

  • Trusting * in the @cert-authority line. Scope the wildcard to your domains and IP ranges; never trust the CA for the whole internet.
  • Long-lived user certs. If you find yourself raising the TTL above 24 hours, you have a workflow problem (operators do not want to MFA again), not a security problem. Solve the workflow.
  • Storing the CA private key on the issuing host. Use an HSM, a Vault transit engine, or at minimum a dedicated host with its own access controls. The CA private key is the keys-to-the-kingdom.
  • Forgetting to set AuthorizedKeysFile /dev/null after retiring keys. If authorized_keys still works, operators will keep using it.
  • Not distributing the KRL. Revocation that does not reach the host is not revocation. Treat KRL distribution as critical-path automation.
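The first pitfall is easy to lint for. The sketch below assumes a hypothetical function named check_cert_authority_scope that flags any @cert-authority line whose host-pattern list contains a bare `*`; the two sample files are invented, with the key blob elided as in the earlier example.

```shell
#!/bin/sh
# check_cert_authority_scope is a hypothetical lint for known_hosts-style files.
# It returns non-zero if any @cert-authority line trusts the CA for every host.
check_cert_authority_scope() {
    awk '
        BEGIN { bad = 0 }
        $1 == "@cert-authority" {
            n = split($2, patterns, ",")
            for (i = 1; i <= n; i++)
                if (patterns[i] == "*") {
                    printf "line %d: trusts CA for every host\n", NR
                    bad = 1
                }
        }
        END { exit bad }
    ' "$1"
}

work=$(mktemp -d)
printf '%s\n' '@cert-authority *.dargslan.com,10.0.* ssh-ed25519 AAAA...' > "$work/scoped"
printf '%s\n' '@cert-authority * ssh-ed25519 AAAA...' > "$work/broad"

check_cert_authority_scope "$work/scoped" && echo "scoped: ok"
check_cert_authority_scope "$work/broad" || echo "broad: needs scoping"
```

A check like this fits naturally into the same config-management run that distributes the known_hosts file.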

Audit checklist

  1. Every host has a host cert and clients trust the host CA via @cert-authority (1 pt)
  2. User certs are issued via IdP/MFA with TTL <= 24 hours (1 pt)
  3. Principals are sourced from IdP groups, not hand-maintained lists (1 pt)
  4. CA private key lives in an HSM/Vault, not on the issuing host (1 pt)
  5. KRL distributed automatically; revocation drill performed at least quarterly (1 pt)

5/5 = PASS, 3-4 = WARN, <3 = FAIL.
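For teams that track the checklist in automation, the scoring rule can be sketched as a small function. The name audit_verdict and the 1/0 encoding of each check are assumptions made for this example.

```shell
#!/bin/sh
# audit_verdict is a hypothetical helper: each argument is 1 (check passed) or 0 (failed).
# It totals the points and maps them to the verdict bands above.
audit_verdict() {
    score=0
    for check in "$@"; do
        if [ "$check" = 1 ]; then
            score=$((score + 1))
        fi
    done
    if [ "$score" -eq 5 ]; then
        echo "PASS"
    elif [ "$score" -ge 3 ]; then
        echo "WARN"
    else
        echo "FAIL"
    fi
}

# Order: host certs, IdP/MFA + TTL, IdP-sourced principals, CA key custody, KRL automation.
audit_verdict 1 1 1 0 1   # prints WARN (4 of 5)
```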

FAQ

Does this require a new client?

No. OpenSSH 5.4+ on the client and server is enough for certificates themselves; the Ed25519 keys used in this guide's examples need 6.5+. Modern OpenSSH 9.x is recommended for the strongest defaults.

What about Ansible / configuration management?

Ansible can use a service-account user cert issued at run time. Avoid embedding a long-lived key in CI; rotate Ansible's cert with the same TTL as a human's.

Can I keep break-glass keys?

Yes - kept in a vault, used only in declared emergencies, audited. Two break-glass accounts is the typical pattern.

Does this work for git over SSH?

Yes. The git server presents a host cert, the user presents a user cert; the workflow is identical to a normal shell session.

Do I need to retire ed25519/RSA host keys?

No. The cert is signed over the host key; the underlying key is unchanged. Your existing keys keep working.

Dargslan Editorial Team (Dargslan)