DevOps Intermediate

What is Error Budget?

The acceptable amount of unreliability allowed for a service, calculated as 100% minus the Service Level Objective.

An error budget quantifies how much downtime or how many errors a service can tolerate over a given period. For example, a 99.9% availability SLO gives an error budget of 0.1%, which translates to about 8.76 hours of allowed downtime per year, or 43.8 minutes per month (taking a month as one twelfth of a year). Teams can spend this budget on risky deployments, experiments, and new features. When the budget is depleted, the team shifts its focus to reliability improvements until the budget recovers. This framework turns the tension between development speed and stability into a data-driven conversation, reducing subjective arguments about when to slow down or speed up releases.
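
The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library or tool: the function names (`error_budget_fraction`, `allowed_downtime_minutes`, `budget_consumed`) and the request-based way of measuring consumption are assumptions made for this example.

```python
def error_budget_fraction(slo_percent: float) -> float:
    """Return the error budget as a fraction: (100% - SLO) / 100."""
    if not 0 < slo_percent < 100:
        raise ValueError("SLO must be strictly between 0 and 100 percent")
    return (100.0 - slo_percent) / 100.0


def allowed_downtime_minutes(slo_percent: float, window_days: float) -> float:
    """Return the downtime, in minutes, that the budget allows in a window."""
    minutes_in_window = window_days * 24 * 60
    return error_budget_fraction(slo_percent) * minutes_in_window


def budget_consumed(total_requests: int, failed_requests: int,
                    slo_percent: float) -> float:
    """Return the share of the error budget already spent (1.0 = depleted).

    Hypothetical request-based accounting: the budget is the number of
    requests that are allowed to fail under the SLO.
    """
    allowed_failures = error_budget_fraction(slo_percent) * total_requests
    return failed_requests / allowed_failures


if __name__ == "__main__":
    yearly_hours = allowed_downtime_minutes(99.9, 365) / 60
    monthly_minutes = allowed_downtime_minutes(99.9, 365 / 12)
    print(f"Yearly budget:  {yearly_hours:.2f} hours")
    print(f"Monthly budget: {monthly_minutes:.1f} minutes")
    print(f"Consumed: {budget_consumed(1_000_000, 250, 99.9):.0%}")
```

With a 99.9% SLO, the first two calls reproduce the yearly and monthly figures quoted above. In the last call, 250 failed requests out of one million use up roughly a quarter of the budget, so a team following this policy would still have room for riskier releases.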

Related Terms

Rolling Update
A deployment strategy that gradually replaces old application instances with new ones, maintaining availability throughout.
Grafana
An open-source analytics and visualization platform for creating dashboards from various data sources.
Chaos Engineering
The discipline of deliberately introducing failures into a system to test its resilience and identify weaknesses before they cause outages.
Observability
The ability to understand a system's internal state from its external outputs through metrics, logs, and traces.
Prometheus
An open-source monitoring and alerting toolkit that collects time-series metrics using a pull-based model.
Infrastructure Drift
The divergence between the actual state of infrastructure and its defined desired state, caused by manual changes or untracked modifications.