🎁 New User? Get 20% off your first purchase with code NEWUSER20 Register Now →
Menu

Categories

DevOps Intermediate

What is SRE (Site Reliability Engineering)?

An engineering discipline that applies software engineering principles to infrastructure and operations to create reliable systems.

Site Reliability Engineering, pioneered by Google, treats operations as a software problem. SRE teams define Service Level Objectives (SLOs), measure them through Service Level Indicators (SLIs), and use error budgets to balance reliability with feature velocity. When the error budget is exhausted, teams prioritize reliability work over new features. Key practices include toil automation (eliminating repetitive manual work), blameless postmortems, capacity planning, and progressive rollouts. SRE bridges the gap between development speed and operational stability.

Related Terms

Microservices
An architectural style where an application is composed of small, independent services that communicate over APIs.
Immutable Deployment
A deployment strategy where new versions replace existing instances entirely rather than updating them in place.
Environment Variable
A dynamic value stored outside the application code that configures behavior without hardcoding sensitive or environment-specific data.
Blue-Green Deployment
A deployment strategy using two identical environments where traffic is switched from the old version to the new one instantly.
Rolling Update
A deployment strategy that gradually replaces old application instances with new ones, maintaining availability throughout.
Ansible
An agentless automation tool for configuration management, application deployment, and task automation using YAML playbooks.
View All DevOps Terms →