What is Chaos Engineering?

The discipline of deliberately introducing failures into a system to test its resilience and identify weaknesses before they cause outages.

Chaos engineering, pioneered at Netflix with Chaos Monkey (a tool that randomly terminates production instances), tests system resilience proactively. Typical experiments include terminating servers, injecting network latency, filling disks, and simulating the failure of an entire cloud region. The goal is to find weaknesses before they cause real outages.
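The experiments above target infrastructure (servers, networks, disks). The same idea can be illustrated at a smaller scale inside an application. The Python sketch below wraps a function so that a configurable fraction of calls are delayed or fail, then checks whether a caller degrades gracefully. `inject_faults`, `InjectedFault`, `fetch_price`, and `get_price_with_fallback` are hypothetical names for illustration, not the API of Chaos Monkey or any other chaos tool.

```python
import random
import time
from functools import wraps


class InjectedFault(Exception):
    """Raised when the wrapper simulates a failed or unreachable dependency."""


def inject_faults(failure_rate=0.0, latency_s=0.0, latency_rate=0.0, rng=None):
    """Hypothetical decorator: make a fraction of calls slow and/or fail."""
    rng = rng or random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Inject network-style latency into a fraction of calls.
            if rng.random() < latency_rate:
                time.sleep(latency_s)
            # Simulate a crashed or unreachable dependency for a fraction of calls.
            if rng.random() < failure_rate:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical downstream call, failing on roughly 30% of invocations.
@inject_faults(failure_rate=0.3, rng=random.Random(42))
def fetch_price(item_id):
    return 9.99  # stand-in for a real network request


def get_price_with_fallback(item_id, default=0.0):
    """The code under test: does it degrade gracefully when the dependency fails?"""
    try:
        return fetch_price(item_id)
    except InjectedFault:
        return default
```

Calling `get_price_with_fallback` many times under injection tests the hypothesis that callers survive a flaky dependency. If an `InjectedFault` escaped instead, the experiment would have found a weakness.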

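Its core principles are: define the system's steady state in terms of measurable output (for example, request success rate or latency); hypothesize that the steady state will hold while the fault is present; run experiments in production, where real traffic exposes real behavior; and minimize the blast radius, the share of users or systems an experiment can affect. Tools include Chaos Monkey, Gremlin, and Litmus (for Kubernetes). Start small and increase experiment scope gradually as confidence grows.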
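These principles, together with starting small, can be sketched as a loop over a toy model. In the Python below, `ToyCluster`, `run_experiment`, `escalate`, the per-instance capacity, and the 99% success-rate SLO are illustrative assumptions rather than any tool's API; real tools such as Gremlin or Litmus inject faults into actual infrastructure.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical model: each healthy instance serves `capacity` requests per interval."""
    size: int
    capacity: int
    alive: list = field(default_factory=list, init=False)

    def __post_init__(self):
        self.alive = [True] * self.size

    def success_rate(self, load):
        # Requests beyond the combined capacity of healthy instances fail.
        healthy = sum(self.alive)
        return min(1.0, healthy * self.capacity / load)


def run_experiment(cluster, blast_radius, load=1000, slo=0.99, rng=None):
    """One experiment: steady state -> hypothesis -> inject fault -> verify -> roll back."""
    rng = rng or random.Random()
    # 1. Steady state: measure the success rate before injecting anything.
    baseline = cluster.success_rate(load)
    if baseline < slo:
        return {"status": "aborted", "baseline": baseline}  # don't experiment on an unhealthy system
    # 2. Hypothesis: the success rate stays at or above the SLO while instances are down.
    # 3. Inject the fault, limited to a fraction of instances (the blast radius).
    victims = rng.sample(range(cluster.size), max(1, int(cluster.size * blast_radius)))
    saved = list(cluster.alive)
    try:
        for i in victims:
            cluster.alive[i] = False
        observed = cluster.success_rate(load)
    finally:
        cluster.alive = saved  # 4. Always roll the fault back.
    return {"status": "passed" if observed >= slo else "failed",
            "baseline": baseline, "observed": observed}


def escalate(cluster, radii=(0.1, 0.2, 0.3), **kwargs):
    """Start small and widen the blast radius only while the hypothesis holds."""
    history = []
    for radius in radii:
        result = run_experiment(cluster, radius, **kwargs)
        history.append((radius, result["status"]))
        if result["status"] != "passed":
            break  # stop at the first weakness found
    return history


cluster = ToyCluster(size=10, capacity=120)
print(escalate(cluster, rng=random.Random(0)))
# → [(0.1, 'passed'), (0.2, 'failed')]
```

With ten instances of capacity 120 serving a load of 1,000, losing one instance still leaves 1,080 units of capacity, but losing two drops it to 960, so the success rate falls to 0.96 and escalation stops at a 20% blast radius. Restoring the saved state in `finally` mirrors the practice of always being able to halt and roll back an experiment.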

Related Terms

Rolling Update
A deployment strategy that gradually replaces old application instances with new ones, maintaining availability throughout.
Infrastructure Drift
The divergence between the actual state of infrastructure and its defined desired state, caused by manual changes or untracked modifications.
Observability
The ability to understand a system's internal state from its external outputs through metrics, logs, and traces.
Istio
An open-source service mesh that provides traffic management, security, and observability for microservices on Kubernetes.
Makefile
A file containing build rules and commands that automates compilation and common project tasks using the make utility.
YAML
A human-readable data serialization language commonly used for configuration files in DevOps tools and applications.