
DevOps Advanced

What is Chaos Engineering?

The discipline of deliberately introducing failures into a system to test its resilience and identify weaknesses before they cause outages.

Chaos engineering, pioneered by Netflix with its Chaos Monkey tool, tests system resilience proactively by injecting controlled failures. Typical experiments include terminating servers, injecting network latency, filling disks, and simulating the failure of an entire region. Each experiment aims to expose a weakness before it causes a real outage.
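As a rough illustration of latency and error injection, the sketch below wraps an ordinary function so that some calls are delayed or fail. It is a toy, in-process stand-in and not how Chaos Monkey or any other tool works; `inject_faults`, `fetch_profile`, and all parameter names are hypothetical.

```python
import random
import time


def inject_faults(func, rng, latency_s=0.0, latency_prob=0.0,
                  error_prob=0.0, sleep=time.sleep):
    """Wrap func so that calls are randomly delayed or failed.

    Hypothetical helper for illustration only. rng is a random.Random
    instance; sleep is injectable so tests do not have to wait.
    """
    def wrapper(*args, **kwargs):
        # Simulate a dependency that is unreachable.
        if rng.random() < error_prob:
            raise ConnectionError("injected fault")
        # Simulate a slow network hop before the real call.
        if rng.random() < latency_prob:
            sleep(latency_s)
        return func(*args, **kwargs)
    return wrapper


def fetch_profile(user_id):
    """Stand-in for a call to a real downstream service."""
    return {"id": user_id, "name": "example"}


# Delay every call by 0.5 s, recording the delay instead of sleeping.
recorded_delays = []
slow_fetch = inject_faults(fetch_profile, random.Random(0),
                           latency_s=0.5, latency_prob=1.0,
                           sleep=recorded_delays.append)
profile = slow_fetch(42)

# Fail every call.
broken_fetch = inject_faults(fetch_profile, random.Random(0), error_prob=1.0)
```

Callers of `slow_fetch` and `broken_fetch` can then be checked for sensible timeouts, retries, and fallbacks.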

The core principles are to define a measurable steady state, form a hypothesis about how the system will behave under failure, run experiments in production, and minimize the blast radius (the portion of the system an experiment can affect). Tools include Chaos Monkey, Gremlin, and Litmus. Start small and increase experiment scope gradually.
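The principles above can be sketched as a small experiment loop. This is a minimal simulation against an in-memory cluster, not the API of Chaos Monkey, Gremlin, or Litmus; `Cluster`, `run_experiment`, the 99% threshold, and the blast-radius fraction are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    healthy: bool = True


@dataclass
class Cluster:
    instances: list

    def handle_request(self, rng):
        # Try instances in random order; succeed if any healthy one answers.
        order = self.instances[:]
        rng.shuffle(order)
        return any(inst.healthy for inst in order)


def success_rate(cluster, rng, requests=200):
    """Steady-state metric: fraction of requests that succeed."""
    ok = sum(cluster.handle_request(rng) for _ in range(requests))
    return ok / requests


def run_experiment(cluster, rng, max_blast_radius=0.34, threshold=0.99):
    # 1. Confirm steady state before injecting anything.
    baseline = success_rate(cluster, rng)
    if baseline < threshold:
        return {"status": "aborted", "baseline": baseline}

    # 2. Hypothesis: the success rate stays at or above the threshold
    #    while a limited share of instances is down.
    # 3. Minimize blast radius: cap how many instances are affected.
    count = max(1, int(len(cluster.instances) * max_blast_radius))
    victims = rng.sample(cluster.instances, count)
    for inst in victims:
        inst.healthy = False
    try:
        observed = success_rate(cluster, rng)
    finally:
        # Always roll back the injected failure.
        for inst in victims:
            inst.healthy = True

    return {
        "status": "completed",
        "baseline": baseline,
        "observed": observed,
        "victims": [inst.name for inst in victims],
        "hypothesis_held": observed >= threshold,
    }


cluster = Cluster([Instance("web-1"), Instance("web-2"), Instance("web-3")])
result = run_experiment(cluster, random.Random(0))
print(result)
```

Raising `max_blast_radius` step by step mirrors the advice to start small and widen scope gradually. The `finally` block restores every instance even if the measurement itself fails.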

Related Terms

Prometheus
An open-source monitoring and alerting toolkit that collects time-series metrics using a pull-based model.
Container Orchestration
The automated management of containerized applications including deployment, scaling, networking, and health monitoring across clusters.
Makefile
A file containing build rules and commands that automates compilation and common project tasks using the make utility.
Docker Volume
A mechanism for persisting data generated by Docker containers, surviving container restarts and removals.
SonarQube
A platform for continuous code quality inspection that detects bugs, vulnerabilities, and code smells through static analysis.
Feature Flag
A technique that allows enabling or disabling features in production without deploying new code, enabling safe rollouts and A/B testing.