DevOps Advanced

What is Chaos Engineering?

The discipline of deliberately introducing failures into a system to test its resilience and identify weaknesses before they cause outages.

Chaos engineering, pioneered by Netflix with its Chaos Monkey tool, tests system resilience proactively instead of waiting for an incident. Typical experiments terminate server instances, inject network latency, fill disks, and simulate the failure of an entire region. The goal is to find weaknesses before they cause real outages.
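The simplest of these experiments, terminating a random instance, can be sketched in a few lines. This is a toy simulation only, not Chaos Monkey's actual implementation: the pool is an in-memory set, and the names `terminate_random_instance`, `pool`, and `web-1` through `web-3` are hypothetical.

```python
import random


def terminate_random_instance(instances, rng):
    """Remove and return one randomly chosen instance, or None if the pool is empty."""
    if not instances:
        return None
    # Sort first so the choice is reproducible for a given seed.
    victim = rng.choice(sorted(instances))
    instances.remove(victim)
    return victim


# Hypothetical pool of instance names; a real tool would call a cloud API instead.
rng = random.Random(42)
pool = {"web-1", "web-2", "web-3"}
victim = terminate_random_instance(pool, rng)
print(f"terminated {victim}; remaining: {sorted(pool)}")
```

A real experiment would then observe whether the remaining instances keep serving traffic.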

The core principles are to define the system's steady state in measurable terms, hypothesize that the steady state will hold during the experiment, run experiments in production, and minimize the blast radius. Tools include Chaos Monkey, Gremlin, and Litmus. Start with small experiments and widen their scope gradually.
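The principles above can be sketched as a small experiment loop. This is a minimal illustration under stated assumptions, not the API of Chaos Monkey, Gremlin, or Litmus: `Fleet`, `min_healthy`, and `run_experiment` are hypothetical names, and "steady state" is reduced to a count of healthy instances.

```python
import random
from dataclasses import dataclass


@dataclass
class Fleet:
    """Toy stand-in for a service: healthy instances behind a load balancer."""
    healthy: set
    min_healthy: int  # assumed capacity needed to keep serving traffic

    def steady_state_ok(self):
        # Steady state, defined measurably: enough healthy instances remain.
        return len(self.healthy) >= self.min_healthy


def run_experiment(fleet, blast_radius, rng):
    """Terminate up to `blast_radius` instances, test the hypothesis, then restore."""
    if not fleet.steady_state_ok():
        return "skipped"  # never start from an unhealthy baseline
    count = min(blast_radius, len(fleet.healthy))
    victims = set(rng.sample(sorted(fleet.healthy), count))
    fleet.healthy -= victims  # inject the failure
    try:
        held = fleet.steady_state_ok()  # hypothesis: steady state persists
    finally:
        fleet.healthy |= victims  # roll back so the experiment leaves no damage
    return "passed" if held else "weakness found"


# Start small, then widen the blast radius.
fleet = Fleet(healthy={"a", "b", "c", "d"}, min_healthy=3)
results = [run_experiment(fleet, radius, random.Random(0)) for radius in (1, 2)]
print(results)
```

Capping the number of affected instances and always restoring them is one way to keep the blast radius bounded while the scope grows.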

Related Terms

Containerization
A lightweight virtualization method that packages applications with their dependencies into isolated, portable containers.
Runbook
A documented set of standardized procedures for handling routine operations and incident response in production systems.
Ansible
An agentless automation tool for configuration management, application deployment, and task automation using YAML playbooks.
ELK Stack
A popular log management platform combining Elasticsearch (search), Logstash (processing), and Kibana (visualization).
Docker Volume
A mechanism for persisting data generated by Docker containers, surviving container restarts and removals.
Feature Flag
A technique that allows enabling or disabling features in production without deploying new code, enabling safe rollouts and A/B testing.