Linux control groups (cgroups) are the backbone of container resource management. Whether you are running Docker containers, Kubernetes pods, or systemd services, cgroups enforce CPU, memory, and I/O limits. But monitoring these limits and detecting when resources are exhausted requires dedicated tooling.
In this guide, we will install and use dargslan-cgroup-monitor, a free, zero-dependency Python CLI tool that monitors cgroup resource usage across both cgroups v1 and v2. Install it with a single command and get instant visibility into your container and service resource consumption.
What Are Linux Cgroups?
Control groups (cgroups) are a Linux kernel feature that organizes processes into hierarchical groups and applies resource limits. They are used by Docker, Podman, LXC, and systemd to isolate workloads. Cgroups v1 uses separate hierarchies per controller (cpu, memory, blkio), while cgroups v2 unifies everything under a single hierarchy at /sys/fs/cgroup.
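You can see which cgroup any process belongs to by reading /proc/&lt;pid&gt;/cgroup. A minimal sketch of a parser for that file (the helper name parse_proc_cgroup is illustrative, not part of any library):

```python
def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Parse /proc/<pid>/cgroup lines ("hierarchy-id:controllers:path")
    into a controllers -> cgroup-path mapping. On cgroups v2 there is a
    single line with an empty controllers field ("0::/...")."""
    out = {}
    for line in text.strip().splitlines():
        _, controllers, path = line.split(":", 2)
        out[controllers] = path
    return out
```

On a v2 system `cat /proc/self/cgroup` yields one line such as `0::/user.slice/user-1000.slice`, while a v1 system lists one line per controller hierarchy.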
Quick Start: Install dargslan-cgroup-monitor
pip install dargslan-cgroup-monitor
After installation, the dargslan-cgroup command is available system-wide:
dargslan-cgroup report # Full resource report
dargslan-cgroup list # List all active cgroups
dargslan-cgroup slices # System slices and services
dargslan-cgroup containers # Container cgroups only
dargslan-cgroup issues # Resource limit issues
dargslan-cgroup json # JSON output for scripting
Understanding Cgroup Resource Limits
Each cgroup can have several resource controllers applied:
- CPU: Limits CPU time via cpu.max (v2) or cpu.cfs_quota_us (v1). In cpu.max, the first field is the quota and the second the period, both in microseconds; "100000 100000" therefore allows 100% of one core.
- Memory: Hard limit via memory.max (v2) or memory.limit_in_bytes (v1). Exceeding this triggers the OOM killer.
- I/O: Bandwidth limits via io.max (v2) or blkio.throttle.* (v1). Controls read/write rates per device.
- PIDs: Maximum process count via pids.max. Prevents fork bombs within a cgroup.
Monitoring Cgroups v2 (Modern Systems)
On systems running cgroups v2 (the default since Fedora 31, Ubuntu 21.10, and RHEL 9), the unified hierarchy lives at /sys/fs/cgroup. Our tool reads key files from each cgroup directory:
# Memory usage and limits
/sys/fs/cgroup/system.slice/docker.service/memory.current
/sys/fs/cgroup/system.slice/docker.service/memory.max
# CPU statistics
/sys/fs/cgroup/system.slice/docker.service/cpu.stat
# Process list
/sys/fs/cgroup/system.slice/docker.service/cgroup.procs
Monitoring Cgroups v1 (Legacy Systems)
Legacy systems use separate controller hierarchies. The tool walks each controller directory:
/sys/fs/cgroup/memory/docker/container-id/memory.usage_in_bytes
/sys/fs/cgroup/memory/docker/container-id/memory.limit_in_bytes
/sys/fs/cgroup/cpu/docker/container-id/cpu.cfs_quota_us
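Before walking either layout, a tool must detect which cgroup version is mounted. A common heuristic, sketched here (detect_cgroup_version is an assumed helper name), is that a v2 mount exposes cgroup.controllers at its root:

```python
from pathlib import Path

def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Heuristic version check: only cgroups v2 exposes a
    cgroup.controllers file at the mount root."""
    return 2 if (Path(root) / "cgroup.controllers").exists() else 1
```

Hybrid setups (v2 mounted alongside v1 controllers) exist, so production code may also want to inspect /proc/mounts.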
Using the Python API
from dargslan_cgroup_monitor import CgroupMonitor

cm = CgroupMonitor()
print(f"Cgroup version: {cm.version}")

# List all cgroups with resource usage
for cg in cm.list_cgroups():
    print(f"{cg['path']}: {cg.get('memory_human', 'N/A')} "
          f"({cg.get('memory_percent', 'N/A')}%)")

# Container-specific cgroups
containers = cm.get_container_cgroups()
for c in containers:
    print(f"Container: {c['path']} using {c.get('memory_human', 'N/A')}")

# Audit for issues
issues = cm.audit()
for issue in issues:
    print(f"[{issue['severity']}] {issue['message']}")
Detecting Container Memory Exhaustion
One of the most critical monitoring tasks is detecting when containers approach their memory limits. When a container reaches its cgroup memory limit, the Linux OOM killer terminates processes inside the container. The audit() method checks for this:
- Critical: Memory usage above 90% of limit
- Warning: Memory usage above 75% of limit
- Info: Container running without a memory limit set
Automating Cgroup Monitoring with Cron
# Run every 5 minutes, save to log
*/5 * * * * dargslan-cgroup issues >> /var/log/cgroup-issues.log 2>&1
# JSON output for monitoring stack integration
*/5 * * * * dargslan-cgroup json > /tmp/cgroup-status.json
Integration with Monitoring Stacks
The JSON output mode makes it easy to feed cgroup data into monitoring systems such as Prometheus (via the node_exporter textfile collector), Grafana, or the ELK stack. Write a simple wrapper that runs dargslan-cgroup json and pushes metrics to your preferred backend.
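One such wrapper, sketched below, renders the JSON report into the Prometheus textfile-collector exposition format. The report shape ({"cgroups": [{"path": ..., "memory_current": ...}]}) is an assumption for illustration; adapt the keys to the actual dargslan-cgroup json output:

```python
def to_prom_textfile(report: dict) -> str:
    """Render an assumed JSON report shape into Prometheus
    textfile-collector format (one gauge line per cgroup)."""
    lines = ["# TYPE cgroup_memory_bytes gauge"]
    for cg in report.get("cgroups", []):
        path = cg["path"].replace('"', "")  # strip quotes from label values
        lines.append(f'cgroup_memory_bytes{{path="{path}"}} {cg["memory_current"]}')
    return "\n".join(lines) + "\n"
```

Write the result atomically (write to a temp file, then rename) into the directory node_exporter scrapes with --collector.textfile.directory.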
Systemd Slice Monitoring
Systemd organizes services into slices (user.slice, system.slice, machine.slice). Each slice is a cgroup that can have resource limits. Use dargslan-cgroup slices to see which system services are consuming the most resources and whether any slices are approaching their limits.
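On a cgroups v2 system, per-slice usage can be read straight from the hierarchy. A minimal sketch (slice_memory is an illustrative helper; the v2 layout is assumed):

```python
from pathlib import Path

def slice_memory(root: str = "/sys/fs/cgroup") -> dict[str, int]:
    """Memory usage in bytes per top-level systemd slice,
    assuming the cgroups v2 unified hierarchy layout."""
    usage = {}
    for slice_dir in Path(root).glob("*.slice"):
        mem = slice_dir / "memory.current"
        if mem.exists():
            usage[slice_dir.name] = int(mem.read_text())
    return usage
```

Typical output maps names like system.slice and user.slice to their current memory footprint, which is the same data dargslan-cgroup slices summarizes.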
Best Practices for Cgroup Resource Management
- Always set memory limits on production containers to prevent runaway processes from consuming all host memory
- Monitor CPU throttling: containers hitting their CPU quota will have increased latency
- Use the audit feature regularly to catch containers running without limits
- Set up alerting when any cgroup exceeds 80% of its memory limit
- Review cgroup hierarchy after system updates, as kernel upgrades may change cgroup behavior
Conclusion
Monitoring cgroup resource usage is essential for maintaining healthy containerized environments. The dargslan-cgroup-monitor tool gives you instant visibility into CPU, memory, and I/O limits across all cgroups on your system. Install it today and start catching resource exhaustion before it causes outages.
For more Linux system administration tools, visit dargslan.com and explore our collection of eBooks and free cheat sheets.