Linux Disk Usage Analysis & Cleanup: Find Large Files with Python (2026)

Running out of disk space is one of the most common, and most preventable, server emergencies. Databases crash, applications can't write logs, deployments fail, and users get cryptic error messages. Yet many sysadmins only investigate disk usage after the problem hits.

This guide shows you how to proactively monitor and manage disk usage on Linux servers using dargslan-disk-cleaner, a free Python tool that finds large files, analyzes directory sizes, scans temp directories, and identifies cleanup opportunities.

The Disk Space Problem

Servers accumulate data relentlessly. Log files grow, Docker images pile up, package caches fill, old backups linger, and forgotten temp files build up in directories nobody checks. A server provisioned with plenty of storage can fill up within weeks if nobody is watching.

The traditional approach, running df -h and du -sh manually, works for one server. But when you manage 10, 50, or 100 servers, you need automation.
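As a rough illustration of what that automation can look like, here is a standard-library sketch that runs df -P on several hosts over SSH and flags filesystems above a threshold. It is independent of dargslan-disk-cleaner; parse_df_posix, check_hosts, and any host names you pass in are hypothetical. It assumes key-based SSH access (the real ssh option BatchMode=yes makes ssh fail instead of prompting for a password). The POSIX -P format is used because it guarantees one line per filesystem with sizes in 1024-byte blocks.

```python
import subprocess

def parse_df_posix(output: str) -> list:
    """Parse `df -P` output into dicts. POSIX format reports sizes in 1024-byte blocks."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        # maxsplit=5 keeps a mount point that contains spaces in one field
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        fs, blocks, used, avail, capacity, mount = fields
        pct = capacity.rstrip("%")
        if not pct.isdigit():
            continue  # e.g. "-" for pseudo filesystems that report no size
        rows.append({
            "filesystem": fs,
            "size_kb": int(blocks),
            "used_kb": int(used),
            "avail_kb": int(avail),
            "percent_used": int(pct),
            "mount": mount,
        })
    return rows

def check_hosts(hosts, threshold: int = 85) -> list:
    """Return (host, mount, percent_used) for each filesystem above threshold.

    Hosts that cannot be reached (or where df fails) are reported as (host, None, None).
    """
    findings = []
    for host in hosts:
        try:
            result = subprocess.run(
                ["ssh", "-o", "BatchMode=yes", host, "df", "-P"],
                capture_output=True, text=True, timeout=30,
            )
        except (subprocess.TimeoutExpired, OSError):
            findings.append((host, None, None))
            continue
        if result.returncode != 0:
            findings.append((host, None, None))
            continue
        for row in parse_df_posix(result.stdout):
            if row["percent_used"] > threshold:
                findings.append((host, row["mount"], row["percent_used"]))
    return findings
```

The hosts are checked one after another for simplicity; with dozens of servers, running check_hosts per host in a concurrent.futures.ThreadPoolExecutor would cut the wall-clock time considerably.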

Installing dargslan-disk-cleaner

# Install standalone
pip install dargslan-disk-cleaner

# Or install the complete 15-tool sysadmin toolkit
pip install dargslan-toolkit

Quick Start: CLI Commands

# Full disk usage report with visual bars
dargslan-disk report

# Find files larger than 100MB (default)
dargslan-disk large /

# Find files larger than 50MB in /home
dargslan-disk large /home -m 50

# Find files older than 90 days
dargslan-disk old /var/log -d 90

# Show directory sizes
dargslan-disk dirs /

# Show temp directory usage
dargslan-disk temp

# JSON output for scripting
dargslan-disk json
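The large, old, and dirs subcommands all come down to walking the filesystem and reading file metadata. For readers curious about what that involves, here is a minimal standard-library sketch of a large-file scan. It is not dargslan-disk-cleaner's implementation, and scan_large_files is a hypothetical name. It uses os.lstat so symlinks are not followed, skips entries it cannot read, and by default stays on the starting filesystem (similar to find -xdev), so that scanning / does not wander into /proc, /sys, or network mounts.

```python
import os
import stat

def scan_large_files(root: str, min_size_mb: float = 100, same_fs: bool = True) -> list:
    """Return (size_bytes, path) for regular files >= min_size_mb, largest first."""
    min_bytes = int(min_size_mb * 1024 * 1024)
    root_dev = os.stat(root).st_dev
    results = []
    # os.walk ignores unreadable directories by default and does not follow symlinks
    for dirpath, dirnames, filenames in os.walk(root):
        if same_fs:
            kept = []
            for d in dirnames:
                try:
                    if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                        kept.append(d)
                except OSError:
                    pass  # vanished or unreadable: skip it
            dirnames[:] = kept  # prune in place so os.walk skips other filesystems
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: report the symlink itself, not its target
            except OSError:
                continue
            if stat.S_ISREG(st.st_mode) and st.st_size >= min_bytes:
                results.append((st.st_size, path))
    results.sort(reverse=True)
    return results
```

For example, scan_large_files("/var", min_size_mb=50)[:20] returns the twenty largest files under /var. Run it as root; otherwise directories you cannot read are skipped silently.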

Understanding the Report

The dargslan-disk report command gives you an instant overview of all mounted filesystems with visual usage bars. Filesystems above 85% are flagged with a warning indicator, making it easy to spot problems at a glance.

============================================================
  Disk Usage Report
============================================================
  /
    [#######################-------] 77.3%
    Used: 38.7 GB / 50.0 GB (Free: 11.3 GB)
  /boot
    [###############---------------] 49.1%
    Used: 245.5 MB / 500.0 MB (Free: 254.5 MB)

  Temp Directories:
    /tmp: 156.2 MB (43 files)
    /var/tmp: 12.4 MB (8 files)
============================================================
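The visual bar in the report is simple to reproduce. Below is a minimal sketch using the standard library's shutil.disk_usage; it is not dargslan-disk-cleaner's code, and usage_bar and check_mounts are hypothetical names. The 30-character width and the 85% threshold are taken from the sample output and the description above.

```python
import shutil

def usage_bar(percent: float, width: int = 30) -> str:
    """Render a text bar such as [#####-----] for a usage percentage."""
    filled = round(width * percent / 100)
    filled = max(0, min(width, filled))  # clamp to the bar width
    return "[" + "#" * filled + "-" * (width - filled) + "]"

def check_mounts(mounts, threshold: float = 85.0) -> list:
    """Return (mount, percent_used, bar, over_threshold) for each mount point."""
    rows = []
    for mount in mounts:
        usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
        percent = usage.used / usage.total * 100 if usage.total else 0.0
        rows.append((mount, round(percent, 1), usage_bar(percent), percent > threshold))
    return rows

for mount, pct, bar, warn in check_mounts(["/"]):
    print(f"  {mount}\n    {bar} {pct}%{'  WARNING' if warn else ''}")
```

Because shutil.disk_usage counts root-reserved blocks in total, its percentage can read slightly lower than the Use% column of df, which divides by used plus available space.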

Python API for Custom Workflows

from dargslan_disk_cleaner import DiskCleaner

dc = DiskCleaner()

# Get filesystem usage
for fs in dc.disk_usage():
    if fs['percent_used'] > 85:
        print(f"WARNING: {fs['mount']} at {fs['percent_used']}%")

# Find large files
large_files = dc.find_large_files("/var", min_size_mb=50)
for f in large_files:
    print(f"  {f['size_human']:>10}  {f['path']}")

# Find old files that could be cleaned
old_files = dc.find_old_files("/var/log", days=90)
total_reclaimable = sum(f['size'] for f in old_files)
print(f"Reclaimable: {total_reclaimable / 1024**2:.0f} MB")

# Scan directory sizes at depth 1
for d in dc.dir_sizes("/var"):
    print(f"  {d['size_human']:>10}  {d['path']}")

# Check temp directories
for t in dc.temp_usage():
    print(f"  {t['directory']}: {t['size_human']} ({t['file_count']} files)")

# Find potential duplicate files by size
dupes = dc.find_duplicates("/home", min_size_mb=5)
for size, files in dupes.items():
    print(f"  {size}: {len(files)} potential duplicates")
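The example above reports "potential" duplicates because two files of equal size do not necessarily have equal content. A common second pass hashes only the files that share a size, which keeps the expensive reads to a minimum. The sketch below assumes its input is a dict mapping a size to a list of paths, matching how the loop above iterates dupes.items(); whether find_duplicates returns exactly that shape is an assumption, and _sha256 and confirm_duplicates are hypothetical names.

```python
import hashlib
from collections import defaultdict

def _sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are never read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def confirm_duplicates(paths_by_size: dict) -> list:
    """Given {size: [paths]}, return groups of paths whose contents are identical."""
    groups = []
    for size, paths in paths_by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[_sha256(path)].append(path)
            except OSError:
                continue  # unreadable, or deleted since the size scan
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```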

Common Disk Space Hogs on Linux

1. Log Files

The /var/log directory is often the largest consumer of space on long-running servers. Application logs, system logs, and authentication logs grow continuously.

# Check log sizes
du -sh /var/log/* | sort -rh | head -10

# Clean old compressed logs
find /var/log -name "*.gz" -mtime +30 -delete

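# Empty an active log in place (use with caution). Deleting a log that a
# process still has open does not free its space until the process closes it.
truncate -s 0 /var/log/syslog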
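The du -sh /var/log/* | sort -rh | head -10 pipeline can also be approximated in Python when you want the numbers inside a script. This is a minimal sketch, not how dargslan-disk-cleaner's dir_sizes works; du_bytes and largest_entries are hypothetical names. Like du, it sums allocated blocks (st_blocks, which Linux reports in 512-byte units) rather than apparent file sizes, and counts a hard-linked file once within each entry. Unlike the shell glob, os.listdir also includes dotfiles.

```python
import os

def du_bytes(path: str) -> int:
    """Approximate `du -s`: sum allocated space (st_blocks * 512), counting hard links once."""
    seen = set()
    total = 0

    def add(p: str) -> None:
        nonlocal total
        try:
            st = os.lstat(p)  # lstat: a symlink counts as itself, not its target
        except OSError:
            return  # vanished or permission denied
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            total += st.st_blocks * 512  # st_blocks is in 512-byte units on Linux

    add(path)
    if os.path.isdir(path) and not os.path.islink(path):
        for dirpath, dirnames, filenames in os.walk(path):  # does not follow symlinks
            for name in dirnames + filenames:
                add(os.path.join(dirpath, name))
    return total

def largest_entries(parent: str, n: int = 10) -> list:
    """Return the n largest (bytes, path) entries directly under parent."""
    entries = [os.path.join(parent, name) for name in os.listdir(parent)]
    return sorted(((du_bytes(p), p) for p in entries), reverse=True)[:n]
```

For example, largest_entries("/var/log") returns the ten biggest files and subdirectories in /var/log, largest first.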

2. Docker Images and Volumes

# Check Docker disk usage
docker system df

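# Remove stopped containers, unused networks, all unused images (not just
# dangling ones), build cache, and unused volumes. Destructive; asks to confirm.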
docker system prune -a --volumes

# Remove dangling images only
docker image prune

3. Package Cache

# Debian/Ubuntu
apt clean
apt autoremove

# RHEL/CentOS 7
yum clean all

# RHEL 8+/Fedora
dnf clean all

4. Journal Logs

# Check journal size
journalctl --disk-usage

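# Delete archived journal files until the journal uses at most 100MB
# (one-off; set SystemMaxUse= in /etc/systemd/journald.conf for a permanent cap)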
journalctl --vacuum-size=100M

# Keep only last 7 days
journalctl --vacuum-time=7d

5. Old Kernels

# List installed kernels
dpkg -l | grep linux-image

# Ubuntu - remove old kernels
apt autoremove --purge

Automated Cleanup Script

#!/bin/bash
# /opt/scripts/disk-cleanup.sh
# Run weekly as root via cron, e.g. in root's crontab:
# 0 3 * * 0 /opt/scripts/disk-cleanup.sh

echo "=== Disk Cleanup $(date) ==="

# Clean package cache
apt clean 2>/dev/null

# Trim journal
journalctl --vacuum-time=7d

# Remove old temp files
find /tmp -type f -mtime +7 -delete
find /var/tmp -type f -mtime +30 -delete

# Remove old compressed logs
find /var/log -name "*.gz" -mtime +30 -delete

# Docker cleanup (if installed)
if command -v docker &> /dev/null; then
    docker system prune -f
fi

# Report current usage
df -h /
echo "=== Cleanup Complete ==="
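Before scheduling deletions, it helps to preview what the find ... -mtime +N -delete lines would remove. The sketch below reproduces that selection in Python with a dry-run default; files_older_than and cleanup are hypothetical names, not part of dargslan-disk-cleaner. It also mirrors a detail of GNU find that is easy to miss: a file's age is truncated to whole days before comparison, so -mtime +7 matches files at least 8 full days old, not 7.

```python
import os
import stat
import time

SECONDS_PER_DAY = 24 * 60 * 60

def files_older_than(root: str, days: int, now: float = None) -> list:
    """List regular files matching `find root -type f -mtime +days`."""
    now = time.time() if now is None else now
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            # find discards the fractional part of the age in days
            age_days = int((now - st.st_mtime) // SECONDS_PER_DAY)
            if stat.S_ISREG(st.st_mode) and age_days > days:
                matches.append(path)
    return matches

def cleanup(root: str, days: int, dry_run: bool = True) -> int:
    """Delete matching files (or, by default, only count them); return bytes affected."""
    freed = 0
    for path in files_older_than(root, days):
        try:
            size = os.lstat(path).st_size
            if not dry_run:
                os.remove(path)
            freed += size
        except OSError:
            continue  # removed by someone else, or permission denied
    return freed
```

For example, cleanup("/tmp", 7) reports how many bytes the /tmp line of the script would free without deleting anything; pass dry_run=False to actually remove the files.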

Monitoring with Cron

#!/usr/bin/env python3
# /opt/scripts/disk-alert.py
from dargslan_disk_cleaner import DiskCleaner

dc = DiskCleaner()
for fs in dc.disk_usage():
    if fs['percent_used'] > 85:
        print(f"ALERT: {fs['mount']} is {fs['percent_used']}% full!")
        print(f"  Free: {fs['free_human']}")
        # Add email/Slack notification here

Then schedule the script with cron (for example via crontab -e):

# Check disk every 4 hours
0 */4 * * * /usr/bin/python3 /opt/scripts/disk-alert.py >> /var/log/disk-alert.log 2>&1
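One way to fill in the "Add email/Slack notification here" placeholder in the alert script is a Slack incoming webhook, which accepts an HTTP POST whose JSON body has a text field. Below is a standard-library sketch; the webhook URL is a hypothetical placeholder you would replace with the one Slack generates for your workspace, and build_payload and notify_slack are illustrative names, not part of dargslan-disk-cleaner.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with your own Slack incoming-webhook URL.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def build_payload(mount: str, percent_used: float, free_human: str) -> bytes:
    """Encode the JSON body an incoming webhook expects: an object with a "text" field."""
    text = f"ALERT: {mount} is {percent_used}% full (free: {free_human})"
    return json.dumps({"text": text}).encode("utf-8")

def notify_slack(payload: bytes, url: str = WEBHOOK_URL, timeout: float = 10.0) -> int:
    """POST the payload and return the HTTP status; urlopen raises HTTPError on non-2xx."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Inside the alert loop you could call notify_slack(build_payload(fs['mount'], fs['percent_used'], fs['free_human'])). Because cron runs the check every 4 hours, this sends a repeat alert on every run while the filesystem stays above the threshold.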

Disk space management is fundamental to server reliability. With dargslan-disk-cleaner, you get automated disk analysis without installing heavy monitoring agents. Combine it with cron jobs and alerting for a complete disk management solution.

Install now: pip install dargslan-disk-cleaner, or get all 15 tools with pip install dargslan-toolkit

Dargslan Editorial Team (Dargslan)
About the Author

Collective of Software Developers, System Administrators, DevOps Engineers, and IT Authors

Dargslan is an independent technology publishing collective formed by experienced software developers, system administrators, and IT specialists.

The Dargslan editorial team works collaboratively to create practical, hands-on technology books focused on real-world use cases. Each publication is developed, reviewed, and...
