Git Repository Security Audit with Python: Scan for Secrets, Large Files, and Leaks (Free CLI Tool)

Why Git Security Auditing is Essential

Git repositories are among the most common sources of accidental secret exposure. AWS access keys, database passwords, API tokens, private keys, and service account credentials are regularly committed to repositories. Sometimes they are added deliberately during development and then forgotten; sometimes the developer does not realize that Git history persists, so deleting a file in a later commit does not remove it from earlier commits.

The consequences are severe: exposed AWS keys can result in thousands of dollars of cryptocurrency mining charges within hours. Leaked database credentials give attackers direct access to your production data. A committed private key compromises every certificate and service that depends on it.

According to GitGuardian, over 10 million new secrets were detected in public GitHub repositories in 2023 alone. And for every secret found in public repos, many more exist in private repositories where teams assume their code is safe.

dargslan-git-audit scans your repository for 10+ categories of secrets using regex patterns, checks .gitignore coverage for sensitive file types, finds large files that bloat your repository, and identifies tracked sensitive files that should never be in version control.
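
The tracked-sensitive-file check described above can be approximated with `git ls-files` and glob matching. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: the pattern list and the function names `list_tracked_files` and `find_sensitive` are hypothetical and chosen only for illustration.

```python
import fnmatch
import subprocess

# Hypothetical pattern list for illustration; not the tool's actual rules.
SENSITIVE_PATTERNS = [".env", "*.pem", "*.key", "id_rsa", "*.p12"]


def list_tracked_files(repo_path="."):
    """Return the paths Git currently tracks, using `git ls-files -z`."""
    result = subprocess.run(
        ["git", "-C", repo_path, "ls-files", "-z"],
        capture_output=True, text=True, check=True,
    )
    return [path for path in result.stdout.split("\0") if path]


def find_sensitive(paths, patterns=SENSITIVE_PATTERNS):
    """Match each path's file name against the glob patterns."""
    hits = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        for pattern in patterns:
            if fnmatch.fnmatchcase(name, pattern):
                hits.append({"file": path, "pattern": pattern})
                break  # one hit per file is enough
    return hits


# Sample paths stand in for the output of list_tracked_files().
sample = ["src/app.py", "config/.env", "deploy/server.pem", "README.md"]
for hit in find_sensitive(sample):
    print(f"Tracked: {hit['file']} (matches: {hit['pattern']})")
```

In a real repository, `find_sensitive(list_tracked_files("/path/to/repo"))` would run the same check against the files Git actually tracks.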

Install dargslan-git-audit

pip install dargslan-git-audit

No third-party Python dependencies. The tool uses only the Git CLI and the Python standard library, and works with any Git repository.

CLI Usage

# Full security report
dargslan-git report

# Scan for secrets in tracked files
dargslan-git secrets

# Check for tracked sensitive files
dargslan-git sensitive

# Audit .gitignore coverage
dargslan-git gitignore

# Find large files in history
dargslan-git large

# Repository statistics
dargslan-git stats

# Scan specific repository
dargslan-git report -p /path/to/repo

# JSON output
dargslan-git json
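
How the `large` subcommand works internally is not shown here. One common way to find large blobs anywhere in history with plain Git is to list every reachable object with `git rev-list --objects --all` and ask `git cat-file --batch-check` for the sizes. The sketch below assumes that approach; the function names and the binary-megabyte threshold are illustrative choices, not the tool's documented behavior.

```python
import subprocess


def parse_blob_sizes(batch_check_output, threshold_mb=10):
    """Parse lines of the form '<type> <size> <path>' and keep large blobs."""
    threshold_bytes = threshold_mb * 1024 * 1024
    large = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, and malformed lines
        size = int(parts[1])
        if size > threshold_bytes:
            large.append({"path": parts[2], "size_mb": size / (1024 * 1024)})
    return sorted(large, key=lambda f: f["size_mb"], reverse=True)


def large_files_in_history(repo_path=".", threshold_mb=10):
    """List objects reachable from any ref, then query their sizes."""
    objects = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "-C", repo_path, "cat-file",
         "--batch-check=%(objecttype) %(objectsize) %(rest)"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_blob_sizes(sizes, threshold_mb)
```

Because the scan walks all refs, a file deleted from the working tree still shows up if any commit in history contains it.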

Python API

from dargslan_git_audit import GitAudit

ga = GitAudit()  # current directory
# ga = GitAudit(repo_path="/path/to/repo")

# Scan for secrets
secrets = ga.scan_working_secrets()
for s in secrets:
    print(f"[{s['severity']}] {s['file']}: {s['type']} ({s['preview']})")

# Check sensitive files
sensitive = ga.check_sensitive_files()
for f in sensitive:
    print(f"Tracked: {f['file']} (matches: {f['pattern']})")

# Audit .gitignore
for gap in ga.check_gitignore():
    if gap["tracked"]:
        print(f"RISK: {gap['pattern']} not in .gitignore and tracked!")

# Find large files (>10MB)
for f in ga.find_large_files(threshold_mb=10):
    print(f"{f['size_mb']:.1f} MB: {f['path']}")

# Full audit
issues = ga.audit()
for i in issues:
    print(f"[{i['severity']}] {i['message']}")
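
In a CI job it is often more useful to summarize the audit than to print every issue. The sketch below works on plain dicts with the `severity` and `message` keys used in the loop above. The severity labels and the helper names `summarize` and `should_fail` are assumptions made for illustration; the values `audit()` actually returns may differ.

```python
from collections import Counter

# Assumed severity labels, ordered from most to least urgent. The values
# actually returned by audit() may differ; adjust this list to match.
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]


def summarize(issues):
    """Count issues per severity, most urgent first; unknown labels go last."""
    counts = Counter(str(issue["severity"]).lower() for issue in issues)
    known = [(level, counts[level]) for level in SEVERITY_ORDER if counts[level]]
    unknown = sorted(item for item in counts.items() if item[0] not in SEVERITY_ORDER)
    return known + unknown


def should_fail(issues, fail_on=("critical", "high")):
    """Return True when any issue has a severity listed in fail_on."""
    return any(str(issue["severity"]).lower() in fail_on for issue in issues)


# Sample data shaped like the dicts printed in the loop above.
sample_issues = [
    {"severity": "high", "message": "AWS access key found in config.py"},
    {"severity": "low", "message": "*.log not covered by .gitignore"},
    {"severity": "high", "message": ".env is tracked"},
]
for level, count in summarize(sample_issues):
    print(f"{level}: {count}")
print("fail build:", should_fail(sample_issues))
```

With real data, `should_fail(ga.audit())` could decide the exit code of a pipeline step.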

Secret Patterns Detected

  • Passwords — password/passwd/pwd assignments with 8+ character values
  • API Keys — api_key, apikey patterns with 16+ character values
  • AWS Access Keys — AKIA prefix followed by 16 uppercase alphanumeric characters
  • GitHub Tokens — ghp_, gho_, ghu_, ghs_, ghr_ prefixed tokens
  • OpenAI Keys — sk- prefix with 48+ characters
  • Private Keys — PEM-encoded RSA, DSA, EC, and generic private keys
  • Database URLs — mysql://, postgres://, mongodb:// with embedded credentials
  • Bearer Tokens — Bearer authentication headers with token values
  • Secret Keys — secret_key, secretkey assignments
  • Access Tokens — access_token, auth_token patterns
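
To make the list above concrete, here is a rough sketch of how a few of these categories could be expressed as regular expressions. These patterns are approximations written for this example, not the expressions the tool ships with. The 36-character minimum for GitHub tokens in particular is an assumption, and `scan_text` and `mask` are hypothetical helper names.

```python
import re

# Illustrative approximations of four categories from the list above.
PATTERNS = {
    "AWS Access Key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub Token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "Private Key": re.compile(r"-----BEGIN (?:RSA |DSA |EC )?PRIVATE KEY-----"),
    "Database URL": re.compile(
        r"\b(?:mysql|postgres|mongodb)://[^\s:/@]+:[^\s@]+@"
    ),
}


def mask(value, visible=4):
    """Show only the first few characters so reports do not leak the secret."""
    return value[:visible] + "****"


def scan_text(text):
    """Return one finding per pattern match, with a 1-based line number."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for secret_type, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings.append({
                    "type": secret_type,
                    "line": lineno,
                    "preview": mask(match.group(0)),
                })
    return findings


sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\nname = "demo"\n'
for finding in scan_text(sample):
    print(f"line {finding['line']}: {finding['type']} ({finding['preview']})")
```

Regex scanning of this kind produces false positives and misses secrets that do not follow a recognizable format, so findings are best treated as leads to review rather than as a complete inventory.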

Pre-commit Integration

import sys
from dargslan_git_audit import GitAudit

ga = GitAudit()
staged_secrets = ga.scan_staged_secrets()

if staged_secrets:
    print("BLOCKED: Secrets detected in staged files!")
    for s in staged_secrets:
        print(f"  {s['type']}: {s['preview']}")
    sys.exit(1)
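
For Git to run the script above before each commit, it has to be wired into `.git/hooks/pre-commit` and that hook must be executable. The sketch below shows one way to install such a hook from Python. It assumes the check script is saved at `scripts/check_staged_secrets.py` (a hypothetical path) and that `.git` is a regular directory, which is not the case for linked worktrees or submodules.

```python
import stat
from pathlib import Path

# Hypothetical hook body; adjust the script path to wherever the check lives.
HOOK_BODY = """#!/bin/sh
# Run the secret check before every commit; a non-zero exit aborts the commit.
exec python3 scripts/check_staged_secrets.py
"""


def install_pre_commit_hook(repo_path=".", hook_body=HOOK_BODY):
    """Write the hook into .git/hooks/pre-commit and mark it executable."""
    hooks_dir = Path(repo_path) / ".git" / "hooks"
    if not hooks_dir.is_dir():
        raise FileNotFoundError(f"{hooks_dir} not found; is this a Git repository?")
    hook_path = hooks_dir / "pre-commit"
    hook_path.write_text(hook_body)
    # Git silently skips hooks that are not executable.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Hooks in `.git/hooks` are local to each clone and are not shared through the repository, so every contributor has to install the hook themselves.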



Dargslan Editorial Team (Dargslan)
