๐ŸŽ New User? Get 20% off your first purchase with code NEWUSER20 ยท โšก Instant download ยท ๐Ÿ”’ Secure checkout Register Now โ†’
Menu

Categories

Modern Bash Scripting in 2026: Strict Mode, Traps, and Production Patterns


Quick summary: Modern production Bash in 2026 looks very different from the casual shell scripts of a decade ago. It means strict mode (set -euo pipefail, with the right caveats), proper trap handlers for cleanup, structured logging, predictable exit codes, and basic test coverage with bats. None of these are new individually, but together they are the baseline for any script that runs unattended on a real server.


Why Bash Still Matters in 2026

Every few years someone declares Bash dead. The reality on the ground tells a different story: every Linux box you SSH into has it, every container entrypoint script runs in a shell, every CI pipeline's "run a quick script" step uses it, and every Ansible playbook calls out to a shell command somewhere. For tasks under a few hundred lines that mostly orchestrate other commands, Bash remains the most ubiquitous, most installed, and most cargo-culted tool in the toolbox.

That ubiquity is also the problem. Most Bash scripts in production were written in five minutes by someone solving an immediate problem, with no thought to what happens when the network is flaky, when a file is missing, when a subcommand fails partway through, or when the script is interrupted. Bringing the same engineering discipline you would apply to Python or Go to your Bash scripts is the single biggest quality lever available to most ops teams.

Strict Mode: The Setup Every Script Should Have

Almost every production Bash script should start with the same boilerplate:

#!/usr/bin/env bash
#
# script-name.sh - one-line description
# Usage: script-name.sh [args]
#
set -Eeuo pipefail
IFS=$'\n\t'

Here is what each piece does and why it matters:

  • set -e: exit immediately if any command exits non-zero. This is the most famous "strict mode" flag, and the one that surprises people the most: it is suppressed inside if and while conditions and in && and || lists (where the failure is expected). Read the bash manual section on it; the rules are subtle.
  • set -u: exit if you reference an undefined variable. Catches typos like $flie instead of $file. Trade-off: you have to be explicit about defaults, e.g., "${LOG_LEVEL:-info}" instead of "$LOG_LEVEL".
  • set -o pipefail: without this, the exit code of a pipeline is the exit code of the last command. With it, the pipeline fails if any command fails. curl ... | jq ... pipelines have failed silently for years because of the default; pipefail fixes it.
  • set -E: propagate the ERR trap into functions, command substitutions, and subshells. Required if you want the trap to fire when a function fails.
  • IFS=$'\n\t': restrict word-splitting to newlines and tabs. Prevents the classic "filename with spaces" bug.
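The pipefail difference is easy to demonstrate in isolation. This small sketch runs the same failing pipeline in two child shells and compares the exit status:

```shell
# Without pipefail, a pipeline's status is the LAST command's status,
# so a failure early in the pipe is invisible:
default_status=$(bash -c 'false | true; echo $?')

# With pipefail, any failing stage fails the whole pipeline:
pipefail_status=$(bash -c 'set -o pipefail; false | true; echo $?')

echo "default: $default_status, pipefail: $pipefail_status"
# prints: default: 0, pipefail: 1
```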

The asterisk on set -e

Strict mode is not magic. The cases where set -e silently does nothing surprise people regularly:

# set -e does NOT exit here โ€” the failure is "handled" by &&
mkdir /opt/foo && echo "ok"

# set -e is suppressed for the WHOLE subshell when it appears in a
# condition context, such as an if test
if ( false; echo "still runs" ); then :; fi

# set -e does NOT exit here โ€” assignment to a local with command sub
my_func() {
    local x=$(false)   # exit code of false is masked by local
    echo "still runs"
}

For the function case, split the assignment from the declaration:

my_func() {
    local x
    x=$(false)   # NOW set -e fires
    echo "never runs"
}

If you take only one thing from this guide: knowing where set -e fails to fire is more useful than the strict mode incantation itself.
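The flip side of knowing where set -e fails to fire: when a non-zero exit is expected and harmless, say so explicitly with || true rather than relying on context to suppress errexit. A small illustrative sketch (the log file here is a made-up temp file):

```shell
set -euo pipefail

# grep -c exits 1 when it finds nothing; that is expected here, not an
# error. "|| true" keeps set -e from killing the script while preserving
# the printed count.
logfile=$(mktemp)
printf 'all quiet\n' > "$logfile"

error_count=$(grep -c 'ERROR' "$logfile" || true)
echo "errors found: $error_count"   # errors found: 0

rm -f "$logfile"
```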

Traps: Cleanup That Actually Runs

The classic anti-pattern is the script that creates a temp directory and never deletes it on failure. Traps fix that:

#!/usr/bin/env bash
set -Eeuo pipefail

TMPDIR=$(mktemp -d)

cleanup() {
    local rc=$?
    rm -rf "$TMPDIR"
    exit $rc
}
trap cleanup EXIT

# ... script body uses $TMPDIR ...

The EXIT trap fires on normal exit, on errors, and on signals. The local rc=$? at the top preserves the exit code so you can return it after cleanup.
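You can verify that behavior in one line. This sketch runs a failing strict-mode script in a child bash and shows the trap firing anyway, with the failure's exit status preserved:

```shell
# The EXIT trap runs even though "false" aborts the child under set -e,
# and the child still exits with the failure's status.
output=$(bash -c 'set -eu; trap "echo cleanup-ran" EXIT; false; echo unreachable' 2>&1) && rc=0 || rc=$?
echo "output=$output rc=$rc"    # output=cleanup-ran rc=1
```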

Distinguishing failure from success in cleanup

cleanup() {
    local rc=$?
    if [[ $rc -ne 0 ]]; then
        echo "ERROR: script failed with exit code $rc" >&2
        # Optional: collect debug info, dump state, etc.
    fi
    rm -rf "$TMPDIR"
    exit $rc
}
trap cleanup EXIT

Handling SIGINT and SIGTERM specifically

handle_sigint() {
    echo "Interrupted by user (Ctrl+C)" >&2
    exit 130
}

handle_sigterm() {
    echo "Received SIGTERM, shutting down gracefully" >&2
    # Stop child processes, drain queues, etc.
    exit 143
}

trap handle_sigint INT
trap handle_sigterm TERM

Conventional exit codes: 130 for Ctrl+C, 143 for SIGTERM, 137 for SIGKILL (which you cannot trap, but other code may infer from the exit code).
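The 128+N convention also works in reverse: bash's kill -l builtin accepts an exit status above 128 and prints the corresponding signal name, which is handy when reading logs:

```shell
# 130 = 128 + 2 (SIGINT), 143 = 128 + 15 (SIGTERM), 137 = 128 + 9 (SIGKILL)
kill -l 130   # INT
kill -l 143   # TERM
kill -l 137   # KILL
```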

Structured Logging

"echo" is fine for tiny scripts. For anything that runs in production, you want consistent log levels, timestamps, and a single function for emitting messages so you can change format in one place.

readonly LOG_LEVEL="${LOG_LEVEL:-info}"

log() {
    local level=$1
    shift
    local levels="error warn info debug"
    local current_idx msg_idx
    # Split declaration from assignment so a failed lookup is not masked
    # (the same local pitfall described above); unknown levels fall back
    # to "info" (index 3).
    current_idx=$(echo "$levels" | tr ' ' '\n' | grep -n "^${LOG_LEVEL}$" | cut -d: -f1) || true
    msg_idx=$(echo "$levels" | tr ' ' '\n' | grep -n "^${level}$" | cut -d: -f1) || true
    if [[ ${msg_idx:-3} -le ${current_idx:-3} ]]; then
        printf '%s [%s] %s\n' "$(date -Iseconds)" "${level^^}" "$*" >&2
    fi
}

log_info()  { log info  "$@"; }
log_warn()  { log warn  "$@"; }
log_error() { log error "$@"; }
log_debug() { log debug "$@"; }

For shipping logs to a central system (Loki, Splunk, ELK), emit one JSON object per line:

log_json() {
    local level=$1
    shift
    # Caveat: printf does not JSON-escape the message; avoid embedded
    # quotes and newlines, or escape them before calling.
    printf '{"ts":"%s","level":"%s","msg":"%s","script":"%s"}\n' \
        "$(date -Iseconds)" "$level" "$*" "$(basename "$0")" >&2
}
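If you cannot shell out to jq for proper escaping, a minimal pure-bash escaper covers the common cases. This is an illustrative helper, not a standard function; it handles quotes and backslashes but not control characters:

```shell
# Escape backslashes and double quotes for embedding in a JSON string.
# (Does not handle control characters; keep messages simple, or use jq
# for full escaping.)
json_escape() {
    local s=$1
    s=${s//\\/\\\\}   # backslash first, or it would double-escape quotes
    s=${s//\"/\\\"}
    printf '%s' "$s"
}

json_escape 'said "hi" and used a \ backslash'
# → said \"hi\" and used a \\ backslash
```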

Argument Parsing That Does Not Hurt

Tiny scripts can use positional arguments. Anything more complex deserves getopts (POSIX, simple) or a small custom parser. Avoid the external getopt command in portable scripts; the BSD and util-linux versions behave differently.

usage() {
    cat <<EOF
Usage: ${0##*/} [OPTIONS] ARGS

Options:
  -h          Show this help
  -v          Verbose mode
  -o FILE     Output file (default: stdout)
EOF
}

verbose=0
output="-"

while getopts "hvo:" opt; do
    case $opt in
        h) usage; exit 0 ;;
        v) verbose=1 ;;
        o) output=$OPTARG ;;
        *) usage >&2; exit 64 ;;
    esac
done
shift $((OPTIND - 1))

Quoting and Word Splitting

The number-one source of Bash bugs in the wild is unquoted variable expansion. Get into the habit of always quoting:

# Wrong โ€” breaks on filenames with spaces
for f in $(ls /var/log); do
    cat $f
done

# Right โ€” handle filenames safely
for f in /var/log/*; do
    cat "$f"
done
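When the files come from find rather than a glob, null-delimited reads keep arbitrary filenames safe. A sketch (the directory and pattern are made up for the demo):

```shell
# -print0 / read -d '' delimit on NUL, the one byte that cannot appear
# in a filename, so spaces and even newlines are handled safely.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.log" "$demo_dir/has space.log"

count=0
while IFS= read -r -d '' f; do
    printf 'processing %s\n' "$f"
    count=$((count + 1))
done < <(find "$demo_dir" -type f -name '*.log' -print0)
# process substitution (not a pipe) keeps $count in the parent shell

echo "found $count files"   # found 2 files
rm -rf "$demo_dir"
```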

Use shellcheck obsessively. It catches the unquoted-variable, useless-cat, and dozens of other patterns automatically. Run it in CI on every script:

shellcheck -x scripts/*.sh

Error Handling Patterns

Pattern 1: Required-command checks

require_command() {
    local cmd=$1
    if ! command -v "$cmd" >/dev/null 2>&1; then
        log_error "Required command not found: $cmd"
        exit 1
    fi
}

require_command jq
require_command curl
require_command openssl

Pattern 2: Retry with backoff

retry() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if [[ $attempt -ge $max_attempts ]]; then
            log_error "Command failed after $max_attempts attempts: $*"
            return 1
        fi
        log_warn "Attempt $attempt failed, retrying in ${delay}s..."
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

retry 5 2 curl -sf https://api.example.com/health

Pattern 3: Lock files for single-instance scripts

LOCKFILE="/var/run/myscript.lock"

acquire_lock() {
    exec 9>"$LOCKFILE"
    if ! flock -n 9; then
        log_error "Another instance is already running"
        exit 1
    fi
}

acquire_lock
# ... rest of script ...
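On systems without flock(1) (some BusyBox and macOS setups), mkdir can stand in, since directory creation is an atomic check-and-create. A hedged sketch; note that unlike flock, a crashed process leaves a stale lock dir behind:

```shell
# mkdir fails if the directory already exists, and the check-and-create
# is a single atomic operation, so two instances cannot both win.
acquire_lock_mkdir() {
    local lockdir=$1
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "Another instance is already running" >&2
        return 1
    fi
    # Caller should remove the lock dir in its EXIT trap:
    #   trap 'rmdir "$lockdir"' EXIT
}

lock="$(mktemp -d)/myscript.lock"
acquire_lock_mkdir "$lock"                           # first call succeeds
acquire_lock_mkdir "$lock" || echo "second call blocked"
rmdir "$lock"
```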

Testing Bash with bats

Yes, you should test your Bash scripts. The bats-core framework makes it pleasant:

#!/usr/bin/env bats

@test "script exits 0 on success" {
    run ./myscript.sh -o /tmp/output.txt
    [ "$status" -eq 0 ]
    [ -f /tmp/output.txt ]
}

@test "script fails when required command missing" {
    PATH="/usr/bin" run ./myscript.sh
    [ "$status" -eq 1 ]
    [[ "$output" == *"Required command not found"* ]]
}

Run with bats tests/. Integrate into CI alongside shellcheck.

What to Avoid

  • Parsing the output of ls: use globs (for f in *.txt) or find.
  • Using eval: almost always wrong, almost always exploitable. There is usually a better way.
  • Echoing into commands that need input: use here-docs or here-strings instead.
  • Catching all signals in one trap: different signals deserve different responses.
  • Hard-coding paths to commands: except for security-critical scripts where you specifically want to avoid PATH manipulation.
  • Mixing tabs and spaces in heredocs with <<-: only tabs are stripped; mixed indentation produces broken heredoc bodies.

When Bash Is Not the Right Tool

Bash is great for orchestrating other programs and gluing things together. It is not great for:

  • Scripts over ~500 lines: maintainability degrades fast. Move to Python.
  • Anything with complex data structures: Bash arrays are limited; associative arrays are awkward.
  • String manipulation beyond the trivial: sed, awk, or a real language are better.
  • Performance-critical work: every fork is slow; loops with subshells add up.
  • Anything you want to ship cross-platform to non-Linux developers: Bash quirks vary across macOS, BSD, Alpine, and Windows-via-WSL.

A Production-Ready Skeleton

#!/usr/bin/env bash
#
# deploy.sh - deploy app to staging
# Usage: deploy.sh [-v] -e ENV -s SERVICE
#
set -Eeuo pipefail
IFS=$'\n\t'

SCRIPT_NAME=$(basename "$0")   # split from readonly so a failure is not masked
readonly SCRIPT_NAME
readonly LOG_LEVEL="${LOG_LEVEL:-info}"

TMPDIR=$(mktemp -d)
trap 'rc=$?; rm -rf "$TMPDIR"; exit $rc' EXIT
trap 'log_error "Interrupted"; exit 130' INT
trap 'log_error "Terminated"; exit 143' TERM

# ... log functions, require_command, retry helpers ...

main() {
    local environment="" service="" verbose=0
    while getopts "hve:s:" opt; do
        case $opt in
            h) usage; exit 0 ;;
            v) verbose=1 ;;
            e) environment=$OPTARG ;;
            s) service=$OPTARG ;;
            *) usage >&2; exit 64 ;;
        esac
    done

    [[ -z "$environment" || -z "$service" ]] && { usage >&2; exit 64; }

    require_command kubectl
    require_command helm

    log_info "Deploying $service to $environment"
    # ... actual work ...
    log_info "Deployment complete"
}

main "$@"

Real-World Anti-Patterns We Have Found in Production

Code review of a few hundred production Bash scripts surfaces the same anti-patterns over and over. Here are the worst offenders, in rough frequency order.

The infinite-retry curl. Someone wrote while ! curl -sf https://api.example.com; do sleep 1; done as a "wait for service" check. When the upstream service is genuinely down, this script runs forever, paging nobody. Always cap retries with a reasonable maximum and exit non-zero on exhaustion.

The "rm -rf with a variable" footgun. The classic rm -rf "$dir"/* with $dir unset or empty becomes rm -rf /*. Strict mode (set -u) catches the unset case. Defensive coding (rm -rf "${dir:?}/"*) catches it even without strict mode. The combination of both is the only safe pattern.
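The :? expansion is worth internalizing. In this sketch the subshell is needed for the demo because a failed :? expansion aborts a non-interactive shell outright:

```shell
# "${dir:?}" aborts the (sub)shell with an error when dir is unset or
# empty, so rm never runs with an empty prefix.
unset dir
( rm -rf "${dir:?}/"* ) 2>/dev/null || echo "blocked: dir was unset"
# → blocked: dir was unset
```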

The silent pipe failure. A backup script ends with tar czf - /data | aws s3 cp - s3://backups/today.tar.gz. The tar fails because /data does not exist; aws s3 cp succeeds with an empty body; the script exits 0; nobody notices for a month until restore is needed. Without set -o pipefail, this is the default behavior of every shell pipeline. With it, the script fails loudly the first time it breaks.

The "I will fix the quotes later" technique. Scripts written without quoting variables work fine on the developer's laptop where filenames have no spaces, then break in production where someone created a directory called "Invoice 2024". Quoting is not optional; it is the default. Your editor or your shellcheck pre-commit hook should make unquoted variables visually loud.

The unsanitized user input in eval. A web admin tool that runs Bash with parameters from an HTTP form. The eval-based command construction is genuinely terrifying when you read the code. The fix is "do not do that": pass arguments as separate array elements, not a single concatenated string.

Frequently Asked Questions

Should I use bash or sh?

Bash for everything that does not need to run on Alpine or BusyBox. POSIX sh for container init scripts and embedded systems where bash is not installed. Be explicit in the shebang either way.

What about zsh or fish?

Excellent interactive shells; not recommended as scripting targets because their syntax differs from POSIX/Bash and they are not universally installed.

Is shellcheck really worth it?

Yes. It catches more bugs in five seconds than most code reviewers will catch in five minutes.

Do I need to worry about Bash 3 vs Bash 5?

Only if your script needs to run on macOS (which still ships Bash 3.2 by default) or very old enterprise Linux. For modern Linux servers, Bash 5+ is the safe assumption.

How do I handle secrets in Bash scripts?

Read from a file or environment variable; never hard-code. Avoid printing secrets to logs (be careful with set -x). For complex secret management, consider calling out to vault, aws secretsmanager, or similar.


The Bottom Line

Modern Bash is not glamorous, but it is everywhere, and treating it like a real language pays off every time a script runs unattended at 3 AM. Start every new script with the strict-mode boilerplate, add a cleanup trap before you write any code that creates state, run shellcheck on every commit, and write at least a couple of bats tests for the critical paths. Do that and your shell scripts will be more reliable than half the production Python in your codebase.

About the Author

Dorian Thorne is a cloud infrastructure specialist and technical author focused on the design, deployment, and operation of scalable cloud-based systems.