Error Handling in Bash Scripts | DevOps Scripting Guide

Transform fragile Bash scripts into robust production tools with professional error handling. Learn how to detect, handle, and recover from errors in your DevOps automation scripts.

Exit Codes

Understand command success/failure

$? variable, true/false commands

Custom exit codes (0-255)

command; echo $?; exit 1

Trap Commands

Handle signals and errors

Cleanup on exit, interrupt handling

ERR, EXIT, DEBUG traps

trap cleanup EXIT

Debugging

Find and fix errors

set -x, debug logging, stack traces

Validation and testing

set -x; bash -x script.sh

Error Occurs

→

Detect Error

→

Handle Error

→

Cleanup

→

Exit Gracefully

Part 1: Understanding Exit Codes

What Are Exit Codes?

Every command in Linux returns an exit code when it finishes:

Exit Code Meaning Example Command Typical Use 0 Success ls /tmp Normal successful completion 1 General error ls /nonexistent Catch-all for failures 2 Misuse of shell builtins empty_function() {} Shell syntax errors 126 Command cannot execute /dev/null as command Permission denied 127 Command not found nonexistent_command Typo or missing program 130 Script terminated by Ctrl+C Press Ctrl+C during script SIGINT (2) + 128 = 130 255* Exit status out of range exit 300 Wraps around (300-256=44)

Pro Tip: Exit codes 0-255 are valid. Codes 1-125 are for user-defined errors. Codes 126+ are reserved for system use. Use codes 1-125 for your custom errors with specific meanings.

Checking Exit Codes

#!/bin/bash

# Method 1: Check $? immediately
ls /tmp
echo "Exit code: $?"  # Should be 0

ls /nonexistent
echo "Exit code: $?"  # Should be 1

# Method 2: Using if statement directly
if ls /tmp; then
    echo "Command succeeded"
else
    echo "Command failed with code: $?"
fi

# Method 3: Store and check later
ls /tmp
LS_EXIT=$?

if [ $LS_EXIT -eq 0 ]; then
    echo "ls succeeded"
elif [ $LS_EXIT -eq 1 ]; then
    echo "ls failed - file not found"
elif [ $LS_EXIT -eq 2 ]; then
    echo "ls failed - syntax error"
else
    echo "ls failed with unknown code: $LS_EXIT"
fi

# Method 4: Test commands without running
if command -v git &> /dev/null; then
    echo "git is installed"
else
    echo "git is not installed"
    exit 1
fi

# Method 5: Check multiple commands
mkdir -p /tmp/test && cd /tmp/test && echo "Success"
# If any command fails, stops due to &&

# Method 6: Continue despite errors
mkdir -p /tmp/test || echo "mkdir failed, continuing..."
cd /tmp/test || exit 1

# Method 7: Complex logic
create_backup() {
    local src=$1
    local dest=$2
    
    if cp -r "$src" "$dest"; then
        echo "Backup successful"
        return 0
    else
        local exit_code=$?
        echo "Backup failed with code: $exit_code"
        
        case $exit_code in
            1) echo "File not found" ;;
            2) echo "Permission denied" ;;
            13) echo "Not a directory" ;;
            *) echo "Unknown error" ;;
        esac
        
        return $exit_code
    fi
}

Custom Exit Codes

#!/bin/bash

# Define meaningful exit codes
readonly EXIT_SUCCESS=0
readonly EXIT_GENERAL_ERROR=1
readonly EXIT_MISSING_DEPENDENCY=2
readonly EXIT_INVALID_ARGUMENT=3
readonly EXIT_FILE_NOT_FOUND=4
readonly EXIT_PERMISSION_DENIED=5
readonly EXIT_CONFIG_ERROR=6
readonly EXIT_NETWORK_ERROR=7
readonly EXIT_TIMEOUT=8
readonly EXIT_RESOURCE_EXHAUSTED=9

# Use custom exit codes
function check_dependencies() {
    local missing_deps=()
    
    for dep in "$@"; do
        if ! command -v "$dep" &> /dev/null; then
            missing_deps+=("$dep")
        fi
    done
    
    if [ ${#missing_deps[@]} -gt 0 ]; then
        echo "ERROR: Missing dependencies: ${missing_deps[*]}"
        return $EXIT_MISSING_DEPENDENCY
    fi
    
    return $EXIT_SUCCESS
}

function validate_arguments() {
    if [ $# -lt 2 ]; then
        echo "ERROR: Usage: $0  "
        return $EXIT_INVALID_ARGUMENT
    fi
    
    local source=$1
    local destination=$2
    
    if [ ! -e "$source" ]; then
        echo "ERROR: Source not found: $source"
        return $EXIT_FILE_NOT_FOUND
    fi
    
    if [ ! -w "$(dirname "$destination")" ]; then
        echo "ERROR: Cannot write to destination directory"
        return $EXIT_PERMISSION_DENIED
    fi
    
    return $EXIT_SUCCESS
}

# Main script with proper error handling
main() {
    local source_dir=$1
    local backup_dir=$2
    
    # Check dependencies
    check_dependencies rsync tar gzip || {
        local exit_code=$?
        echo "Dependency check failed"
        exit $exit_code
    }
    
    # Validate arguments
    validate_arguments "$source_dir" "$backup_dir" || {
        local exit_code=$?
        echo "Argument validation failed"
        exit $exit_code
    }
    
    # Perform backup
    if ! rsync -av "$source_dir/" "$backup_dir/"; then
        echo "ERROR: Backup failed"
        exit $EXIT_GENERAL_ERROR
    fi
    
    echo "Backup completed successfully"
    exit $EXIT_SUCCESS
}

# Call main with arguments
main "$@"

Part 2: Error Handling with set Options

The Essential set Options

Option What it does When to use Example set -e Exit immediately on error Production scripts

#!/bin/bash
set -e

set -u Treat unset variables as error All scripts (catch typos)

set -u
echo $UNSET_VAR # Error

set -o pipefail Pipeline fails if any command fails Pipelines with multiple commands cmd1 | cmd2 | cmd3 set -x Print commands before executing Debugging only

set -x
echo "Debug"
set +x

set -E ERR trap inherited in functions Complex scripts with functions

set -E
trap handler ERR

Practical Usage of set Options

#!/bin/bash

# Best practice: Start all production scripts with these
set -euo pipefail

# Example 1: set -e (exit on error)
echo "Starting script with set -e"

# This will cause script to exit
# non_existent_command

# But we can handle expected errors
if ! non_existent_command 2>/dev/null; then
    echo "Command failed as expected, continuing..."
fi

# Example 2: set -u (unset variables error)
echo "Testing set -u"

defined_var="I exist"
echo "Defined: $defined_var"

# This would cause error with set -u
# echo "Undefined: $undefined_var"

# Safe way to handle possibly undefined variables
echo "Safe check: ${undefined_var:-not set}"

# Example 3: set -o pipefail
echo "Testing pipefail"

# Without pipefail: returns 0 (success of last command)
false | true
echo "Without pipefail: $?"  # 0

# With pipefail: returns 1 (failure of false)
set -o pipefail
false | true
echo "With pipefail: $?"  # 1

# Example 4: Combining all options
strict_mode() {
    set -euo pipefail
    
    echo "In strict mode"
    
    # All errors are caught
    local required_var=${1:?"Missing required argument"}
    echo "Required: $required_var"
    
    # Pipe failures are caught
    echo "test" | grep "pattern" | wc -l
    
    echo "Strict mode completed"
}

# Example 5: Temporarily disabling set -e
echo "Temporarily disabling set -e"

set +e  # Disable exit on error
non_existent_command
echo "Command failed but script continues: $?"
set -e  # Re-enable exit on error

# Example 6: Handling commands that might fail
backup_database() {
    local db_name=$1
    
    # Use command substitution with set +e
    set +e
    local output=$(pg_dump "$db_name" 2>&1)
    local exit_code=$?
    set -e
    
    if [ $exit_code -ne 0 ]; then
        echo "ERROR: Database backup failed: $output"
        return $exit_code
    fi
    
    echo "$output" > "/backups/$db_name.sql"
    echo "Backup successful"
}

# Example 7: Using set -E with traps
cleanup() {
    echo "Cleaning up..."
    rm -f /tmp/tempfile.$$
}

error_handler() {
    local exit_code=$?
    echo "ERROR: Script failed with code $exit_code"
    echo "Line: $LINENO, Command: $BASH_COMMAND"
    cleanup
    exit $exit_code
}

set -E  # ERR trap is inherited
trap error_handler ERR
trap cleanup EXIT

# Now errors are caught by error_handler
# non_existent_command

Important: Be careful with set -e in complex scripts. Some commands are expected to fail (like grep returning 1 for no match). Use conditional execution or temporarily disable set -e for expected failures.

Part 3: Trap Command for Signal Handling

Common Linux Signals

SIGHUP

Hangup (terminal closed)

SIGINT

Interrupt (Ctrl+C)

SIGQUIT

Quit (Ctrl+\)

SIGKILL

Kill (cannot be caught)

SIGTERM

Terminate (default kill)

EXIT

Script exit (not a real signal)

ERR

Command fails (Bash specific)

DEBUG

Before each command

Using trap for Clean Error Handling

#!/bin/bash
# trap-examples.sh

set -euo pipefail

# Example 1: Basic cleanup on exit
cleanup() {
    echo "Cleaning up temporary files..."
    rm -f /tmp/myscript.*
    echo "Cleanup complete"
}

trap cleanup EXIT

# Create temporary file
TEMP_FILE=$(mktemp /tmp/myscript.XXXXXX)
echo "Created: $TEMP_FILE"

# Example 2: Handle Ctrl+C (SIGINT)
interrupt_handler() {
    echo ""
    echo "Script interrupted by user (Ctrl+C)"
    cleanup
    exit 130  # 128 + SIGINT(2)
}

trap interrupt_handler INT

# Example 3: Handle any error
error_handler() {
    local exit_code=$?
    echo "ERROR: Script failed at line $LINENO"
    echo "Command: $BASH_COMMAND"
    echo "Exit code: $exit_code"
    
    # Send alert (simulated)
    echo "ALERT: Script failed with code $exit_code" >&2
    
    cleanup
    exit $exit_code
}

trap error_handler ERR

# Example 4: Multiple signals
handle_termination() {
    local signal=$1
    echo "Received signal $signal"
    cleanup
    exit $((128 + signal))
}

trap 'handle_termination 1' HUP    # SIGHUP
trap 'handle_termination 2' INT    # SIGINT
trap 'handle_termination 15' TERM  # SIGTERM

# Example 5: DEBUG trap for logging
debug_handler() {
    # Don't trace trap commands to avoid infinite loop
    if [[ $BASH_COMMAND != "trap debug_handler DEBUG" ]]; then
        echo "[DEBUG] Line $LINENO: $BASH_COMMAND" >&2
    fi
}

# Enable debug tracing only if DEBUG env var is set
if [[ ${DEBUG:-false} == true ]]; then
    trap debug_handler DEBUG
fi

# Example 6: Function-specific traps
process_data() {
    local input_file=$1
    local output_file=$2
    
    # Local cleanup for this function
    local local_cleanup() {
        echo "Cleaning function resources..."
        rm -f "$output_file.tmp"
    }
    
    # Trap EXIT within function scope
    trap local_cleanup EXIT
    
    # Process data
    cp "$input_file" "$output_file.tmp"
    grep "important" "$output_file.tmp" > "$output_file"
    
    # Remove local trap
    trap - EXIT
    local_cleanup
}

# Example 7: Ignore signals temporarily
ignore_interrupts() {
    echo "Starting critical section (ignoring interrupts)"
    
    # Ignore interrupts
    trap '' INT TERM HUP
    
    # Critical operations that shouldn't be interrupted
    sleep 5
    echo "Critical operation complete"
    
    # Restore signal handling
    trap - INT TERM HUP
    echo "Signal handling restored"
}

# Example 8: Stack trace on error
print_stack_trace() {
    local frame=0
    echo "Stack trace:"
    
    while caller $frame; do
        ((frame++))
    done
}

trap print_stack_trace ERR

# Test the error handling
echo "Script starting..."
# Uncomment to test error handling:
# non_existent_command
echo "Script completed successfully"

Production-Grade Error Handler

#!/bin/bash
# production-error-handler.sh

set -Euo pipefail

# Configuration
readonly LOG_FILE="/var/log/myscript.log"
readonly MAX_LOG_SIZE=10485760  # 10MB
readonly ALERT_EMAIL="admin@example.com"

# Logging functions
log() {
    local level=$1
    local message=$2
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    
    echo "[$timestamp] [$level] $message" | tee -a "$LOG_FILE"
    
    # Rotate log if too large
    if [ -f "$LOG_FILE" ] && [ $(stat -c%s "$LOG_FILE") -gt $MAX_LOG_SIZE ]; then
        mv "$LOG_FILE" "$LOG_FILE.old"
        log "INFO" "Rotated log file"
    fi
}

# Error handler
error_handler() {
    local exit_code=$?
    local error_line=$1
    local error_command=$2
    
    # Get stack trace
    local stack_trace=""
    local frame=0
    while caller $frame; do
        ((frame++))
    done > /tmp/stacktrace.$$
    
    # Log error details
    log "ERROR" "Script failed with exit code: $exit_code"
    log "ERROR" "Line: $error_line, Command: $error_command"
    log "ERROR" "Stack trace:"
    cat /tmp/stacktrace.$$ | while read line; do
        log "ERROR" "  $line"
    done
    
    # Send alert for critical errors
    if [ $exit_code -ge 64 ]; then  # Custom critical errors
        send_alert "Script failed with critical error $exit_code"
    fi
    
    # Cleanup
    cleanup_resources
    
    # Remove temporary file
    rm -f /tmp/stacktrace.$$
    
    exit $exit_code
}

# Send alert (simplified)
send_alert() {
    local message=$1
    local subject="ALERT: Script Failure - $(hostname)"
    
    echo "Subject: $subject" > /tmp/alert.$$
    echo "" >> /tmp/alert.$$
    echo "$message" >> /tmp/alert.$$
    echo "" >> /tmp/alert.$$
    echo "Log snippet:" >> /tmp/alert.$$
    tail -20 "$LOG_FILE" >> /tmp/alert.$$
    
    # In production, would send email
    # mail -s "$subject" "$ALERT_EMAIL" < /tmp/alert.$$
    
    log "ALERT" "Sent alert: $message"
    rm -f /tmp/alert.$$
}

# Resource cleanup
cleanup_resources() {
    log "INFO" "Starting cleanup"
    
    # Kill child processes
    pkill -P $$ 2>/dev/null || true
    
    # Remove temporary files
    rm -f /tmp/myscript.* 2>/dev/null || true
    
    # Close network connections
    exec 3>&- 4>&- 5>&- 2>/dev/null || true
    
    # Release locks
    rm -f /var/lock/myscript.lock 2>/dev/null || true
    
    log "INFO" "Cleanup completed"
}

# Signal handler
signal_handler() {
    local signal=$1
    log "WARN" "Received signal $signal"
    cleanup_resources
    exit $((128 + signal))
}

# Setup traps
trap 'error_handler $LINENO "$BASH_COMMAND"' ERR
trap 'cleanup_resources' EXIT
trap 'signal_handler 1' HUP
trap 'signal_handler 2' INT
trap 'signal_handler 15' TERM

# Main script logic
main() {
    log "INFO" "Script started"
    
    # Create lock file
    if [ -f /var/lock/myscript.lock ]; then
        log "ERROR" "Script already running"
        exit 1
    fi
    touch /var/lock/myscript.lock
    
    # Business logic here
    log "INFO" "Processing started"
    
    # Simulate work
    sleep 2
    
    # Simulate error (uncomment to test)
    # non_existent_command
    
    log "INFO" "Processing completed"
    
    # Remove lock file
    rm -f /var/lock/myscript.lock
    
    log "INFO" "Script finished successfully"
}

# Run main function
main "$@"

Part 4: Debugging Techniques

Debugging with set -x and bash -x

$ bash -x script.sh

+ echo 'Starting script'

Starting script

+ count=1

+ '[' 1 -lt 5 ']'

+ echo 'Iteration 1'

Iteration 1

+ (( count++ ))

+ non_existent_command

script.sh: line 8: non_existent_command: command not found

Practical Debugging Examples

#!/bin/bash
# debugging-techniques.sh

# Technique 1: Verbose debugging
debug_echo() {
    if [[ ${DEBUG:-false} == true ]]; then
        echo "[DEBUG] $1" >&2
    fi
}

# Technique 2: Conditional set -x
if [[ ${TRACE:-false} == true ]]; then
    set -x
fi

# Technique 3: Function tracing
trace_function() {
    local func_name=$1
    debug_echo "Entering function: $func_name"
    
    # Execute the function
    "$@"
    
    local exit_code=$?
    debug_echo "Exiting function: $func_name with code: $exit_code"
    return $exit_code
}

# Technique 4: Variable inspection
inspect_variables() {
    echo "=== Variable Inspection ==="
    echo "Line number: $LINENO"
    echo "Function name: ${FUNCNAME[0]}"
    echo "Script name: $0"
    echo "PID: $$"
    echo "All arguments: $@"
    echo "Number of arguments: $#"
    
    # Show specific variables
    if [[ ${#VARIABLES_TO_INSPECT[@]} -gt 0 ]]; then
        for var in "${VARIABLES_TO_INSPECT[@]}"; do
            echo "$var=${!var:-not set}"
        done
    fi
}

# Technique 5: Timing debug
time_command() {
    local start_time
    start_time=$(date +%s%N)
    
    # Execute command
    "$@"
    
    local exit_code=$?
    local end_time=$(date +%s%N)
    local duration=$(( (end_time - start_time) / 1000000 ))  # milliseconds
    
    debug_echo "Command '$*' took ${duration}ms, exited with $exit_code"
    return $exit_code
}

# Technique 6: Debug log with rotation
setup_debug_log() {
    local log_file=${1:-"/tmp/debug.log"}
    local max_size=${2:-1048576}  # 1MB
    
    exec 3>> "$log_file"
    BASH_XTRACEFD=3
    
    # Rotate if too large
    if [ -f "$log_file" ] && [ $(stat -c%s "$log_file") -gt $max_size ]; then
        mv "$log_file" "$log_file.old"
    fi
    
    set -x
}

# Technique 7: Assertions
assert() {
    local condition=$1
    local message=${2:-"Assertion failed"}
    
    if ! eval "$condition"; then
        echo "ASSERTION FAILED: $message" >&2
        echo "Condition: $condition" >&2
        echo "Line: $LINENO" >&2
        return 1
    fi
}

# Technique 8: Step-by-step execution
debug_step() {
    local step=$1
    local command=$2
    
    echo ""
    echo "=== Step $step: $command ==="
    read -p "Press Enter to execute, Ctrl+C to abort: "
    
    eval "$command"
    local exit_code=$?
    
    echo "Exit code: $exit_code"
    read -p "Press Enter to continue: "
    
    return $exit_code
}

# Example usage
main() {
    local input_file=$1
    
    # Assert input file exists
    assert "[ -f '$input_file' ]" "Input file must exist"
    
    # Trace function execution
    trace_function process_file "$input_file"
    
    # Time a command
    time_command grep "pattern" "$input_file"
    
    # Debug step by step
    if [[ ${STEP_DEBUG:-false} == true ]]; then
        debug_step 1 "wc -l '$input_file'"
        debug_step 2 "head -5 '$input_file'"
    fi
    
    # Inspect variables
    VARIABLES_TO_INSPECT=(input_file)
    inspect_variables
}

process_file() {
    local file=$1
    debug_echo "Processing file: $file"
    
    # Business logic
    local line_count=$(wc -l < "$file")
    echo "File has $line_count lines"
    
    # Simulate error for testing
    if [[ $line_count -eq 0 ]]; then
        debug_echo "Empty file detected"
        return 1
    fi
    
    return 0
}

# Run with debugging enabled
export DEBUG=true
# export TRACE=true
# export STEP_DEBUG=true

main "$1"

Common Error Patterns and Solutions

Error Pattern Symptom Solution Example Fix Unbound variable script.sh: line X: VAR: unbound variable Use set -u and default values ${VAR:-default} Command not found command not found Check dependencies first command -v cmd || install_cmd Permission denied Permission denied Check permissions early [ -w "$file" ] || chmod +w "$file" No such file No such file or directory Validate file existence [ -f "$file" ] || exit 1 Syntax error syntax error near unexpected token Use shellcheck, test with bash -n bash -n script.sh Pipe failure Pipeline succeeds when it should fail Use set -o pipefail set -o pipefail Argument too long Argument list too long Use find -exec or xargs find . -name "*.txt" -exec rm {} +

Part 5: Production Error Handling Framework

Complete Error Handling Library

#!/bin/bash
# error-lib.sh - Production error handling library

set -Euo pipefail

# Color codes for output
readonly COLOR_RESET='\033[0m'
readonly COLOR_RED='\033[0;31m'
readonly COLOR_GREEN='\033[0;32m'
readonly COLOR_YELLOW='\033[1;33m'
readonly COLOR_BLUE='\033[0;34m'
readonly COLOR_CYAN='\033[0;36m'

# Log levels
readonly LOG_LEVEL_DEBUG=0
readonly LOG_LEVEL_INFO=1
readonly LOG_LEVEL_WARN=2
readonly LOG_LEVEL_ERROR=3
readonly LOG_LEVEL_CRITICAL=4

# Configuration
LOG_LEVEL=${LOG_LEVEL:-$LOG_LEVEL_INFO}
LOG_FILE=${LOG_FILE:-""}
ENABLE_COLORS=${ENABLE_COLORS:-true}

# Custom exit codes
declare -A EXIT_CODES=(
    [SUCCESS]=0
    [GENERAL_ERROR]=1
    [MISSING_DEPENDENCY]=2
    [INVALID_ARGUMENT]=3
    [FILE_NOT_FOUND]=4
    [PERMISSION_DENIED]=5
    [CONFIG_ERROR]=6
    [NETWORK_ERROR]=7
    [TIMEOUT]=8
    [RESOURCE_EXHAUSTED]=9
    [VALIDATION_ERROR]=10
    [BUSINESS_LOGIC_ERROR]=11
)

# Logging function
log() {
    local level=$1
    local message=$2
    local timestamp
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    
    # Check log level
    if [ $level -lt $LOG_LEVEL ]; then
        return 0
    fi
    
    # Determine level name and color
    local level_name level_color
    case $level in
        $LOG_LEVEL_DEBUG)
            level_name="DEBUG"
            level_color=$COLOR_CYAN
            ;;
        $LOG_LEVEL_INFO)
            level_name="INFO"
            level_color=$COLOR_GREEN
            ;;
        $LOG_LEVEL_WARN)
            level_name="WARN"
            level_color=$COLOR_YELLOW
            ;;
        $LOG_LEVEL_ERROR)
            level_name="ERROR"
            level_color=$COLOR_RED
            ;;
        $LOG_LEVEL_CRITICAL)
            level_name="CRITICAL"
            level_color=$COLOR_RED
            ;;
        *)
            level_name="UNKNOWN"
            level_color=$COLOR_RESET
            ;;
    esac
    
    # Format message
    local formatted_message="[$timestamp] [$level_name] $message"
    
    # Output to console with colors if enabled
    if [[ $ENABLE_COLORS == true ]] && [ -t 1 ]; then
        echo -e "${level_color}${formatted_message}${COLOR_RESET}"
    else
        echo "$formatted_message"
    fi
    
    # Output to log file if specified
    if [[ -n $LOG_FILE ]]; then
        echo "$formatted_message" >> "$LOG_FILE"
    fi
}

# Error handler
setup_error_handler() {
    local error_handler_function=$1
    
    trap 'handle_error $LINENO "$BASH_COMMAND" "$?"' ERR
    trap 'cleanup_on_exit' EXIT
    trap 'handle_signal 1' HUP
    trap 'handle_signal 2' INT
    trap 'handle_signal 15' TERM
    
    # Store custom error handler
    if declare -f "$error_handler_function" > /dev/null; then
        CUSTOM_ERROR_HANDLER=$error_handler_function
    fi
}

handle_error() {
    local line_number=$1
    local command=$2
    local exit_code=$3
    
    # Get stack trace
    local stack_trace=""
    local frame=0
    while caller $frame; do
        ((frame++))
    done > /tmp/stacktrace.$$
    
    # Log error details
    log $LOG_LEVEL_ERROR "Script failed with exit code: $exit_code"
    log $LOG_LEVEL_ERROR "Line: $line_number, Command: $command"
    log $LOG_LEVEL_ERROR "Stack trace:"
    while read -r trace_line; do
        log $LOG_LEVEL_ERROR "  $trace_line"
    done < /tmp/stacktrace.$$
    
    # Call custom error handler if defined
    if [[ -n $CUSTOM_ERROR_HANDLER ]] && declare -f "$CUSTOM_ERROR_HANDLER" > /dev/null; then
        $CUSTOM_ERROR_HANDLER "$exit_code" "$line_number" "$command"
    fi
    
    # Cleanup
    rm -f /tmp/stacktrace.$$
    
    exit $exit_code
}

handle_signal() {
    local signal=$1
    log $LOG_LEVEL_WARN "Received signal $signal"
    cleanup_on_exit
    exit $((128 + signal))
}

cleanup_on_exit() {
    # Remove temporary files
    rm -f /tmp/$$.* 2>/dev/null || true
    
    # Release locks
    if [[ -n $SCRIPT_LOCK_FILE ]] && [ -f "$SCRIPT_LOCK_FILE" ]; then
        rm -f "$SCRIPT_LOCK_FILE"
    fi
    
    log $LOG_LEVEL_DEBUG "Cleanup completed"
}

# Validation functions
validate_file() {
    local file=$1
    local check_readable=${2:-false}
    local check_writable=${3:-false}
    
    if [[ ! -e $file ]]; then
        log $LOG_LEVEL_ERROR "File does not exist: $file"
        return ${EXIT_CODES[FILE_NOT_FOUND]}
    fi
    
    if [[ ! -f $file ]]; then
        log $LOG_LEVEL_ERROR "Not a regular file: $file"
        return ${EXIT_CODES[VALIDATION_ERROR]}
    fi
    
    if [[ $check_readable == true ]] && [[ ! -r $file ]]; then
        log $LOG_LEVEL_ERROR "File not readable: $file"
        return ${EXIT_CODES[PERMISSION_DENIED]}
    fi
    
    if [[ $check_writable == true ]] && [[ ! -w $file ]]; then
        log $LOG_LEVEL_ERROR "File not writable: $file"
        return ${EXIT_CODES[PERMISSION_DENIED]}
    fi
    
    return ${EXIT_CODES[SUCCESS]}
}

validate_directory() {
    local dir=$1
    local check_writable=${2:-false}
    
    if [[ ! -d $dir ]]; then
        log $LOG_LEVEL_ERROR "Not a directory: $dir"
        return ${EXIT_CODES[VALIDATION_ERROR]}
    fi
    
    if [[ $check_writable == true ]] && [[ ! -w $dir ]]; then
        log $LOG_LEVEL_ERROR "Directory not writable: $dir"
        return ${EXIT_CODES[PERMISSION_DENIED]}
    fi
    
    return ${EXIT_CODES[SUCCESS]}
}

validate_command() {
    local command=$1
    
    if ! command -v "$command" &> /dev/null; then
        log $LOG_LEVEL_ERROR "Command not found: $command"
        return ${EXIT_CODES[MISSING_DEPENDENCY]}
    fi
    
    return ${EXIT_CODES[SUCCESS]}
}

# Safe execution functions
safe_exec() {
    local command=$1
    shift
    local args=("$@")
    
    log $LOG_LEVEL_DEBUG "Executing: $command ${args[*]}"
    
    if "$command" "${args[@]}"; then
        return ${EXIT_CODES[SUCCESS]}
    else
        local exit_code=$?
        log $LOG_LEVEL_ERROR "Command failed: $command (exit: $exit_code)"
        return $exit_code
    fi
}

with_retry() {
    local max_attempts=$1
    local delay=$2
    local command=$3
    shift 3
    local args=("$@")
    
    local attempt=1
    
    while [ $attempt -le $max_attempts ]; do
        log $LOG_LEVEL_INFO "Attempt $attempt/$max_attempts: $command"
        
        if "$command" "${args[@]}"; then
            log $LOG_LEVEL_INFO "Command succeeded on attempt $attempt"
            return ${EXIT_CODES[SUCCESS]}
        fi
        
        local exit_code=$?
        log $LOG_LEVEL_WARN "Attempt $attempt failed with code $exit_code"
        
        if [ $attempt -lt $max_attempts ]; then
            log $LOG_LEVEL_INFO "Retrying in ${delay}s..."
            sleep $delay
        fi
        
        ((attempt++))
    done
    
    log $LOG_LEVEL_ERROR "Command failed after $max_attempts attempts"
    return ${EXIT_CODES[TIMEOUT]}
}

with_timeout() {
    local timeout=$1
    local command=$2
    shift 2
    local args=("$@")
    
    log $LOG_LEVEL_DEBUG "Executing with timeout ${timeout}s: $command"
    
    # Start command in background
    "$command" "${args[@]}" &
    local pid=$!
    
    # Start timeout killer
    (
        sleep $timeout
        if kill -0 $pid 2>/dev/null; then
            log $LOG_LEVEL_WARN "Command timed out, killing PID: $pid"
            kill -TERM $pid 2>/dev/null
            sleep 1
            kill -KILL $pid 2>/dev/null
        fi
    ) &
    local timeout_pid=$!
    
    # Wait for command to complete
    wait $pid 2>/dev/null
    local exit_code=$?
    
    # Kill timeout killer
    kill $timeout_pid 2>/dev/null
    
    if [ $exit_code -eq 0 ]; then
        log $LOG_LEVEL_DEBUG "Command completed within timeout"
    else
        log $LOG_LEVEL_ERROR "Command failed or timed out (exit: $exit_code)"
    fi
    
    return $exit_code
}

# Export functions
export -f log validate_file validate_directory validate_command safe_exec with_retry with_timeout

Best Practices Checklist

Always start scripts with set -euo pipefail

Use meaningful custom exit codes (1-125)

Implement cleanup with trap cleanup EXIT

Handle Ctrl+C with trap handler INT

Validate all inputs before processing

Log errors with timestamps and context

Use $? immediately after commands

Test error paths as thoroughly as success paths

Use shellcheck to catch common issues

Implement retry logic for transient failures

Add timeout for long-running operations

Create lock files for mutual exclusion

Clean up temporary files on exit

Document error codes and their meanings

Test scripts with invalid inputs

Critical Errors to Always Handle:
1. Missing dependencies (check with command -v)
2. Permission denied (validate before operations)
3. Disk full (check available space)
4. Network failures (timeout and retry)
5. Invalid user input (validate and sanitize)
6. Concurrent execution (use lock files)
7. Resource exhaustion (monitor memory/CPU)
8. Signal interrupts (handle Ctrl+C gracefully)

Debugging Workflow:
1. Reproduce error: Run script with same inputs
2. Enable tracing: bash -x script.sh
3. Check exit codes: echo $? after each command
4. Validate inputs: Add set -u to catch unset variables
5. Isolate problem: Comment out sections until error disappears
6. Add logging: Insert debug messages before suspicious code
7. Test fixes: Make minimal changes and retest
8. Prevent regression: Add test cases for the error

Robust Scripts for Production

Error handling is what separates amateur scripts from production-ready tools. A script that works perfectly when everything goes right is useless if it crashes and leaves a mess when something goes wrong.

Remember: Good error handling doesn't prevent errors - it anticipates them. It ensures your scripts fail gracefully, clean up after themselves, and provide enough information to fix the problem.

Practice Exercise: Take a simple script you've written and add comprehensive error handling. Add input validation, cleanup traps, meaningful exit codes, and logging. Then test it by deliberately causing errors (remove files, revoke permissions, etc.).

Quick Reference:
• Exit on error: set -e
• Check exit code: echo $?
• Custom exit: exit 1
• Trap cleanup: trap cleanup EXIT
• Trap error: trap handler ERR
• Trap Ctrl+C: trap handler INT
• Debug mode: bash -x script.sh
• Syntax check: bash -n script.sh
• Validate variable: ${VAR:-default}
• Check command: command -v cmd

Error Handling in Bash Scripts for DevOps