Transform fragile Bash scripts into robust production tools with professional error handling. Learn how to detect, handle, and recover from errors in your DevOps automation scripts.
Exit Codes: Understand command success/failure. Covers the $? variable, the true/false commands, and custom exit codes (0-255). Example: command; echo $?; exit 1
Trap Commands: Handle signals and errors. Covers cleanup on exit, interrupt handling, and the ERR, EXIT, and DEBUG traps. Example: trap cleanup EXIT
Debugging: Find and fix errors. Covers set -x, debug logging, stack traces, validation, and testing. Example: set -x; bash -x script.sh
Error Occurs → Detect Error → Handle Error → Cleanup → Exit Gracefully
Part 1: Understanding Exit Codes
What Are Exit Codes?
Every command in Linux returns an exit code when it finishes:
Exit Code  Meaning                      Example Command               Typical Use
0          Success                      ls /tmp                       Normal successful completion
1          General error                cat /nonexistent              Catch-all for failures
2          Misuse of shell builtins     empty_function() {}           Shell syntax errors
126        Command cannot execute       /dev/null (run as a command)  Permission denied
127        Command not found            nonexistent_command           Typo or missing program
130        Script terminated by Ctrl+C  Press Ctrl+C during script    128 + SIGINT (2) = 130
255*       Exit status out of range     exit 300                      Statuses wrap modulo 256, so exit 300 actually returns 44 (300 - 256)
Pro Tip: Exit statuses are limited to 0-255. Use 1-125 for your own errors and give each code a specific meaning. The shell itself uses 126 (command found but not executable) and 127 (command not found), and it reports termination by signal N as 128+N.
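To make the wrap-around concrete, here is a minimal sketch that runs exit inside subshells so the surrounding script keeps going. The variable names are illustrative only.

```shell
#!/bin/bash
# Each (exit N) runs in a subshell, so only the subshell terminates.
(exit 44);  plain=$?     # in range: reported unchanged
(exit 300); wrapped=$?   # out of range: 300 mod 256
(exit 256); zeroed=$?    # 256 mod 256 is 0, which looks like success

echo "exit 44  -> $plain"
echo "exit 300 -> $wrapped"
echo "exit 256 -> $zeroed"
```

The last case is the dangerous one: a failure reported with a multiple of 256 is indistinguishable from success, which is one more reason to stay within 1-125.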
Checking Exit Codes
#!/bin/bash
# Method 1: Check $? immediately
ls /tmp
echo "Exit code: $?" # Should be 0
ls /nonexistent
echo "Exit code: $?" # Non-zero: 2 with GNU coreutils ls (some other implementations return 1)
# Method 2: Using if statement directly
if ls /tmp; then
echo "Command succeeded"
else
echo "Command failed with code: $?"
fi
# Method 3: Store and check later
ls /tmp
LS_EXIT=$?
if [ $LS_EXIT -eq 0 ]; then
echo "ls succeeded"
elif [ $LS_EXIT -eq 1 ]; then
echo "ls reported a minor problem (e.g., cannot access a subdirectory)"
elif [ $LS_EXIT -eq 2 ]; then
echo "ls reported serious trouble (e.g., cannot access a path given on the command line)"
else
echo "ls failed with unknown code: $LS_EXIT"
fi
# Method 4: Check that a command exists without running it
if command -v git &> /dev/null; then
echo "git is installed"
else
echo "git is not installed"
exit 1
fi
# Method 5: Check multiple commands
mkdir -p /tmp/test && cd /tmp/test && echo "Success"
# If any command fails, stops due to &&
# Method 6: Continue despite errors
mkdir -p /tmp/test || echo "mkdir failed, continuing..."
cd /tmp/test || exit 1
# Method 7: Complex logic
create_backup() {
local src=$1
local dest=$2
if cp -r "$src" "$dest"; then
echo "Backup successful"
return 0
else
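local exit_code=$?
echo "Backup failed with code: $exit_code"
# cp does not encode the cause in its exit status: any failure is a generic
# non-zero code (typically 1), and the reason is in the message on stderr
case $exit_code in
1) echo "cp reported an error (see its message above)" ;;
126) echo "cp could not be executed" ;;
127) echo "cp was not found" ;;
*) echo "Unknown error" ;;
esac
return $exit_code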
fi
}
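One detail behind Methods 1 and 3 is that every command, including echo, overwrites $?. The following sketch shows the status being lost and then preserved; the variable names lost and saved are illustrative only, and the script is assumed to run without set -e.

```shell
#!/bin/bash
# Pitfall: the echo between the command and the check resets $? to 0.
false
echo "about to check the result" > /dev/null
lost=$?

# Fix: copy $? into a variable on the very next line.
false
saved=$?
echo "about to check the result" > /dev/null

echo "lost=$lost saved=$saved"
```

Once the status is in a variable, logging or other commands can run in between without affecting the later check.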
Custom Exit Codes
#!/bin/bash
# Define meaningful exit codes
readonly EXIT_SUCCESS=0
readonly EXIT_GENERAL_ERROR=1
readonly EXIT_MISSING_DEPENDENCY=2
readonly EXIT_INVALID_ARGUMENT=3
readonly EXIT_FILE_NOT_FOUND=4
readonly EXIT_PERMISSION_DENIED=5
readonly EXIT_CONFIG_ERROR=6
readonly EXIT_NETWORK_ERROR=7
readonly EXIT_TIMEOUT=8
readonly EXIT_RESOURCE_EXHAUSTED=9
# Use custom exit codes
function check_dependencies() {
local missing_deps=()
for dep in "$@"; do
if ! command -v "$dep" &> /dev/null; then
missing_deps+=("$dep")
fi
done
if [ ${#missing_deps[@]} -gt 0 ]; then
echo "ERROR: Missing dependencies: ${missing_deps[*]}"
return $EXIT_MISSING_DEPENDENCY
fi
return $EXIT_SUCCESS
}
function validate_arguments() {
if [ $# -lt 2 ]; then
echo "ERROR: Usage: $0 <source> <destination>"
return $EXIT_INVALID_ARGUMENT
fi
local source=$1
local destination=$2
if [ ! -e "$source" ]; then
echo "ERROR: Source not found: $source"
return $EXIT_FILE_NOT_FOUND
fi
if [ ! -w "$(dirname "$destination")" ]; then
echo "ERROR: Cannot write to destination directory"
return $EXIT_PERMISSION_DENIED
fi
return $EXIT_SUCCESS
}
# Main script with proper error handling
main() {
local source_dir=$1
local backup_dir=$2
# Check dependencies
check_dependencies rsync tar gzip || {
local exit_code=$?
echo "Dependency check failed"
exit $exit_code
}
# Validate arguments
validate_arguments "$source_dir" "$backup_dir" || {
local exit_code=$?
echo "Argument validation failed"
exit $exit_code
}
# Perform backup
if ! rsync -av "$source_dir/" "$backup_dir/"; then
echo "ERROR: Backup failed"
exit $EXIT_GENERAL_ERROR
fi
echo "Backup completed successfully"
exit $EXIT_SUCCESS
}
# Call main with arguments
main "$@"
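Custom exit codes pay off when the caller can act on them. As a rough sketch, a wrapper might translate the codes defined above into readable labels. Both describe_exit_code and fake_backup are hypothetical names invented for this illustration; fake_backup merely stands in for the real backup script.

```shell
#!/bin/bash
# Hypothetical helper: map the custom exit codes to human-readable labels.
describe_exit_code() {
  case $1 in
    0) echo "success" ;;
    1) echo "general error" ;;
    2) echo "missing dependency" ;;
    3) echo "invalid argument" ;;
    4) echo "file not found" ;;
    5) echo "permission denied" ;;
    *) echo "unknown ($1)" ;;
  esac
}

# Stand-in for the backup script: always fails with EXIT_FILE_NOT_FOUND (4).
fake_backup() { return 4; }

rc=0
fake_backup || rc=$?   # capture the status without aborting under set -e
echo "backup finished: $(describe_exit_code "$rc")"
```

A scheduler or CI job could use the same mapping to decide whether a failure is worth retrying (network error, timeout) or needs a human (invalid argument, permission denied).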
Part 2: Error Handling with set Options
The Essential set Options
Option           What it does                         When to use                       Example
set -e           Exit immediately on error            Production scripts                set -e (right after the #!/bin/bash line)
set -u           Treat unset variables as error       All scripts (catch typos)         set -u; echo $UNSET_VAR # Error
set -o pipefail  Pipeline fails if any command fails  Pipelines with multiple commands  set -o pipefail; cmd1 | cmd2 | cmd3
set -x           Print commands before executing      Debugging only                    set -x; echo "Debug"; set +x
set -E           ERR trap inherited in functions      Complex scripts with functions    set -E; trap handler ERR
Practical Usage of set Options
#!/bin/bash
# Best practice: Start all production scripts with these
set -euo pipefail
# Example 1: set -e (exit on error)
echo "Starting script with set -e"
# This will cause script to exit
# non_existent_command
# But we can handle expected errors
if ! non_existent_command 2>/dev/null; then
echo "Command failed as expected, continuing..."
fi
# Example 2: set -u (unset variables error)
echo "Testing set -u"
defined_var="I exist"
echo "Defined: $defined_var"
# This would cause error with set -u
# echo "Undefined: $undefined_var"
# Safe way to handle possibly undefined variables
echo "Safe check: ${undefined_var:-not set}"
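# Example 3: set -o pipefail
echo "Testing pipefail"
# Without pipefail: the pipeline returns 0 (status of the last command)
set +o pipefail # switch it off for the comparison; it was enabled at the top
false | true
echo "Without pipefail: $?" # 0
# With pipefail: the pipeline returns 1 (status of false)
set -o pipefail
# Capture the status with ||, because under set -e a bare failing pipeline
# would end the script before echo runs
pipeline_status=0
false | true || pipeline_status=$?
echo "With pipefail: $pipeline_status" # 1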
# Example 4: Combining all options
strict_mode() {
set -euo pipefail
echo "In strict mode"
# All errors are caught
local required_var=${1:?"Missing required argument"}
echo "Required: $required_var"
# Pipe failures are caught
echo "test" | grep "pattern" | wc -l
echo "Strict mode completed"
}
# Example 5: Temporarily disabling set -e
echo "Temporarily disabling set -e"
set +e # Disable exit on error
non_existent_command
echo "Command failed but script continues: $?"
set -e # Re-enable exit on error
# Example 6: Handling commands that might fail
backup_database() {
local db_name=$1
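# Declare first, assign second: with "local output=$(...)" on one line, $?
# would report the status of "local" (always 0) instead of pg_dump
local output exit_code=0
# "|| exit_code=$?" captures a failure without tripping set -e
output=$(pg_dump "$db_name" 2>&1) || exit_code=$?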
if [ $exit_code -ne 0 ]; then
echo "ERROR: Database backup failed: $output"
return $exit_code
fi
echo "$output" > "/backups/$db_name.sql"
echo "Backup successful"
}
# Example 7: Using set -E with traps
cleanup() {
echo "Cleaning up..."
rm -f /tmp/tempfile.$$
}
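error_handler() {
local exit_code=$?
# The failing line is passed in by the trap; inside this function $LINENO
# would refer to the handler's own lines
local line_number=$1
echo "ERROR: Script failed with code $exit_code"
echo "Line: $line_number, Command: $BASH_COMMAND"
# exit fires the EXIT trap, so cleanup runs once without being called here
exit $exit_code
}
set -E # ERR trap is inherited
# Single quotes defer expansion until the trap fires
trap 'error_handler $LINENO' ERR
trap cleanup EXIT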
# Now errors are caught by error_handler
# non_existent_command
Important: Be careful with set -e in complex scripts. Some commands are expected to fail (like grep returning 1 for no match). Use conditional execution or temporarily disable set -e for expected failures.
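The grep case mentioned above can be handled without disabling set -e for a whole section. The sketch below is one way to do it: count_matches is a hypothetical helper that treats "no match" (status 1) as a normal result and still reports real grep errors (status 2 or higher).

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the number of matching lines in a file.
# grep exits 0 on a match, 1 on no match, and 2 or more on a real error.
count_matches() {
  local pattern=$1 file=$2 count rc=0
  count=$(grep -c -- "$pattern" "$file") || rc=$?
  if (( rc > 1 )); then
    echo "grep failed with status $rc" >&2
    return "$rc"
  fi
  echo "$count"
}

sample=$(mktemp)
printf 'alpha\nbeta\nalpha beta\n' > "$sample"
count_matches alpha "$sample"     # lines containing "alpha"
count_matches missing "$sample"   # no match, yet the script keeps running
rm -f "$sample"
```

The "|| rc=$?" idiom is what keeps set -e from ending the script: a command on the left side of || is exempt from exit-on-error, and the status is still available for inspection.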
Part 3: Trap Command for Signal Handling
Common Linux Signals
Number  Signal   Description
1       SIGHUP   Hangup (terminal closed)
2       SIGINT   Interrupt (Ctrl+C)
3       SIGQUIT  Quit (Ctrl+\)
9       SIGKILL  Kill (cannot be caught)
15      SIGTERM  Terminate (default kill)
0       EXIT     Script exit (not a real signal)
n/a     ERR      Command fails (Bash specific)
n/a     DEBUG    Before each command
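The signal numbers in this table connect back to exit codes: a process killed by signal N is reported with status 128+N. As a small sketch, Bash's built-in kill -l can translate such a status back into a signal name. The describe_status function is a hypothetical helper for illustration.

```shell
#!/bin/bash
# Hypothetical helper: describe an exit status, decoding 128+N as a signal.
describe_status() {
  local status=$1 name
  # Bash's built-in kill -l accepts a signal number or an exit status
  # of a process that was terminated by a signal.
  if (( status > 128 )) && name=$(kill -l "$status" 2>/dev/null); then
    echo "killed by SIG$name"
  else
    echo "exited with status $status"
  fi
}

describe_status 130   # 128 + 2  (SIGINT)
describe_status 143   # 128 + 15 (SIGTERM)
describe_status 1     # an ordinary failure
```

This is handy in wrapper scripts and log messages, where "killed by SIGTERM" is far easier to interpret than a bare 143.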
Using trap for Clean Error Handling
#!/bin/bash
# trap-examples.sh
set -euo pipefail
# Example 1: Basic cleanup on exit
cleanup() {
echo "Cleaning up temporary files..."
rm -f /tmp/myscript.*
echo "Cleanup complete"
}
trap cleanup EXIT
# Create temporary file
TEMP_FILE=$(mktemp /tmp/myscript.XXXXXX)
echo "Created: $TEMP_FILE"
# Example 2: Handle Ctrl+C (SIGINT)
interrupt_handler() {
echo ""
echo "Script interrupted by user (Ctrl+C)"
cleanup
exit 130 # 128 + SIGINT(2)
}
trap interrupt_handler INT
# Example 3: Handle any error
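error_handler() {
local exit_code=$?
# The failing line is passed in by the trap; inside this function $LINENO
# would refer to the handler's own lines
local line_number=$1
echo "ERROR: Script failed at line $line_number"
echo "Command: $BASH_COMMAND"
echo "Exit code: $exit_code"
# Send alert (simulated)
echo "ALERT: Script failed with code $exit_code" >&2
# exit fires the EXIT trap, so cleanup runs once without being called here
exit $exit_code
}
# Single quotes defer expansion until the trap fires
trap 'error_handler $LINENO' ERR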
# Example 4: Multiple signals
# (a signal has only one handler, so the INT trap here replaces the one from Example 2)
handle_termination() {
local signal=$1
echo "Received signal $signal"
cleanup
exit $((128 + signal))
}
trap 'handle_termination 1' HUP # SIGHUP
trap 'handle_termination 2' INT # SIGINT
trap 'handle_termination 15' TERM # SIGTERM
# Example 5: DEBUG trap for logging
debug_handler() {
# The line number is passed in by the trap; inside this function $LINENO
# would refer to the handler's own lines
local line_number=$1
echo "[DEBUG] Line $line_number: $BASH_COMMAND" >&2
}
# Enable debug tracing only if DEBUG env var is set
if [[ ${DEBUG:-false} == true ]]; then
trap 'debug_handler $LINENO' DEBUG
fi
# Example 6: Function-specific cleanup
# Bash has no function-local functions ("local name() { ... }" is a syntax
# error), and EXIT traps are global: setting one inside a function would
# replace the script-wide cleanup trap. Explicit cleanup avoids both problems.
process_data() {
local input_file=$1
local output_file=$2
local status=0
# Process data; remember the first failure instead of aborting
cp "$input_file" "$output_file.tmp" &&
grep "important" "$output_file.tmp" > "$output_file" || status=$?
# Always remove the intermediate file, whether or not processing succeeded
rm -f "$output_file.tmp"
return $status
}
# Example 7: Ignore signals temporarily
ignore_interrupts() {
echo "Starting critical section (ignoring interrupts)"
# Ignore interrupts
trap '' INT TERM HUP
# Critical operations that shouldn't be interrupted
sleep 5
echo "Critical operation complete"
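# Restore default signal handling (custom handlers set earlier are not
# brought back; re-run their trap commands if they are still needed)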
trap - INT TERM HUP
echo "Signal handling restored"
}
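# Example 8: Stack trace on error
# (this replaces the ERR trap from Example 3; call print_stack_trace from
# error_handler to get both behaviors)
print_stack_trace() {
local frame=0
echo "Stack trace:"
# caller fails once frame is past the last stack frame, which ends the loop
while caller $frame; do
# Not ((frame++)): it returns status 1 when frame is 0, a failure under set -e
frame=$((frame + 1))
done
}
trap print_stack_trace ERR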
# Test the error handling
echo "Script starting..."
# Uncomment to test error handling:
# non_existent_command
echo "Script completed successfully"
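Several examples above install a trap for a signal that already has one. Because each trap call replaces the previous handler for that signal, only the last one takes effect. The following sketch demonstrates this in subshells and shows one way to combine the work in a single handler; the messages and the finish function are illustrative only.

```shell
#!/bin/bash
# Each $( ... ) is a subshell, so its EXIT trap fires when the subshell ends.

# Two separate trap calls: the second replaces the first.
replaced=$(
  trap 'echo "remove temp files"' EXIT
  trap 'echo "release lock"' EXIT
)

# One handler that performs both steps.
combined=$(
  finish() {
    echo "remove temp files"
    echo "release lock"
  }
  trap finish EXIT
)

echo "replaced: $replaced"
echo "combined:"
echo "$combined"
```

In a real script this means a single cleanup function per signal, extended as new resources are added, rather than one trap call per resource.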
Production-Grade Error Handler
#!/bin/bash
# production-error-handler.sh
set -Euo pipefail
# Configuration
readonly LOG_FILE="/var/log/myscript.log"
readonly MAX_LOG_SIZE=10485760 # 10MB
readonly ALERT_EMAIL="admin@example.com"
# Logging functions
log() {
local level=$1
local message=$2
local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$timestamp] [$level] $message" | tee -a "$LOG_FILE"
# Rotate log if too large
if [ -f "$LOG_FILE" ] && [ $(stat -c%s "$LOG_FILE") -gt $MAX_LOG_SIZE ]; then
mv "$LOG_FILE" "$LOG_FILE.old"
log "INFO" "Rotated log file"
fi
}
# Error handler
error_handler() {
local exit_code=$?
local error_line=$1
local error_command=$2
# Get stack trace (caller fails once frame is past the last stack frame)
local trace_file
trace_file=$(mktemp /tmp/stacktrace.XXXXXX)
local frame=0
while caller $frame; do
frame=$((frame + 1)) # ((frame++)) would return status 1 when frame is 0
done > "$trace_file"
# Log error details
log "ERROR" "Script failed with exit code: $exit_code"
log "ERROR" "Line: $error_line, Command: $error_command"
log "ERROR" "Stack trace:"
while IFS= read -r line; do
log "ERROR" " $line"
done < "$trace_file"
# Send alert for critical errors
if [ $exit_code -ge 64 ]; then # Custom critical errors
send_alert "Script failed with critical error $exit_code"
fi
# Remove temporary file
rm -f "$trace_file"
# exit fires the EXIT trap, which runs cleanup_resources exactly once
exit $exit_code
}
# Send alert (simplified)
send_alert() {
local message=$1
local subject="ALERT: Script Failure - $(hostname)"
echo "Subject: $subject" > /tmp/alert.$$
echo "" >> /tmp/alert.$$
echo "$message" >> /tmp/alert.$$
echo "" >> /tmp/alert.$$
echo "Log snippet:" >> /tmp/alert.$$
tail -20 "$LOG_FILE" >> /tmp/alert.$$
# In production, would send email
# mail -s "$subject" "$ALERT_EMAIL" < /tmp/alert.$$
log "ALERT" "Sent alert: $message"
rm -f /tmp/alert.$$
}
# Resource cleanup
cleanup_resources() {
log "INFO" "Starting cleanup"
# Kill child processes
pkill -P $$ 2>/dev/null || true
# Remove temporary files
rm -f /tmp/myscript.* 2>/dev/null || true
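# Close network connections (closing a descriptor that is not open is harmless).
# Do not append "2>/dev/null": on a bare exec the redirection is permanent and
# would silence stderr for the rest of the script
exec 3>&- 4>&- 5>&-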
# Release locks
rm -f /var/lock/myscript.lock 2>/dev/null || true
log "INFO" "Cleanup completed"
}
# Signal handler
signal_handler() {
local signal=$1
log "WARN" "Received signal $signal"
cleanup_resources
exit $((128 + signal))
}
# Setup traps
trap 'error_handler $LINENO "$BASH_COMMAND"' ERR
trap 'cleanup_resources' EXIT
trap 'signal_handler 1' HUP
trap 'signal_handler 2' INT
trap 'signal_handler 15' TERM
# Main script logic
main() {
log "INFO" "Script started"
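# Create lock file (check-then-create is not atomic, so two instances started
# at the same moment can both get past this test)
if [ -f /var/lock/myscript.lock ]; then
log "ERROR" "Script already running"
# Drop the EXIT trap first: cleanup_resources would otherwise delete the
# lock that belongs to the running instance
trap - EXIT
exit 1
fi
touch /var/lock/myscript.lock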
# Business logic here
log "INFO" "Processing started"
# Simulate work
sleep 2
# Simulate error (uncomment to test)
# non_existent_command
log "INFO" "Processing completed"
# Remove lock file
rm -f /var/lock/myscript.lock
log "INFO" "Script finished successfully"
}
# Run main function
main "$@"
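The lock in the script above tests for the file and then creates it, which leaves a small window in which two instances can both start. One common alternative, sketched here under the assumption that a plain lock file is sufficient, is to let the shell's noclobber option make creation itself the test. The acquire_lock and release_lock names and the demo path are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: take the lock by creating the file atomically.
# With noclobber set, ">" fails if the file already exists, so the
# existence check and the creation are a single step.
acquire_lock() {
  local lock_file=$1
  ( set -o noclobber; echo "$$" > "$lock_file" ) 2>/dev/null
}

release_lock() {
  rm -f "$1"
}

lock="${TMPDIR:-/tmp}/lock-demo.$$"   # illustrative path, not /var/lock
if acquire_lock "$lock"; then
  echo "first attempt: lock acquired"
fi
if ! acquire_lock "$lock"; then
  echo "second attempt: already locked by PID $(cat "$lock")"
fi
release_lock "$lock"
```

Storing the PID in the file also makes it possible to detect stale locks left by a crashed run, although that check is not shown here.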
Part 4: Debugging Techniques
Debugging with set -x and bash -x
$ bash -x script.sh
+ echo 'Starting script'
Starting script
+ count=1
+ '[' 1 -lt 5 ']'
+ echo 'Iteration 1'
Iteration 1
+ (( count++ ))
+ non_existent_command
script.sh: line 8: non_existent_command: command not found
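The "+" prefix in this trace comes from the PS4 variable, which Bash expands before every traced command. As a rough sketch, setting PS4 to include $LINENO makes each trace line point back to the script. The capture into the trace variable is only there so the result can be inspected afterwards.

```shell
#!/bin/bash
# Capture the trace of a few commands (set -x writes to stderr).
trace=$(
  {
    PS4='+ line ${LINENO}: '   # single quotes: expanded anew for each command
    set -x
    count=1
    (( count < 5 )) && echo "Iteration $count"
    set +x
  } 2>&1
)
printf '%s\n' "$trace"
```

In a real script, PS4 would simply be set near the top, before set -x or when running with bash -x.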
Practical Debugging Examples
#!/bin/bash
# debugging-techniques.sh
# Technique 1: Verbose debugging
debug_echo() {
if [[ ${DEBUG:-false} == true ]]; then
echo "[DEBUG] $1" >&2
fi
}
# Technique 2: Conditional set -x
if [[ ${TRACE:-false} == true ]]; then
set -x
fi
# Technique 3: Function tracing
trace_function() {
local func_name=$1
debug_echo "Entering function: $func_name"
# Execute the function
"$@"
local exit_code=$?
debug_echo "Exiting function: $func_name with code: $exit_code"
return $exit_code
}
# Technique 4: Variable inspection
inspect_variables() {
echo "=== Variable Inspection ==="
echo "Line number: $LINENO"
echo "Function name: ${FUNCNAME[0]}"
echo "Script name: $0"
echo "PID: $$"
echo "All arguments: $*"
echo "Number of arguments: $#"
# Show specific variables
if [[ ${#VARIABLES_TO_INSPECT[@]} -gt 0 ]]; then
for var in "${VARIABLES_TO_INSPECT[@]}"; do
echo "$var=${!var:-not set}"
done
fi
}
# Technique 5: Timing debug
time_command() {
local start_time
start_time=$(date +%s%N)
# Execute command
"$@"
local exit_code=$?
local end_time=$(date +%s%N)
local duration=$(( (end_time - start_time) / 1000000 )) # milliseconds
debug_echo "Command '$*' took ${duration}ms, exited with $exit_code"
return $exit_code
}
# Technique 6: Debug log with rotation
setup_debug_log() {
local log_file=${1:-"/tmp/debug.log"}
local max_size=${2:-1048576} # 1MB
# Rotate before opening the log: once fd 3 is open it follows the renamed file
if [ -f "$log_file" ] && [ "$(stat -c%s "$log_file")" -gt "$max_size" ]; then
mv "$log_file" "$log_file.old"
fi
exec 3>> "$log_file"
BASH_XTRACEFD=3 # send set -x output to fd 3 instead of stderr (Bash 4.1+)
set -x
}
# Technique 7: Assertions
assert() {
local condition=$1
local message=${2:-"Assertion failed"}
if ! eval "$condition"; then
echo "ASSERTION FAILED: $message" >&2
echo "Condition: $condition" >&2
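# BASH_LINENO[0] is the line where assert was called; $LINENO here would be
# a line inside assert itself
echo "Line: ${BASH_LINENO[0]}" >&2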
return 1
fi
}
# Technique 8: Step-by-step execution
debug_step() {
local step=$1
local command=$2
echo ""
echo "=== Step $step: $command ==="
read -p "Press Enter to execute, Ctrl+C to abort: "
eval "$command"
local exit_code=$?
echo "Exit code: $exit_code"
read -p "Press Enter to continue: "
return $exit_code
}
# Example usage
main() {
local input_file=$1
# Assert input file exists
assert "[ -f '$input_file' ]" "Input file must exist"
# Trace function execution
trace_function process_file "$input_file"
# Time a command
time_command grep "pattern" "$input_file"
# Debug step by step
if [[ ${STEP_DEBUG:-false} == true ]]; then
debug_step 1 "wc -l '$input_file'"
debug_step 2 "head -5 '$input_file'"
fi
# Inspect variables
VARIABLES_TO_INSPECT=(input_file)
inspect_variables
}
process_file() {
local file=$1
debug_echo "Processing file: $file"
# Business logic
local line_count=$(wc -l < "$file")
echo "File has $line_count lines"
# Simulate error for testing
if [[ $line_count -eq 0 ]]; then
debug_echo "Empty file detected"
return 1
fi
return 0
}
# Run with debugging enabled
export DEBUG=true
# export TRACE=true
# export STEP_DEBUG=true
main "$1"
Common Error Patterns and Solutions
Error Pattern      Symptom                                   Solution                                        Example Fix
Unbound variable   script.sh: line X: VAR: unbound variable  Initialize variables or supply default values   ${VAR:-default}
Command not found  command not found                         Check dependencies first                        command -v cmd || install_cmd
Permission denied  Permission denied                         Check permissions early                         [ -w "$file" ] || chmod +w "$file"
No such file       No such file or directory                 Validate file existence                         [ -f "$file" ] || exit 1
Syntax error       syntax error near unexpected token        Use shellcheck, test with bash -n               bash -n script.sh
Pipe failure       Pipeline succeeds when it should fail     Use set -o pipefail                             set -o pipefail
Argument too long  Argument list too long                    Use find -exec or xargs                         find . -name "*.txt" -exec rm {} +
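The bash -n check from the table is easy to automate before a deployment. This sketch writes one valid and one broken script to temporary files and checks both; check_syntax is a hypothetical helper name.

```shell
#!/bin/bash
# Write one valid and one broken script to temporary files.
good_script=$(mktemp)
bad_script=$(mktemp)
printf 'if true; then\n  echo ok\nfi\n' > "$good_script"
printf 'if true; then\n  echo "missing fi"\n' > "$bad_script"

# bash -n parses the file without executing any of it.
check_syntax() {
  if bash -n "$1" 2>/dev/null; then
    echo "syntax ok"
  else
    echo "syntax error"
  fi
}

good_result=$(check_syntax "$good_script")
bad_result=$(check_syntax "$bad_script")
echo "good script: $good_result"
echo "bad script:  $bad_result"
rm -f "$good_script" "$bad_script"
```

Note that bash -n only catches parse errors. Mistakes such as a misspelled command name still pass, which is where shellcheck and real test runs come in.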
Part 5: Production Error Handling Framework
Complete Error Handling Library
#!/bin/bash
# error-lib.sh - Production error handling library
set -Euo pipefail
# Color codes for output
readonly COLOR_RESET='\033[0m'
readonly COLOR_RED='\033[0;31m'
readonly COLOR_GREEN='\033[0;32m'
readonly COLOR_YELLOW='\033[1;33m'
readonly COLOR_BLUE='\033[0;34m'
readonly COLOR_CYAN='\033[0;36m'
# Log levels
readonly LOG_LEVEL_DEBUG=0
readonly LOG_LEVEL_INFO=1
readonly LOG_LEVEL_WARN=2
readonly LOG_LEVEL_ERROR=3
readonly LOG_LEVEL_CRITICAL=4
# Configuration
LOG_LEVEL=${LOG_LEVEL:-$LOG_LEVEL_INFO}
LOG_FILE=${LOG_FILE:-""}
ENABLE_COLORS=${ENABLE_COLORS:-true}
# Custom exit codes
declare -A EXIT_CODES=(
[SUCCESS]=0
[GENERAL_ERROR]=1
[MISSING_DEPENDENCY]=2
[INVALID_ARGUMENT]=3
[FILE_NOT_FOUND]=4
[PERMISSION_DENIED]=5
[CONFIG_ERROR]=6
[NETWORK_ERROR]=7
[TIMEOUT]=8
[RESOURCE_EXHAUSTED]=9
[VALIDATION_ERROR]=10
[BUSINESS_LOGIC_ERROR]=11
)
# Logging function
log() {
local level=$1
local message=$2
local timestamp
timestamp=$(date '+%Y-%m-%d %H:%M:%S')
# Check log level
if [ $level -lt $LOG_LEVEL ]; then
return 0
fi
# Determine level name and color
local level_name level_color
case $level in
$LOG_LEVEL_DEBUG)
level_name="DEBUG"
level_color=$COLOR_CYAN
;;
$LOG_LEVEL_INFO)
level_name="INFO"
level_color=$COLOR_GREEN
;;
$LOG_LEVEL_WARN)
level_name="WARN"
level_color=$COLOR_YELLOW
;;
$LOG_LEVEL_ERROR)
level_name="ERROR"
level_color=$COLOR_RED
;;
$LOG_LEVEL_CRITICAL)
level_name="CRITICAL"
level_color=$COLOR_RED
;;
*)
level_name="UNKNOWN"
level_color=$COLOR_RESET
;;
esac
# Format message
local formatted_message="[$timestamp] [$level_name] $message"
# Output to console with colors if enabled
if [[ $ENABLE_COLORS == true ]] && [ -t 1 ]; then
echo -e "${level_color}${formatted_message}${COLOR_RESET}"
else
echo "$formatted_message"
fi
# Output to log file if specified
if [[ -n $LOG_FILE ]]; then
echo "$formatted_message" >> "$LOG_FILE"
fi
}
# Error handler
setup_error_handler() {
local error_handler_function=${1:-} # optional; ":-" keeps set -u from aborting when omitted
trap 'handle_error $LINENO "$BASH_COMMAND" "$?"' ERR
trap 'cleanup_on_exit' EXIT
trap 'handle_signal 1' HUP
trap 'handle_signal 2' INT
trap 'handle_signal 15' TERM
# Store custom error handler
if declare -f "$error_handler_function" > /dev/null; then
CUSTOM_ERROR_HANDLER=$error_handler_function
fi
}
handle_error() {
local line_number=$1
local command=$2
local exit_code=$3
# Get stack trace (caller fails once frame is past the last stack frame)
local frame=0
while caller $frame; do
frame=$((frame + 1)) # ((frame++)) would return status 1 when frame is 0
done > /tmp/stacktrace.$$
# Log error details
log $LOG_LEVEL_ERROR "Script failed with exit code: $exit_code"
log $LOG_LEVEL_ERROR "Line: $line_number, Command: $command"
log $LOG_LEVEL_ERROR "Stack trace:"
while read -r trace_line; do
log $LOG_LEVEL_ERROR " $trace_line"
done < /tmp/stacktrace.$$
# Call custom error handler if defined (":-" keeps set -u from aborting when it is unset)
if [[ -n ${CUSTOM_ERROR_HANDLER:-} ]] && declare -f "$CUSTOM_ERROR_HANDLER" > /dev/null; then
$CUSTOM_ERROR_HANDLER "$exit_code" "$line_number" "$command"
fi
# Cleanup
rm -f /tmp/stacktrace.$$
exit $exit_code
}
handle_signal() {
local signal=$1
log $LOG_LEVEL_WARN "Received signal $signal"
cleanup_on_exit
exit $((128 + signal))
}
cleanup_on_exit() {
# Remove temporary files
rm -f /tmp/$$.* 2>/dev/null || true
# Release locks
if [[ -n ${SCRIPT_LOCK_FILE:-} ]] && [ -f "$SCRIPT_LOCK_FILE" ]; then
rm -f "$SCRIPT_LOCK_FILE"
fi
log $LOG_LEVEL_DEBUG "Cleanup completed"
}
# Validation functions
validate_file() {
local file=$1
local check_readable=${2:-false}
local check_writable=${3:-false}
if [[ ! -e $file ]]; then
log $LOG_LEVEL_ERROR "File does not exist: $file"
return ${EXIT_CODES[FILE_NOT_FOUND]}
fi
if [[ ! -f $file ]]; then
log $LOG_LEVEL_ERROR "Not a regular file: $file"
return ${EXIT_CODES[VALIDATION_ERROR]}
fi
if [[ $check_readable == true ]] && [[ ! -r $file ]]; then
log $LOG_LEVEL_ERROR "File not readable: $file"
return ${EXIT_CODES[PERMISSION_DENIED]}
fi
if [[ $check_writable == true ]] && [[ ! -w $file ]]; then
log $LOG_LEVEL_ERROR "File not writable: $file"
return ${EXIT_CODES[PERMISSION_DENIED]}
fi
return ${EXIT_CODES[SUCCESS]}
}
validate_directory() {
local dir=$1
local check_writable=${2:-false}
if [[ ! -d $dir ]]; then
log $LOG_LEVEL_ERROR "Not a directory: $dir"
return ${EXIT_CODES[VALIDATION_ERROR]}
fi
if [[ $check_writable == true ]] && [[ ! -w $dir ]]; then
log $LOG_LEVEL_ERROR "Directory not writable: $dir"
return ${EXIT_CODES[PERMISSION_DENIED]}
fi
return ${EXIT_CODES[SUCCESS]}
}
validate_command() {
local command=$1
if ! command -v "$command" &> /dev/null; then
log $LOG_LEVEL_ERROR "Command not found: $command"
return ${EXIT_CODES[MISSING_DEPENDENCY]}
fi
return ${EXIT_CODES[SUCCESS]}
}
# Safe execution functions
safe_exec() {
local command=$1
shift
local args=("$@")
log $LOG_LEVEL_DEBUG "Executing: $command ${args[*]}"
if "$command" "${args[@]}"; then
return ${EXIT_CODES[SUCCESS]}
else
local exit_code=$?
log $LOG_LEVEL_ERROR "Command failed: $command (exit: $exit_code)"
return $exit_code
fi
}
with_retry() {
local max_attempts=$1
local delay=$2
local command=$3
shift 3
local args=("$@")
local attempt=1
while [ $attempt -le $max_attempts ]; do
log $LOG_LEVEL_INFO "Attempt $attempt/$max_attempts: $command"
# Capture the status right at the command: after a completed "if ...; fi"
# with no branch taken, $? is 0 and the real failure code is lost
local exit_code=0
"$command" "${args[@]}" || exit_code=$?
if [ $exit_code -eq 0 ]; then
log $LOG_LEVEL_INFO "Command succeeded on attempt $attempt"
return ${EXIT_CODES[SUCCESS]}
fi
log $LOG_LEVEL_WARN "Attempt $attempt failed with code $exit_code"
if [ $attempt -lt $max_attempts ]; then
log $LOG_LEVEL_INFO "Retrying in ${delay}s..."
sleep $delay
fi
attempt=$((attempt + 1))
done
log $LOG_LEVEL_ERROR "Command failed after $max_attempts attempts"
return ${EXIT_CODES[TIMEOUT]}
}
with_timeout() {
local timeout=$1
local command=$2
shift 2
local args=("$@")
log $LOG_LEVEL_DEBUG "Executing with timeout ${timeout}s: $command"
# Start command in background
"$command" "${args[@]}" &
local pid=$!
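# Start timeout killer ("|| true" keeps a kill on an already-finished
# process from firing the inherited ERR trap)
(
sleep $timeout
if kill -0 $pid 2>/dev/null; then
log $LOG_LEVEL_WARN "Command timed out, killing PID: $pid"
kill -TERM $pid 2>/dev/null || true
sleep 1
kill -KILL $pid 2>/dev/null || true
fi
) &
local timeout_pid=$!
# Wait for command to complete; "|| exit_code=$?" records a non-zero status
# without firing the ERR trap
local exit_code=0
wait $pid 2>/dev/null || exit_code=$?
# Kill timeout killer (it may already have finished)
kill $timeout_pid 2>/dev/null || true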
if [ $exit_code -eq 0 ]; then
log $LOG_LEVEL_DEBUG "Command completed within timeout"
else
log $LOG_LEVEL_ERROR "Command failed or timed out (exit: $exit_code)"
fi
return $exit_code
}
# Export functions
export -f log validate_file validate_directory validate_command safe_exec with_retry with_timeout
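For comparison, here is a compact, standalone retry sketch that does not depend on the library's logging and that returns the command's last exit status rather than a fixed code. The retry and flaky functions are hypothetical names; flaky simulates a command that fails twice before succeeding.

```shell
#!/bin/bash
# Hypothetical helper: retry a command, returning its last exit status.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt rc=0
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    rc=0
    "$@" || rc=$?   # capture the status without tripping set -e
    if (( rc == 0 )); then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed with status $rc" >&2
    if (( attempt < max_attempts )); then
      sleep "$delay"
    fi
  done
  return "$rc"
}

# Demo command: fails twice, then succeeds (state kept in a temp file).
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky() {
  local n
  n=$(( $(cat "$counter_file") + 1 ))
  echo "$n" > "$counter_file"
  (( n >= 3 ))
}

retry 5 0 flaky && echo "succeeded after $(cat "$counter_file") attempts"
rm -f "$counter_file"
```

Returning the last status lets the caller distinguish, say, a permission error from a network error after the retries are exhausted. Whether that or a dedicated "gave up" code is better depends on how the caller reacts.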
Best Practices Checklist
Always start scripts with set -euo pipefail
Use meaningful custom exit codes (1-125)
Implement cleanup with trap cleanup EXIT
Handle Ctrl+C with trap handler INT
Validate all inputs before processing
Log errors with timestamps and context
Read $? immediately after the command, before another command overwrites it
Test error paths as thoroughly as success paths
Use shellcheck to catch common issues
Implement retry logic for transient failures
Add timeout for long-running operations
Create lock files for mutual exclusion
Clean up temporary files on exit
Document error codes and their meanings
Test scripts with invalid inputs
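For the timeout item in this checklist, systems with GNU coreutils offer a ready-made timeout command as an alternative to hand-written background watchers. The sketch below assumes that command is installed; run_with_limit is a hypothetical wrapper name.

```shell
#!/bin/bash
# Hypothetical wrapper around GNU coreutils `timeout` (assumed to be installed).
run_with_limit() {
  local seconds=$1 rc=0
  shift
  timeout "$seconds" "$@" || rc=$?
  if (( rc == 124 )); then   # 124 is timeout's "time limit reached" status
    echo "timed out after ${seconds}s: $*" >&2
  fi
  return "$rc"
}

run_with_limit 5 true && echo "finished in time"
run_with_limit 1 sleep 3 || echo "exit status: $?"
```

Because the wrapper passes the status through unchanged, callers can still tell a timeout (124) apart from an ordinary failure of the command itself.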
Critical Errors to Always Handle:
1. Missing dependencies (check with command -v)
2. Permission denied (validate before operations)
3. Disk full (check available space)
4. Network failures (timeout and retry)
5. Invalid user input (validate and sanitize)
6. Concurrent execution (use lock files)
7. Resource exhaustion (monitor memory/CPU)
8. Signal interrupts (handle Ctrl+C gracefully)
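As an illustration of the disk-space item in this list, a script can refuse to start when the target filesystem is nearly full. This is a minimal sketch: require_free_kb is a hypothetical helper, and it assumes the portable df -P output layout in which the fourth column is the available space.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if DIR has at least MIN_KB kilobytes free.
# df -P prints the portable POSIX layout and -k reports 1024-byte blocks;
# column 4 of the second line is the available space.
require_free_kb() {
  local dir=$1 min_kb=$2 available_kb
  available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  if ! [[ $available_kb =~ ^[0-9]+$ ]]; then
    echo "could not determine free space for $dir" >&2
    return 2
  fi
  if (( available_kb < min_kb )); then
    echo "only ${available_kb} KB free in $dir, need ${min_kb} KB" >&2
    return 1
  fi
}

require_free_kb /tmp 1 && echo "enough space to continue"
```

Running such a check before a backup or a large download turns a confusing mid-run "No space left on device" into a clear, early failure.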
Debugging Workflow:
1. Reproduce error: Run script with same inputs
2. Enable tracing: bash -x script.sh
3. Check exit codes: echo $? after each command
4. Validate inputs: Add set -u to catch unset variables
5. Isolate problem: Comment out sections until error disappears
6. Add logging: Insert debug messages before suspicious code
7. Test fixes: Make minimal changes and retest
8. Prevent regression: Add test cases for the error
Robust Scripts for Production
Error handling is what separates amateur scripts from production-ready tools. A script that works perfectly when everything goes right is useless if it crashes and leaves a mess when something goes wrong.
Remember: Good error handling doesn't prevent errors - it anticipates them. It ensures your scripts fail gracefully, clean up after themselves, and provide enough information to fix the problem.
Practice Exercise: Take a simple script you've written and add comprehensive error handling. Add input validation, cleanup traps, meaningful exit codes, and logging. Then test it by deliberately causing errors (remove files, revoke permissions, etc.).