Discover how to save hours of manual work by automating routine system tasks. This comprehensive guide provides ready-to-use Bash scripts with detailed explanations, practical examples, and best practices for implementing automation in your DevOps workflow.
Why Automation Matters for DevOps
Automation is the cornerstone of modern DevOps. It transforms repetitive, error-prone manual tasks into reliable, consistent processes. Benefits include:
- Time Savings: Reclaim hours spent on routine maintenance
- Error Reduction: Eliminate human mistakes in repetitive tasks
- Consistency: Ensure identical execution every time
- Documentation: Scripts serve as living documentation
- Scalability: Manage more systems with the same effort
1. Backup Automation
Regular backups are critical for disaster recovery. Automated backups ensure consistency and reliability. A good backup strategy includes:
- Regular scheduling (daily, weekly, monthly)
- Retention policies (keep backups for specific periods)
- Verification (ensure backups are usable)
- Logging (track backup success/failure)
- Off-site storage (protect against local failures)
File System Backup Script
This script backs up important directories, compresses them, and manages retention:
#!/bin/bash
# backup-files.sh - Comprehensive file backup automation
# ================= CONFIGURATION =================
BACKUP_DIR="/backups"
SOURCE_DIRS=("/etc" "/home" "/var/www") # Directories to back up
BACKUP_NAME="backup-$(date +%Y%m%d_%H%M%S).tar.gz"
LOG_FILE="/var/log/backup.log"
RETENTION_DAYS=7 # Keep backups for 7 days
# ================= VALIDATION =================
# Check if backup directory exists and is writable
if [ ! -d "$BACKUP_DIR" ]; then
    mkdir -p "$BACKUP_DIR" || {
        echo "ERROR: Cannot create backup directory $BACKUP_DIR"
        exit 1
    }
fi
if [ ! -w "$BACKUP_DIR" ]; then
    echo "ERROR: Backup directory $BACKUP_DIR is not writable"
    exit 1
fi
# ================= BACKUP CREATION =================
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting backup..." | tee -a "$LOG_FILE"
# Create compressed backup archive
if tar -czf "$BACKUP_DIR/$BACKUP_NAME" "${SOURCE_DIRS[@]}" 2>/dev/null; then
    # Calculate and log backup size
    BACKUP_SIZE=$(du -h "$BACKUP_DIR/$BACKUP_NAME" | cut -f1)
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Backup created: $BACKUP_NAME ($BACKUP_SIZE)" | tee -a "$LOG_FILE"
else
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: Backup creation failed" | tee -a "$LOG_FILE"
    exit 1
fi
# ================= RETENTION MANAGEMENT =================
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Cleaning up old backups..." | tee -a "$LOG_FILE"
# Find and delete backups older than RETENTION_DAYS
OLD_BACKUPS=$(find "$BACKUP_DIR" -name "backup-*.tar.gz" -mtime +$RETENTION_DAYS)
if [ -n "$OLD_BACKUPS" ]; then
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Removing old backups:" | tee -a "$LOG_FILE"
    echo "$OLD_BACKUPS" | tee -a "$LOG_FILE"
    find "$BACKUP_DIR" -name "backup-*.tar.gz" -mtime +"$RETENTION_DAYS" -delete
fi
# ================= VERIFICATION =================
# Verify backup integrity
if tar -tzf "$BACKUP_DIR/$BACKUP_NAME" > /dev/null 2>&1; then
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Backup verification successful" | tee -a "$LOG_FILE"
else
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: Backup verification failed" | tee -a "$LOG_FILE"
    exit 1
fi
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Backup process completed successfully" | tee -a "$LOG_FILE"
The script works in six stages:
1. Configuration: Defines what to back up and where
2. Validation: Checks directory existence and permissions
3. Backup Creation: Uses tar to create a compressed archive
4. Retention Management: Automatically removes old backups
5. Verification: Ensures the backup is readable
6. Logging: Tracks all activities with timestamps
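Before pointing the script at real data, it helps to exercise the same create-then-verify cycle against throwaway paths. A minimal sketch; all paths here are temporary, not the script's real configuration:

```shell
#!/bin/bash
# Miniature dry run of the backup pipeline: create, verify, and clean up an
# archive entirely inside temp directories, leaving /etc and /backups untouched.
SRC=$(mktemp -d)
DEST=$(mktemp -d)
echo "example config" > "$SRC/app.conf"

BACKUP_NAME="backup-$(date +%Y%m%d_%H%M%S).tar.gz"
tar -czf "$DEST/$BACKUP_NAME" -C "$SRC" .

# Same verification step the real script uses
if tar -tzf "$DEST/$BACKUP_NAME" > /dev/null 2>&1; then
    RESULT="verified"
else
    RESULT="corrupt"
fi
echo "$RESULT"
rm -rf "$SRC" "$DEST"
```

If this round trip works on your machine, the real script's failure modes narrow down to permissions, disk space, and paths.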
MySQL Database Backup Script
Database backups require special handling. This script backs up all MySQL databases individually:
#!/bin/bash
# backup-mysql.sh - Automated MySQL database backups
# ================= CONFIGURATION =================
# Credentials are read from ~/.my.cnf (see the check below); never hard-code
# passwords in the script itself.
set -o pipefail # so a failed mysqldump is not masked by gzip's exit status
BACKUP_DIR="/backups/mysql"
DATE_SUFFIX=$(date +%Y%m%d_%H%M%S)
RETENTION_COUNT=10 # Keep last 10 backups per database
# ================= SECURITY CHECK =================
# Use credentials file instead of command line for security
if [ ! -f ~/.my.cnf ]; then
    echo "ERROR: MySQL credentials file ~/.my.cnf not found"
    echo "Create it with:"
    echo "[client]"
    echo "user=your_user"
    echo "password=your_password"
    exit 1
fi
# ================= PREPARATION =================
mkdir -p "$BACKUP_DIR"
# Get list of all databases (excluding system databases)
DATABASES=$(mysql -e "SHOW DATABASES;" | grep -Ev "(Database|information_schema|performance_schema|mysql|sys)")
if [ -z "$DATABASES" ]; then
    echo "ERROR: No databases found or cannot connect to MySQL"
    exit 1
fi
# ================= BACKUP EACH DATABASE =================
TOTAL_DBS=$(echo "$DATABASES" | wc -l)
CURRENT=1
echo "Backing up $TOTAL_DBS databases..."
for DB in $DATABASES; do
    echo "[$CURRENT/$TOTAL_DBS] Backing up database: $DB"
    BACKUP_FILE="$BACKUP_DIR/${DB}_${DATE_SUFFIX}.sql.gz"
    # Backup with compression
    if mysqldump --single-transaction --quick --lock-tables=false "$DB" | gzip > "$BACKUP_FILE"; then
        BACKUP_SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
        echo "  ✅ Success: $BACKUP_FILE ($BACKUP_SIZE)"
    else
        echo "  ❌ Failed to back up: $DB"
        # Continue with other databases
    fi
    ((CURRENT++))
done
# ================= RETENTION MANAGEMENT =================
echo "Applying retention policy (keeping last $RETENTION_COUNT backups per database)..."
for DB in $DATABASES; do
    # List backups for this database, sorted by time (newest first)
    BACKUP_LIST=$(ls -t "$BACKUP_DIR/${DB}_"*.sql.gz 2>/dev/null)
    if [ -n "$BACKUP_LIST" ]; then
        COUNT=$(echo "$BACKUP_LIST" | wc -l)
        if [ "$COUNT" -gt "$RETENTION_COUNT" ]; then
            TO_DELETE=$(echo "$BACKUP_LIST" | tail -n +$((RETENTION_COUNT + 1)))
            echo "  Removing $(echo "$TO_DELETE" | wc -l) old backups for $DB"
            echo "$TO_DELETE" | xargs rm -f
        fi
    fi
done
echo "MySQL backup process completed at $(date)"
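Backups are only useful if they restore. Restoring one of these dumps requires a live server (`gunzip < dump.sql.gz | mysql dbname`), but the archive itself can be verified anywhere with `gzip -t`. A minimal sketch, using a synthetic dump since no MySQL server is assumed here:

```shell
#!/bin/bash
# Verify a compressed dump's integrity before trusting it for restore.
# The dump below is synthetic; with a real server you would restore via:
#   gunzip < "$DUMP" | mysql target_database
DUMP=$(mktemp --suffix=.sql.gz)
echo "CREATE TABLE demo (id INT);" | gzip > "$DUMP"

if gzip -t "$DUMP" 2>/dev/null; then
    CHECK="intact"
else
    CHECK="damaged"
fi
echo "archive is $CHECK"
rm -f "$DUMP"
```

Running `gzip -t` in the backup script itself, right after each dump, catches truncated archives on the night they are written rather than on the day you need them.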
2. System Monitoring Automation
Proactive monitoring prevents system issues before they cause outages. Automated monitoring scripts should:
- Check critical resources (CPU, memory, disk, network)
- Monitor service availability
- Detect security issues
- Send alerts when thresholds are exceeded
- Generate reports for trend analysis
Comprehensive System Health Check
#!/bin/bash
# system-health-check.sh - Comprehensive system monitoring
# ================= CONFIGURATION =================
THRESHOLD_DISK=85 # Disk usage percentage threshold
THRESHOLD_MEMORY=90 # Memory usage percentage threshold
ALERT_EMAIL="admin@example.com"
LOG_FILE="/var/log/health-check.log"
REPORT_FILE="/tmp/system-health-$(date +%Y%m%d).txt"
# ================= INITIALIZATION =================
echo "=== System Health Check - $(date) ===" > "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
ALERTS=()
WARNINGS=()
# ================= DISK SPACE CHECK =================
echo "1. Disk Space Analysis:" >> "$REPORT_FILE"
echo "------------------------" >> "$REPORT_FILE"
# Read from process substitution (not a pipe) so that ALERTS/WARNINGS
# appended inside the loop survive in the parent shell
while read -r line; do
    FILESYSTEM=$(echo "$line" | awk '{print $1}')
    AVAIL=$(echo "$line" | awk '{print $4}')
    USE_PERCENT=$(echo "$line" | awk '{print $5}' | sed 's/%//')
    MOUNT=$(echo "$line" | awk '{print $6}')
    echo "  $MOUNT ($FILESYSTEM): $USE_PERCENT% used, $AVAIL available" >> "$REPORT_FILE"
    # Check threshold
    if [ "$USE_PERCENT" -ge "$THRESHOLD_DISK" ]; then
        ALERT="CRITICAL: Disk usage on $MOUNT is ${USE_PERCENT}%"
        ALERTS+=("$ALERT")
        echo "  ⚠️ $ALERT" >> "$REPORT_FILE"
    elif [ "$USE_PERCENT" -ge 70 ]; then
        WARNING="WARNING: Disk usage on $MOUNT is ${USE_PERCENT}%"
        WARNINGS+=("$WARNING")
        echo "  ℹ️ $WARNING" >> "$REPORT_FILE"
    fi
done < <(df -h | grep '^/dev/')
# ================= MEMORY USAGE CHECK =================
echo "" >> "$REPORT_FILE"
echo "2. Memory Usage:" >> "$REPORT_FILE"
echo "----------------" >> "$REPORT_FILE"
MEM_TOTAL=$(free -m | awk '/^Mem:/ {print $2}')
MEM_USED=$(free -m | awk '/^Mem:/ {print $3}')
MEM_FREE=$(free -m | awk '/^Mem:/ {print $4}')
MEM_PERCENT=$((MEM_USED * 100 / MEM_TOTAL))
echo " Total: ${MEM_TOTAL}MB, Used: ${MEM_USED}MB, Free: ${MEM_FREE}MB" >> "$REPORT_FILE"
echo " Usage: ${MEM_PERCENT}%" >> "$REPORT_FILE"
if [ "$MEM_PERCENT" -ge "$THRESHOLD_MEMORY" ]; then
    ALERT="CRITICAL: Memory usage is ${MEM_PERCENT}%"
    ALERTS+=("$ALERT")
    echo "  ⚠️ $ALERT" >> "$REPORT_FILE"
fi
# ================= CPU LOAD CHECK =================
echo "" >> "$REPORT_FILE"
echo "3. CPU Load Average:" >> "$REPORT_FILE"
echo "--------------------" >> "$REPORT_FILE"
LOAD1=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f1 | xargs)
LOAD5=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f2 | xargs)
LOAD15=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f3 | xargs)
echo " 1-minute: $LOAD1, 5-minute: $LOAD5, 15-minute: $LOAD15" >> "$REPORT_FILE"
# Get CPU core count to calculate relative load
CPU_CORES=$(nproc)
LOAD_THRESHOLD=$(echo "$CPU_CORES * 1.5" | bc)
if (( $(echo "$LOAD1 > $LOAD_THRESHOLD" | bc -l) )); then
    ALERT="CRITICAL: High CPU load: $LOAD1 (cores: $CPU_CORES)"
    ALERTS+=("$ALERT")
    echo "  ⚠️ $ALERT" >> "$REPORT_FILE"
fi
# ================= SERVICE STATUS CHECK =================
echo "" >> "$REPORT_FILE"
echo "4. Service Status:" >> "$REPORT_FILE"
echo "------------------" >> "$REPORT_FILE"
SERVICES=("ssh" "nginx" "mysql" "docker" "cron")
for SERVICE in "${SERVICES[@]}"; do
    if systemctl is-active --quiet "$SERVICE"; then
        echo "  ✅ $SERVICE: RUNNING" >> "$REPORT_FILE"
    else
        ALERT="CRITICAL: Service $SERVICE is not running"
        ALERTS+=("$ALERT")
        echo "  ❌ $SERVICE: STOPPED" >> "$REPORT_FILE"
        # Attempt to restart if it's a critical service
        if [[ "$SERVICE" == "ssh" || "$SERVICE" == "nginx" ]]; then
            echo "  Attempting to restart $SERVICE..." >> "$REPORT_FILE"
            if systemctl restart "$SERVICE"; then
                echo "  ✅ Restarted successfully" >> "$REPORT_FILE"
            else
                echo "  ❌ Restart failed" >> "$REPORT_FILE"
            fi
        fi
    fi
done
# ================= SECURITY CHECKS =================
echo "" >> "$REPORT_FILE"
echo "5. Security Checks:" >> "$REPORT_FILE"
echo "-------------------" >> "$REPORT_FILE"
# Check for failed SSH logins during the current hour
FAILED_SSH=$(grep "Failed password" /var/log/auth.log 2>/dev/null | grep -c "$(date '+%b %_d %H:')")
if [ "$FAILED_SSH" -gt 10 ]; then
    WARNING="Multiple failed SSH attempts: $FAILED_SSH this hour"
    WARNINGS+=("$WARNING")
    echo "  ⚠️ $WARNING" >> "$REPORT_FILE"
fi
# Check for failed root login attempts
ROOT_LOGINS=$(grep "root" /var/log/auth.log 2>/dev/null | grep "Failed password" | tail -5)
if [ -n "$ROOT_LOGINS" ]; then
    echo "  ⚠️ Recent failed root login attempts detected" >> "$REPORT_FILE"
fi
# ================= SUMMARY AND ALERTS =================
echo "" >> "$REPORT_FILE"
echo "=== SUMMARY ===" >> "$REPORT_FILE"
echo "Check completed at: $(date)" >> "$REPORT_FILE"
if [ ${#ALERTS[@]} -eq 0 ]; then
    echo "✅ No critical issues detected" >> "$REPORT_FILE"
else
    echo "⚠️ CRITICAL ISSUES DETECTED: ${#ALERTS[@]}" >> "$REPORT_FILE"
    for alert in "${ALERTS[@]}"; do
        echo "  • $alert" >> "$REPORT_FILE"
    done
    # Send email alert if there are critical issues
    if [ -n "$ALERT_EMAIL" ]; then
        mail -s "SYSTEM ALERT: Critical issues detected on $(hostname)" \
            "$ALERT_EMAIL" < "$REPORT_FILE"
        echo "Alert email sent to $ALERT_EMAIL" >> "$REPORT_FILE"
    fi
fi
if [ ${#WARNINGS[@]} -gt 0 ]; then
    echo "" >> "$REPORT_FILE"
    echo "ℹ️ WARNINGS: ${#WARNINGS[@]}" >> "$REPORT_FILE"
    for warning in "${WARNINGS[@]}"; do
        echo "  • $warning" >> "$REPORT_FILE"
    done
fi
# ================= LOGGING =================
cat "$REPORT_FILE" >> "$LOG_FILE"
echo "Health check completed. Report: $REPORT_FILE"
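One Bash pitfall to watch in loops like the disk-space check above: piping `df` into `while read` runs the loop body in a subshell, so variables and arrays modified inside it vanish when the loop ends. Feeding the loop with process substitution keeps it in the current shell. A minimal demonstration:

```shell
#!/bin/bash
# Counting lines two ways: via a pipe (subshell, updates lost) and via
# process substitution (current shell, updates kept).
PIPED=0
printf 'a\nb\nc\n' | while read -r _; do PIPED=$((PIPED + 1)); done

KEPT=0
while read -r _; do KEPT=$((KEPT + 1)); done < <(printf 'a\nb\nc\n')

echo "pipe saw $PIPED, process substitution saw $KEPT"
```

The first counter stays at 0 because its increments happened in a child process; the second reflects all three lines. Any monitoring script that accumulates alerts inside a read loop needs the second form.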
3. Cron Job Scheduling
Cron is Linux's built-in job scheduler that runs commands at specified times. Understanding cron syntax is essential for automation:
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday=0 or 7)
│ │ │ │ │
* * * * * command_to_execute
Common scheduling patterns:
- Nightly (e.g. 0 2 * * *): perfect for backups when system usage is low
- Every few minutes (e.g. */5 * * * *): ideal for service monitoring and alerting
- Weekly (e.g. 0 4 * * 0): maintenance tasks and reports
- Monthly (e.g. 0 2 1 * *): archiving and cleanup tasks
Complete Crontab Example
# ================= SYSTEM MAINTENANCE =================
# Daily backup at 2 AM
0 2 * * * /opt/scripts/backup-files.sh >> /var/log/backup.log 2>&1
# MySQL backup at 3 AM
0 3 * * * /opt/scripts/backup-mysql.sh >> /var/log/mysql-backup.log 2>&1
# System update every Sunday at 4 AM
0 4 * * 0 /opt/scripts/update-system.sh >> /var/log/update.log 2>&1
# ================= MONITORING =================
# Health check every 30 minutes
*/30 * * * * /opt/scripts/system-health-check.sh >> /var/log/health-check.log 2>&1
# Disk space check every hour
0 * * * * /opt/scripts/check-disk-space.sh >> /var/log/disk-check.log 2>&1
# Service monitoring every 5 minutes
*/5 * * * * /opt/scripts/monitor-services.sh >> /var/log/service-monitor.log 2>&1
# ================= CLEANUP TASKS =================
# Log rotation every day at 1 AM
0 1 * * * /opt/scripts/rotate-logs.sh >> /var/log/log-rotation.log 2>&1
# Temporary file cleanup every Monday at 5 AM
0 5 * * 1 /opt/scripts/cleanup-temp.sh >> /var/log/cleanup.log 2>&1
# Docker cleanup every Sunday at 6 AM
0 6 * * 0 /opt/scripts/cleanup-docker.sh >> /var/log/docker-cleanup.log 2>&1
# ================= SECURITY =================
# Security scan every day at 11 PM
0 23 * * * /opt/scripts/security-scan.sh >> /var/log/security.log 2>&1
# SSH key rotation on 1st of every month at 2 AM
0 2 1 * * /opt/scripts/rotate-ssh-keys.sh >> /var/log/ssh-rotation.log 2>&1
Cron best practices:
1. Use absolute paths: Cron jobs don't inherit your shell's PATH
2. Redirect output: Use >> logfile 2>&1 to capture logs
3. Test manually first: Run scripts manually before scheduling
4. Use locking: Prevent overlapping executions with lock files
5. Monitor cron logs: Check /var/log/cron.log or /var/log/syslog
6. Document schedules: Keep a record of what runs when and why
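Tip 4 (locking) deserves a concrete pattern. Holding an exclusive `flock` on a dedicated file descriptor stops a slow run from overlapping the next scheduled one. A sketch; the lock path is illustrative, and each script should use its own fixed path:

```shell
#!/bin/bash
# Take an exclusive, non-blocking lock before doing any work; a second
# instance started while this one runs exits immediately.
LOCK_FILE=$(mktemp)   # demo only; in production use a fixed path like /var/lock/backup-files.lock
exec 9>"$LOCK_FILE"
if flock -n 9; then
    STATE="acquired"
    # ... the real job would run here ...
else
    STATE="busy"
    exit 1
fi
echo "lock $STATE"
rm -f "$LOCK_FILE"
```

Keeping the lock on file descriptor 9 means it is released automatically when the script exits, even on a crash, so there is no stale lock file to clean up.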
4. Practical Automation Scripts
Log Rotation and Cleanup
#!/bin/bash
# rotate-logs.sh - Automated log rotation and cleanup
# ================= CONFIGURATION =================
LOG_DIRS=("/var/log" "/var/log/apache2" "/var/log/nginx" "/var/log/mysql")
RETENTION_DAYS=30
COMPRESS_AFTER_DAYS=7
MAX_LOG_SIZE="10M"
BACKUP_DIR="/backups/logs"
# ================= FUNCTIONS =================
rotate_log() {
    local log_file=$1
    # Skip if file doesn't exist or is empty
    [ ! -f "$log_file" ] && return
    [ ! -s "$log_file" ] && return
    # Create backup directory mirroring the log's path
    local backup_path="$BACKUP_DIR/$(dirname "$log_file" | sed 's|^/||')"
    mkdir -p "$backup_path"
    # Copy the log aside, then truncate the original in place
    local timestamp
    timestamp=$(date +%Y%m%d_%H%M%S)
    local backup_file="$backup_path/$(basename "$log_file")_$timestamp"
    cp "$log_file" "$backup_file"
    > "$log_file" # Truncate original log
    echo "Rotated: $log_file -> $backup_file"
}
compress_old_logs() {
    local dir=$1
    find "$dir" -name "*.log" -type f -mtime +"$COMPRESS_AFTER_DAYS" ! -name "*.gz" | \
    while read -r log_file; do
        if gzip "$log_file"; then
            echo "Compressed: $log_file.gz"
        fi
    done
}
cleanup_old_backups() {
    find "$BACKUP_DIR" -name "*.log_*" -type f -mtime +"$RETENTION_DAYS" -delete
    echo "Cleaned up backups older than $RETENTION_DAYS days"
}
# ================= MAIN EXECUTION =================
echo "Starting log rotation at $(date)"
# Rotate large log files
for dir in "${LOG_DIRS[@]}"; do
    if [ -d "$dir" ]; then
        echo "Processing directory: $dir"
        # Find log files exceeding MAX_LOG_SIZE
        find "$dir" -name "*.log" -type f -size +"$MAX_LOG_SIZE" | \
        while read -r log_file; do
            rotate_log "$log_file"
        done
        # Compress old logs
        compress_old_logs "$dir"
    fi
done
# Cleanup old backups
cleanup_old_backups
echo "Log rotation completed at $(date)"
5. Best Practices for Production Automation
Production scripts earn their keep through defensive habits. A few patterns worth adopting in every script:
1. Check exit codes: if ! command; then echo "Failed"; exit 1; fi
2. Log with timestamps: echo "[$(date)] Starting backup" >> "$LOG_FILE"
3. Centralize configuration: put values like BACKUP_DIR="/backups" and RETENTION_DAYS=7 at the top, not scattered through the script
4. Create directories safely: [ -d "$DIR" ] || mkdir -p "$DIR"
5. Comment intent, not mechanics: # Backup MySQL databases with compression
6. Support a dry run: ./script.sh --dry-run
Getting Started: Your First Automation
Follow these steps to implement your first automation:
- Identify a repetitive task you do daily or weekly
- Write a simple script to automate just that task
- Test it manually with different scenarios
- Add error handling and logging
- Schedule it with cron to run automatically
- Monitor the results for a week
- Improve based on feedback and add more features
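As a concrete starting point for the steps above, here is a hypothetical "first automation": a few lines that report root filesystem usage and flag it past a threshold. The 80% figure and the message format are illustrative choices, not requirements:

```shell
#!/bin/bash
# Minimal first automation: report root filesystem usage and flag it
# when it crosses THRESHOLD percent.
THRESHOLD=80
# df -P gives stable POSIX output; column 5 is the usage percentage
USAGE=$(df -P / | awk 'NR==2 {gsub(/%/, "", $5); print $5}')
if [ "$USAGE" -ge "$THRESHOLD" ]; then
    LEVEL="WARNING"
else
    LEVEL="OK"
fi
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $LEVEL: / is at ${USAGE}%"
```

Once it works by hand, schedule it hourly (`0 * * * * /opt/scripts/check-root-disk.sh >> /var/log/disk-check.log 2>&1`) and grow it from there: more mounts, email alerts, trend logging.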
A few more safety rules before you deploy:
1. Always back up before cleanup: Never delete without a backup
2. Test with dry-run first: Add a --dry-run option to preview changes
3. Set appropriate permissions: Don't run everything as root
4. Monitor disk space: Ensure you have space before large operations
5. Use locking mechanisms: Prevent script overlap with flock
6. Keep scripts in version control: Track changes with git
7. Document dependencies: List required packages and permissions
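The --dry-run option from rule 2 is easy to retrofit: route every destructive command through a small wrapper that either prints or executes it. A sketch; DRY_RUN is hard-coded here for demonstration, where a real script would set it from "$1":

```shell
#!/bin/bash
# Dry-run wrapper: when DRY_RUN is true, print the command instead of running it.
DRY_RUN=true   # demo value; a real script sets this from a --dry-run flag
run() {
    if [ "$DRY_RUN" = true ]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}
TARGET=$(mktemp)
run rm -f "$TARGET"          # printed, not executed, so the file survives
if [ -f "$TARGET" ]; then
    STATUS="file survived"
fi
echo "$STATUS"
rm -f "$TARGET"
```

Because every destructive step goes through run(), flipping one variable turns the whole script into a harmless preview of what it would do.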
Start Your Automation Journey Today
Automation is not an all-or-nothing proposition. Start with one script that solves one problem. As you gain confidence, expand to more complex automations. The time you invest in automation pays for itself many times over.
Remember: The best automation is the one that gets used. Focus on solving real pain points in your daily workflow.
Next Steps: Pick one script from this guide, customize it for your environment, test it thoroughly, and schedule it. Monitor the results for a week, then iterate and improve.