Discover how to save hours of manual work by automating routine system tasks. This comprehensive guide provides ready-to-use Bash scripts with detailed explanations, practical examples, and best practices for implementing automation in your DevOps workflow.
Why Automation Matters for DevOps
Automation is the cornerstone of modern DevOps. It transforms repetitive, error-prone manual tasks into reliable, consistent processes. Benefits include:
- Time Savings: Reclaim hours spent on routine maintenance
- Error Reduction: Eliminate human mistakes in repetitive tasks
- Consistency: Ensure identical execution every time
- Documentation: Scripts serve as living documentation
- Scalability: Manage more systems with the same effort
1. Backup Automation
Regular backups are critical for disaster recovery. Automated backups ensure consistency and reliability. A good backup strategy includes:
- Regular scheduling (daily, weekly, monthly)
- Retention policies (keep backups for specific periods)
- Verification (ensure backups are usable)
- Logging (track backup success/failure)
- Off-site storage (protect against local failures)
File System Backup Script
This script backs up important directories, compresses them, and manages retention:
#!/bin/bash
# backup-files.sh - Comprehensive file backup automation
# ================= CONFIGURATION =================
BACKUP_DIR="/backups"
SOURCE_DIRS=("/etc" "/home" "/var/www") # Directories to back up
BACKUP_NAME="backup-$(date +%Y%m%d_%H%M%S).tar.gz"
LOG_FILE="/var/log/backup.log"
RETENTION_DAYS=7 # Keep backups for 7 days
# ================= VALIDATION =================
# Check if backup directory exists and is writable
if [ ! -d "$BACKUP_DIR" ]; then
    mkdir -p "$BACKUP_DIR" || {
        echo "ERROR: Cannot create backup directory $BACKUP_DIR"
        exit 1
    }
fi
if [ ! -w "$BACKUP_DIR" ]; then
    echo "ERROR: Backup directory $BACKUP_DIR is not writable"
    exit 1
fi
# ================= BACKUP CREATION =================
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting backup..." | tee -a "$LOG_FILE"
# Create compressed backup archive
if tar -czf "$BACKUP_DIR/$BACKUP_NAME" "${SOURCE_DIRS[@]}" 2>/dev/null; then
    # Calculate and log backup size
    BACKUP_SIZE=$(du -h "$BACKUP_DIR/$BACKUP_NAME" | cut -f1)
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Backup created: $BACKUP_NAME ($BACKUP_SIZE)" | tee -a "$LOG_FILE"
else
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: Backup creation failed" | tee -a "$LOG_FILE"
    exit 1
fi
# ================= RETENTION MANAGEMENT =================
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Cleaning up old backups..." | tee -a "$LOG_FILE"
# Find and delete backups older than RETENTION_DAYS
OLD_BACKUPS=$(find "$BACKUP_DIR" -name "backup-*.tar.gz" -mtime +$RETENTION_DAYS)
if [ -n "$OLD_BACKUPS" ]; then
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Removing old backups:" | tee -a "$LOG_FILE"
    echo "$OLD_BACKUPS" | tee -a "$LOG_FILE"
    find "$BACKUP_DIR" -name "backup-*.tar.gz" -mtime +"$RETENTION_DAYS" -delete
fi
# ================= VERIFICATION =================
# Verify backup integrity
if tar -tzf "$BACKUP_DIR/$BACKUP_NAME" > /dev/null 2>&1; then
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Backup verification successful" | tee -a "$LOG_FILE"
else
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: Backup verification failed" | tee -a "$LOG_FILE"
    exit 1
fi
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Backup process completed successfully" | tee -a "$LOG_FILE"
The script works in six stages:
1. Configuration: Defines what to back up and where
2. Validation: Checks directory existence and permissions
3. Backup Creation: Uses tar to create a compressed archive
4. Retention Management: Automatically removes old backups
5. Verification: Ensures the backup is readable
6. Logging: Tracks all activities with timestamps
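Before pointing the script at real data, it helps to exercise the same create-then-verify cycle against throwaway paths. A minimal sketch; all paths here are temporary, not the script's real configuration:

```shell
#!/bin/bash
# Miniature dry run of the backup pipeline: create, verify, and clean up an
# archive entirely inside temp directories, leaving /etc and /backups untouched.
SRC=$(mktemp -d)
DEST=$(mktemp -d)
echo "example config" > "$SRC/app.conf"

BACKUP_NAME="backup-$(date +%Y%m%d_%H%M%S).tar.gz"
tar -czf "$DEST/$BACKUP_NAME" -C "$SRC" .

# Same verification step the real script uses
if tar -tzf "$DEST/$BACKUP_NAME" > /dev/null 2>&1; then
    RESULT="verified"
else
    RESULT="corrupt"
fi
echo "$RESULT"
rm -rf "$SRC" "$DEST"
```

If this round trip works on your machine, the real script's failure modes narrow down to permissions, disk space, and paths.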
MySQL Database Backup Script
Database backups require special handling. This script backs up all MySQL databases individually:
#!/bin/bash
# backup-mysql.sh - Automated MySQL database backups
# ================= CONFIGURATION =================
# Credentials are read from ~/.my.cnf (see the check below); never hard-code
# passwords in the script itself.
set -o pipefail # so a failed mysqldump is not masked by gzip's exit status
BACKUP_DIR="/backups/mysql"
DATE_SUFFIX=$(date +%Y%m%d_%H%M%S)
RETENTION_COUNT=10 # Keep last 10 backups per database
# ================= SECURITY CHECK =================
# Use credentials file instead of command line for security
if [ ! -f ~/.my.cnf ]; then
    echo "ERROR: MySQL credentials file ~/.my.cnf not found"
    echo "Create it with:"
    echo "[client]"
    echo "user=your_user"
    echo "password=your_password"
    exit 1
fi
# ================= PREPARATION =================
mkdir -p "$BACKUP_DIR"
# Get list of all databases (excluding system databases)
DATABASES=$(mysql -e "SHOW DATABASES;" | grep -Ev "(Database|information_schema|performance_schema|mysql|sys)")
if [ -z "$DATABASES" ]; then
    echo "ERROR: No databases found or cannot connect to MySQL"
    exit 1
fi
# ================= BACKUP EACH DATABASE =================
TOTAL_DBS=$(echo "$DATABASES" | wc -l)
CURRENT=1
echo "Backing up $TOTAL_DBS databases..."
for DB in $DATABASES; do
    echo "[$CURRENT/$TOTAL_DBS] Backing up database: $DB"
    BACKUP_FILE="$BACKUP_DIR/${DB}_${DATE_SUFFIX}.sql.gz"
    # Backup with compression
    if mysqldump --single-transaction --quick --lock-tables=false "$DB" | gzip > "$BACKUP_FILE"; then
        BACKUP_SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
        echo "  ✅ Success: $BACKUP_FILE ($BACKUP_SIZE)"
    else
        echo "  ❌ Failed to back up: $DB"
        # Continue with other databases
    fi
    ((CURRENT++))
done
# ================= RETENTION MANAGEMENT =================
echo "Applying retention policy (keeping last $RETENTION_COUNT backups per database)..."
for DB in $DATABASES; do
    # List backups for this database, sorted by time (newest first)
    BACKUP_LIST=$(ls -t "$BACKUP_DIR/${DB}_"*.sql.gz 2>/dev/null)
    if [ -n "$BACKUP_LIST" ]; then
        COUNT=$(echo "$BACKUP_LIST" | wc -l)
        if [ "$COUNT" -gt "$RETENTION_COUNT" ]; then
            TO_DELETE=$(echo "$BACKUP_LIST" | tail -n +$((RETENTION_COUNT + 1)))
            echo "  Removing $(echo "$TO_DELETE" | wc -l) old backups for $DB"
            echo "$TO_DELETE" | xargs rm -f
        fi
    fi
done
echo "MySQL backup process completed at $(date)"
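Backups are only useful if they restore. Restoring one of these dumps requires a live server (`gunzip < dump.sql.gz | mysql dbname`), but the archive itself can be verified anywhere with `gzip -t`. A minimal sketch, using a synthetic dump since no MySQL server is assumed here:

```shell
#!/bin/bash
# Verify a compressed dump's integrity before trusting it for restore.
# The dump below is synthetic; with a real server you would restore via:
#   gunzip < "$DUMP" | mysql target_database
DUMP=$(mktemp --suffix=.sql.gz)
echo "CREATE TABLE demo (id INT);" | gzip > "$DUMP"

if gzip -t "$DUMP" 2>/dev/null; then
    CHECK="intact"
else
    CHECK="damaged"
fi
echo "archive is $CHECK"
rm -f "$DUMP"
```

Running `gzip -t` in the backup script itself, right after each dump, catches truncated archives on the night they are written rather than on the day you need them.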
2. System Monitoring Automation
Proactive monitoring prevents system issues before they cause outages. Automated monitoring scripts should:
- Check critical resources (CPU, memory, disk, network)
- Monitor service availability
- Detect security issues
- Send alerts when thresholds are exceeded
- Generate reports for trend analysis
Comprehensive System Health Check
#!/bin/bash
# system-health-check.sh - Comprehensive system monitoring
# ================= CONFIGURATION =================
THRESHOLD_DISK=85 # Disk usage percentage threshold
THRESHOLD_MEMORY=90 # Memory usage percentage threshold
ALERT_EMAIL="admin@example.com"
LOG_FILE="/var/log/health-check.log"
REPORT_FILE="/tmp/system-health-$(date +%Y%m%d).txt"
# ================= INITIALIZATION =================
echo "=== System Health Check - $(date) ===" > "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
ALERTS=()
WARNINGS=()
# ================= DISK SPACE CHECK =================
echo "1. Disk Space Analysis:" >> "$REPORT_FILE"
echo "------------------------" >> "$REPORT_FILE"
# Read from process substitution (not a pipe) so that ALERTS/WARNINGS
# appended inside the loop survive in the parent shell
while read -r line; do
    FILESYSTEM=$(echo "$line" | awk '{print $1}')
    AVAIL=$(echo "$line" | awk '{print $4}')
    USE_PERCENT=$(echo "$line" | awk '{print $5}' | sed 's/%//')
    MOUNT=$(echo "$line" | awk '{print $6}')
    echo "  $MOUNT ($FILESYSTEM): $USE_PERCENT% used, $AVAIL available" >> "$REPORT_FILE"
    # Check threshold
    if [ "$USE_PERCENT" -ge "$THRESHOLD_DISK" ]; then
        ALERT="CRITICAL: Disk usage on $MOUNT is ${USE_PERCENT}%"
        ALERTS+=("$ALERT")
        echo "  ⚠️ $ALERT" >> "$REPORT_FILE"
    elif [ "$USE_PERCENT" -ge 70 ]; then
        WARNING="WARNING: Disk usage on $MOUNT is ${USE_PERCENT}%"
        WARNINGS+=("$WARNING")
        echo "  ℹ️ $WARNING" >> "$REPORT_FILE"
    fi
done < <(df -h | grep '^/dev/')
# ================= MEMORY USAGE CHECK =================
echo "" >> "$REPORT_FILE"
echo "2. Memory Usage:" >> "$REPORT_FILE"
echo "----------------" >> "$REPORT_FILE"
MEM_TOTAL=$(free -m | awk '/^Mem:/ {print $2}')
MEM_USED=$(free -m | awk '/^Mem:/ {print $3}')
MEM_FREE=$(free -m | awk '/^Mem:/ {print $4}')
MEM_PERCENT=$((MEM_USED * 100 / MEM_TOTAL))
echo " Total: ${MEM_TOTAL}MB, Used: ${MEM_USED}MB, Free: ${MEM_FREE}MB" >> "$REPORT_FILE"
echo " Usage: ${MEM_PERCENT}%" >> "$REPORT_FILE"
if [ "$MEM_PERCENT" -ge "$THRESHOLD_MEMORY" ]; then
    ALERT="CRITICAL: Memory usage is ${MEM_PERCENT}%"
    ALERTS+=("$ALERT")
    echo "  ⚠️ $ALERT" >> "$REPORT_FILE"
fi
# ================= CPU LOAD CHECK =================
echo "" >> "$REPORT_FILE"
echo "3. CPU Load Average:" >> "$REPORT_FILE"
echo "--------------------" >> "$REPORT_FILE"
LOAD1=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f1 | xargs)
LOAD5=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f2 | xargs)
LOAD15=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f3 | xargs)
echo " 1-minute: $LOAD1, 5-minute: $LOAD5, 15-minute: $LOAD15" >> "$REPORT_FILE"
# Get CPU core count to calculate relative load
CPU_CORES=$(nproc)
LOAD_THRESHOLD=$(echo "$CPU_CORES * 1.5" | bc)
if (( $(echo "$LOAD1 > $LOAD_THRESHOLD" | bc -l) )); then
    ALERT="CRITICAL: High CPU load: $LOAD1 (cores: $CPU_CORES)"
    ALERTS+=("$ALERT")
    echo "  ⚠️ $ALERT" >> "$REPORT_FILE"
fi
# ================= SERVICE STATUS CHECK =================
echo "" >> "$REPORT_FILE"
echo "4. Service Status:" >> "$REPORT_FILE"
echo "------------------" >> "$REPORT_FILE"
SERVICES=("ssh" "nginx" "mysql" "docker" "cron")
for SERVICE in "${SERVICES[@]}"; do
    if systemctl is-active --quiet "$SERVICE"; then
        echo "  ✅ $SERVICE: RUNNING" >> "$REPORT_FILE"
    else
        ALERT="CRITICAL: Service $SERVICE is not running"
        ALERTS+=("$ALERT")
        echo "  ❌ $SERVICE: STOPPED" >> "$REPORT_FILE"
        # Attempt to restart if it's a critical service
        if [[ "$SERVICE" == "ssh" || "$SERVICE" == "nginx" ]]; then
            echo "  Attempting to restart $SERVICE..." >> "$REPORT_FILE"
            if systemctl restart "$SERVICE"; then
                echo "  ✅ Restarted successfully" >> "$REPORT_FILE"
            else
                echo "  ❌ Restart failed" >> "$REPORT_FILE"
            fi
        fi
    fi
done
# ================= SECURITY CHECKS =================
echo "" >> "$REPORT_FILE"
echo "5. Security Checks:" >> "$REPORT_FILE"
echo "-------------------" >> "$REPORT_FILE"
# Check for failed SSH logins during the current hour
FAILED_SSH=$(grep "Failed password" /var/log/auth.log 2>/dev/null | grep -c "$(date '+%b %_d %H:')")
if [ "$FAILED_SSH" -gt 10 ]; then
    WARNING="Multiple failed SSH attempts: $FAILED_SSH this hour"
    WARNINGS+=("$WARNING")
    echo "  ⚠️ $WARNING" >> "$REPORT_FILE"
fi
# Check for failed root login attempts
ROOT_LOGINS=$(grep "root" /var/log/auth.log 2>/dev/null | grep "Failed password" | tail -5)
if [ -n "$ROOT_LOGINS" ]; then
    echo "  ⚠️ Recent failed root login attempts detected" >> "$REPORT_FILE"
fi
# ================= SUMMARY AND ALERTS =================
echo "" >> "$REPORT_FILE"
echo "=== SUMMARY ===" >> "$REPORT_FILE"
echo "Check completed at: $(date)" >> "$REPORT_FILE"
if [ ${#ALERTS[@]} -eq 0 ]; then
    echo "✅ No critical issues detected" >> "$REPORT_FILE"
else
    echo "⚠️ CRITICAL ISSUES DETECTED: ${#ALERTS[@]}" >> "$REPORT_FILE"
    for alert in "${ALERTS[@]}"; do
        echo "  • $alert" >> "$REPORT_FILE"
    done
    # Send email alert if there are critical issues
    if [ -n "$ALERT_EMAIL" ]; then
        mail -s "SYSTEM ALERT: Critical issues detected on $(hostname)" \
            "$ALERT_EMAIL" < "$REPORT_FILE"
        echo "Alert email sent to $ALERT_EMAIL" >> "$REPORT_FILE"
    fi
fi
if [ ${#WARNINGS[@]} -gt 0 ]; then
    echo "" >> "$REPORT_FILE"
    echo "ℹ️ WARNINGS: ${#WARNINGS[@]}" >> "$REPORT_FILE"
    for warning in "${WARNINGS[@]}"; do
        echo "  • $warning" >> "$REPORT_FILE"
    done
fi
# ================= LOGGING =================
cat "$REPORT_FILE" >> "$LOG_FILE"
echo "Health check completed. Report: $REPORT_FILE"
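One Bash pitfall to watch in loops like the disk-space check above: piping `df` into `while read` runs the loop body in a subshell, so variables and arrays modified inside it vanish when the loop ends. Feeding the loop with process substitution keeps it in the current shell. A minimal demonstration:

```shell
#!/bin/bash
# Counting lines two ways: via a pipe (subshell, updates lost) and via
# process substitution (current shell, updates kept).
PIPED=0
printf 'a\nb\nc\n' | while read -r _; do PIPED=$((PIPED + 1)); done

KEPT=0
while read -r _; do KEPT=$((KEPT + 1)); done < <(printf 'a\nb\nc\n')

echo "pipe saw $PIPED, process substitution saw $KEPT"
```

The first counter stays at 0 because its increments happened in a child process; the second reflects all three lines. Any monitoring script that accumulates alerts inside a read loop needs the second form.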
3. Cron Job Scheduling
Cron is Linux's built-in job scheduler that runs commands at specified times. Understanding cron syntax is essential for automation:
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday=0 or 7)
│ │ │ │ │
* * * * * command_to_execute
Common scheduling patterns:
- Nightly (e.g. 0 2 * * *): perfect for backups when system usage is low
- Every few minutes (e.g. */5 * * * *): ideal for service monitoring and alerting
- Weekly (e.g. 0 4 * * 0): maintenance tasks and reports
- Monthly (e.g. 0 2 1 * *): archiving and cleanup tasks
Complete Crontab Example
# ================= SYSTEM MAINTENANCE =================
# Daily backup at 2 AM
0 2 * * * /opt/scripts/backup-files.sh >> /var/log/backup.log 2>&1
# MySQL backup at 3 AM
0 3 * * * /opt/scripts/backup-mysql.sh >> /var/log/mysql-backup.log 2>&1
# System update every Sunday at 4 AM
0 4 * * 0 /opt/scripts/update-system.sh >> /var/log/update.log 2>&1
# ================= MONITORING =================
# Health check every 30 minutes
*/30 * * * * /opt/scripts/system-health-check.sh >> /var/log/health-check.log 2>&1
# Disk space check every hour
0 * * * * /opt/scripts/check-disk-space.sh >> /var/log/disk-check.log 2>&1
# Service monitoring every 5 minutes
*/5 * * * * /opt/scripts/monitor-services.sh >> /var/log/service-monitor.log 2>&1
# ================= CLEANUP TASKS =================
# Log rotation every day at 1 AM
0 1 * * * /opt/scripts/rotate-logs.sh >> /var/log/log-rotation.log 2>&1
# Temporary file cleanup every Monday at 5 AM
0 5 * * 1 /opt/scripts/cleanup-temp.sh >> /var/log/cleanup.log 2>&1
# Docker cleanup every Sunday at 6 AM
0 6 * * 0 /opt/scripts/cleanup-docker.sh >> /var/log/docker-cleanup.log 2>&1
# ================= SECURITY =================
# Security scan every day at 11 PM
0 23 * * * /opt/scripts/security-scan.sh >> /var/log/security.log 2>&1
# SSH key rotation on 1st of every month at 2 AM
0 2 1 * * /opt/scripts/rotate-ssh-keys.sh >> /var/log/ssh-rotation.log 2>&1
Cron best practices:
1. Use absolute paths: Cron jobs don't inherit your shell's PATH
2. Redirect output: Use >> logfile 2>&1 to capture logs
3. Test manually first: Run scripts manually before scheduling
4. Use locking: Prevent overlapping executions with lock files
5. Monitor cron logs: Check /var/log/cron.log or /var/log/syslog
6. Document schedules: Keep a record of what runs when and why
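Tip 4 (locking) deserves a concrete pattern. Holding an exclusive `flock` on a dedicated file descriptor stops a slow run from overlapping the next scheduled one. A sketch; the lock path is illustrative, and each script should use its own fixed path:

```shell
#!/bin/bash
# Take an exclusive, non-blocking lock before doing any work; a second
# instance started while this one runs exits immediately.
LOCK_FILE=$(mktemp)   # demo only; in production use a fixed path like /var/lock/backup-files.lock
exec 9>"$LOCK_FILE"
if flock -n 9; then
    STATE="acquired"
    # ... the real job would run here ...
else
    STATE="busy"
    exit 1
fi
echo "lock $STATE"
rm -f "$LOCK_FILE"
```

Keeping the lock on file descriptor 9 means it is released automatically when the script exits, even on a crash, so there is no stale lock file to clean up.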
4. Practical Automation Scripts
Log Rotation and Cleanup
#!/bin/bash
# rotate-logs.sh - Automated log rotation and cleanup
# ================= CONFIGURATION =================
LOG_DIRS=("/var/log" "/var/log/apache2" "/var/log/nginx" "/var/log/mysql")
RETENTION_DAYS=30
COMPRESS_AFTER_DAYS=7
MAX_LOG_SIZE="10M"
BACKUP_DIR="/backups/logs"
# ================= FUNCTIONS =================
rotate_log() {
    local log_file=$1
    # Skip if file doesn't exist or is empty
    [ ! -f "$log_file" ] && return
    [ ! -s "$log_file" ] && return
    # Create backup directory mirroring the log's path
    local backup_path="$BACKUP_DIR/$(dirname "$log_file" | sed 's|^/||')"
    mkdir -p "$backup_path"
    # Copy the log aside, then truncate the original in place
    local timestamp
    timestamp=$(date +%Y%m%d_%H%M%S)
    local backup_file="$backup_path/$(basename "$log_file")_$timestamp"
    cp "$log_file" "$backup_file"
    > "$log_file" # Truncate original log
    echo "Rotated: $log_file -> $backup_file"
}
compress_old_logs() {
    local dir=$1
    find "$dir" -name "*.log" -type f -mtime +"$COMPRESS_AFTER_DAYS" ! -name "*.gz" | \
    while read -r log_file; do
        if gzip "$log_file"; then
            echo "Compressed: $log_file.gz"
        fi
    done
}
cleanup_old_backups() {
    find "$BACKUP_DIR" -name "*.log_*" -type f -mtime +"$RETENTION_DAYS" -delete
    echo "Cleaned up backups older than $RETENTION_DAYS days"
}
# ================= MAIN EXECUTION =================
echo "Starting log rotation at $(date)"
# Rotate large log files
for dir in "${LOG_DIRS[@]}"; do
    if [ -d "$dir" ]; then
        echo "Processing directory: $dir"
        # Find log files exceeding MAX_LOG_SIZE
        find "$dir" -name "*.log" -type f -size +"$MAX_LOG_SIZE" | \
        while read -r log_file; do
            rotate_log "$log_file"
        done
        # Compress old logs
        compress_old_logs "$dir"
    fi
done
# Cleanup old backups
cleanup_old_backups
echo "Log rotation completed at $(date)"
5. Best Practices for Production Automation
Production scripts earn their keep through defensive habits. A few patterns worth adopting in every script:
1. Check exit codes: if ! command; then echo "Failed"; exit 1; fi
2. Log with timestamps: echo "[$(date)] Starting backup" >> "$LOG_FILE"
3. Centralize configuration: put values like BACKUP_DIR="/backups" and RETENTION_DAYS=7 at the top, not scattered through the script
4. Create directories safely: [ -d "$DIR" ] || mkdir -p "$DIR"
5. Comment intent, not mechanics: # Backup MySQL databases with compression
6. Support a dry run: ./script.sh --dry-run
Getting Started: Your First Automation
Follow these steps to implement your first automation:
- Identify a repetitive task you do daily or weekly
- Write a simple script to automate just that task
- Test it manually with different scenarios
- Add error handling and logging
- Schedule it with cron to run automatically
- Monitor the results for a week
- Improve based on feedback and add more features
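As a concrete starting point for the steps above, here is a hypothetical "first automation": a few lines that report root filesystem usage and flag it past a threshold. The 80% figure and the message format are illustrative choices, not requirements:

```shell
#!/bin/bash
# Minimal first automation: report root filesystem usage and flag it
# when it crosses THRESHOLD percent.
THRESHOLD=80
# df -P gives stable POSIX output; column 5 is the usage percentage
USAGE=$(df -P / | awk 'NR==2 {gsub(/%/, "", $5); print $5}')
if [ "$USAGE" -ge "$THRESHOLD" ]; then
    LEVEL="WARNING"
else
    LEVEL="OK"
fi
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $LEVEL: / is at ${USAGE}%"
```

Once it works by hand, schedule it hourly (`0 * * * * /opt/scripts/check-root-disk.sh >> /var/log/disk-check.log 2>&1`) and grow it from there: more mounts, email alerts, trend logging.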
A few more safety rules before you deploy:
1. Always back up before cleanup: Never delete without a backup
2. Test with dry-run first: Add a --dry-run option to preview changes
3. Set appropriate permissions: Don't run everything as root
4. Monitor disk space: Ensure you have space before large operations
5. Use locking mechanisms: Prevent script overlap with flock
6. Keep scripts in version control: Track changes with git
7. Document dependencies: List required packages and permissions
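The --dry-run option from rule 2 is easy to retrofit: route every destructive command through a small wrapper that either prints or executes it. A sketch; DRY_RUN is hard-coded here for demonstration, where a real script would set it from "$1":

```shell
#!/bin/bash
# Dry-run wrapper: when DRY_RUN is true, print the command instead of running it.
DRY_RUN=true   # demo value; a real script sets this from a --dry-run flag
run() {
    if [ "$DRY_RUN" = true ]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}
TARGET=$(mktemp)
run rm -f "$TARGET"          # printed, not executed, so the file survives
if [ -f "$TARGET" ]; then
    STATUS="file survived"
fi
echo "$STATUS"
rm -f "$TARGET"
```

Because every destructive step goes through run(), flipping one variable turns the whole script into a harmless preview of what it would do.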
Start Your Automation Journey Today
Automation is not an all-or-nothing proposition. Start with one script that solves one problem. As you gain confidence, expand to more complex automations. The time you invest in automation pays for itself many times over.
Remember: The best automation is the one that gets used. Focus on solving real pain points in your daily workflow.
Next Steps: Pick one script from this guide, customize it for your environment, test it thoroughly, and schedule it. Monitor the results for a week, then iterate and improve.