Common Boot Issues: Complete Guide to Troubleshooting System Boot Problems

Master Linux boot troubleshooting with this comprehensive guide covering GRUB errors, kernel panics, filesystem corruption, hardware failures, and recovery techniques for diagnosing and fixing system boot problems.

Linux Boot Process Architecture

Boot Process Stages

  • Stage 1, BIOS/UEFI Firmware: POST, hardware initialization, boot device selection, MBR/GPT reading, boot loader loading
  • Stage 2, Boot Loader (GRUB2): kernel selection, initramfs loading, kernel parameters (grub.cfg, /boot/grub/), kernel handoff
  • Stage 3, Linux Kernel Initialization: hardware detection, driver loading, initramfs unpacking (vmlinuz, bzImage), decompression, early userspace
  • Stage 4, initramfs (Initial RAM Filesystem): root filesystem preparation, module loading, storage drivers, init script, device discovery, root mount
  • Stage 5, Init System (systemd/SysV): service management, runlevel/target, user space initialization (PID 1), service startup, login prompt

Common Boot Failure Points

  • BIOS/UEFI: boot order, Secure Boot
  • GRUB: corrupted config, missing kernel
  • Kernel: panic, module issues
  • initramfs: missing drivers
  • Root FS: corruption, mount errors
Linux boot process architecture showing stages and common failure points

Understanding Boot Problems

System boot issues can occur at various stages of the boot process, each with distinct symptoms and solutions. Understanding where the failure occurs is crucial for effective troubleshooting.

  • BIOS/UEFI Level: Hardware detection, boot device selection, firmware issues
  • Boot Loader: GRUB errors, missing configuration, kernel selection failures
  • Kernel Initialization: Kernel panic, hardware driver issues, parameter errors
  • initramfs Stage: Missing modules, filesystem drivers, root device detection
  • Root Filesystem: Corruption, mount failures, missing /sbin/init
  • Init System: Service failures, dependency issues, configuration errors
  • Hardware Related: Disk failures, memory issues, power problems
  • Software Related: Updates, configuration changes, dependency breaks
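The stage-by-stage breakdown above can be turned into a first-pass triage helper. The sketch below is illustrative only: `classify_boot_failure` is a hypothetical name, the patterns are drawn from error messages quoted in this guide (the two systemd patterns, "failed to start" and "dependency failed", are assumptions), and a real message may of course have a cause in a different stage.

```shell
#!/usr/bin/env bash
# classify_boot_failure: hypothetical helper that maps an on-screen error
# message to the boot stage it most likely belongs to.
classify_boot_failure() {
    local msg="${1,,}"    # lowercase once so matching is case-insensitive
    case "$msg" in
        *"no bootable device"*|*"invalid system disk"*|*"secure boot"*|*"cmos checksum"*)
            echo "firmware" ;;
        *"grub"*|*"unknown filesystem"*|*"normal.mod"*)
            echo "bootloader" ;;
        *"does not exist"*"dropping to"*)
            echo "initramfs" ;;
        *"no working init"*|*"attempted to kill init"*)
            echo "root-filesystem" ;;
        *"kernel panic"*|*"unable to mount root fs"*)
            echo "kernel" ;;
        *"failed to start"*|*"dependency failed"*)
            echo "init-system" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, `classify_boot_failure "error: unknown filesystem"` points at the boot loader stage, which tells you to start with the GRUB section rather than the kernel section.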

1. BIOS/UEFI Boot Issues

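BIOS/UEFI Diagnostics
dmidecode -t bios
Show the firmware vendor, version, and release date from a running system. The firmware settings themselves are changed in the setup utility entered during POST.
Boot Device Selection
efibootmgr -v
List UEFI boot entries and the current boot order; the same tool can create, delete, and reorder entries.
Secure Boot Issues
mokutil --sb-state
Report whether Secure Boot is enabled, as a first step in troubleshooting signature and certificate problems.
Hardware Diagnostics
memtest86+
Test memory. Memtest86+ is not a shell command: start it from the GRUB menu or from bootable media.
Boot Media Issues
dd if=/dev/zero of=/dev/sdX bs=446 count=1
Destructive: zeroes the 446-byte MBR boot code area while leaving the partition table intact. Use it only to clear corrupted boot code, then reinstall the boot loader with grub-install.
Firmware Updates
fwupdmgr get-devices
List devices whose firmware fwupd can manage; fwupdmgr get-updates and fwupdmgr update then check for and apply updates.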

BIOS/UEFI Boot Error Symptoms

Error Symptom | Display Message | Possible Causes | Severity | Immediate Action
No POST Beep/Display | Black screen, no beep codes | Power supply, motherboard, CPU, RAM failure | Critical | Check power, reseat components, test with minimal config
Boot Device Not Found | "No bootable device" | Disk failure, wrong boot order, cable issues | Critical | Check BIOS boot order, verify disk connections
Invalid Boot Disk | "Invalid system disk" | Corrupted MBR, boot sector issues | Critical | Boot from recovery media, repair MBR
Secure Boot Violation | "Secure Boot failed" | Unsigned bootloader/kernel, wrong certificates | High | Disable Secure Boot temporarily or enroll keys
CMOS Checksum Error | "CMOS checksum error" | Dead CMOS battery, corrupted settings | Medium | Replace CMOS battery, reset BIOS defaults
Overclocking Failed | "Overclocking failed" | Unstable overclock settings | Medium | Reset to default BIOS settings
USB Boot Issues | USB device not detected | Legacy/USB boot disabled, device format | Medium | Enable USB boot, try different port/device
Boot sequence: Power On → POST → BIOS/UEFI → Boot Device → Boot Loader
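When the firmware reports a missing boot device on a UEFI system, the first thing to confirm is which entry the firmware tries first. The sketch below parses `efibootmgr` output for that; `first_boot_entry` is a hypothetical helper name, and the parsing assumes the usual `BootOrder:` and `BootNNNN* label` line layout.

```shell
#!/usr/bin/env bash
# first_boot_entry: hypothetical helper. Reads `efibootmgr` output on stdin and
# prints "<number> <label>" for the entry that comes first in BootOrder.
# Exits non-zero when the input has no BootOrder line.
first_boot_entry() {
    awk '
        /^BootOrder:/ {
            split($2, order, ",")
            first = order[1]
        }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][* ]/ {
            id = substr($0, 5, 4)        # the four hex digits after "Boot"
            label = substr($0, 11)       # text after "BootNNNN* "
            sub(/\t.*/, "", label)       # drop the device path printed by -v
            labels[id] = label
        }
        END {
            if (first == "") exit 1
            if (first in labels) print first, labels[first]
            else print first, "(missing entry)"
        }
    '
}
```

Typical use on a UEFI system is `efibootmgr -v | first_boot_entry`. A result of "(missing entry)" means BootOrder references an entry that no longer exists, which is one way to end up with "No bootable device".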

2. GRUB Boot Loader Issues

GRUB Error Diagnosis and Repair

# Common GRUB Error Messages and Solutions
# Error: "GRUB loading. Welcome to GRUB!" (then nothing)
# Cause: Corrupted GRUB configuration or missing stage files
# Solution: Boot from live USB and reinstall GRUB
# Boot from live USB/DVD
# Identify root partition
lsblk
fdisk -l
# Mount root partition
mkdir /mnt/root
mount /dev/sdXY /mnt/root # Replace XY with your partition
# Mount boot partition if separate
mount /dev/sdXZ /mnt/root/boot # If /boot is separate
# Mount necessary filesystems
mount --bind /dev /mnt/root/dev
mount --bind /proc /mnt/root/proc
mount --bind /sys /mnt/root/sys
# Chroot into the system
chroot /mnt/root
# Reinstall GRUB
# For BIOS systems:
grub-install /dev/sdX # Replace with your disk, not partition
update-grub
# For UEFI systems:
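# First, find the EFI System Partition
fdisk -l | grep -i "EFI System"
# Mount EFI partition if not already mounted
mount /dev/sdXN /boot/efi # Replace XN with EFI partition
# Make EFI variables visible so grub-install can register the boot entry
# (requires that the live system itself was booted in UEFI mode)
mount -t efivarfs efivarfs /sys/firmware/efi/efivars 2>/dev/null || true
# Reinstall GRUB for UEFI
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=GRUB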
update-grub
# Exit chroot and reboot
exit
umount -R /mnt/root
reboot
# Error: "error: file '/boot/grub/i386-pc/normal.mod' not found"
# Solution: Reinstall GRUB core image
grub-install --target=i386-pc /dev/sdX
grub-mkconfig -o /boot/grub/grub.cfg
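# Error: "error: unknown filesystem"
# Cause: GRUB's core image lacks the module for the filesystem or partition table that holds /boot
# Check available modules
ls /usr/lib/grub/*/*.mod
# Reinstall GRUB, embedding the modules your layout needs (for example part_gpt, xfs, lvm)
grub-install --modules="ext2 part_msdos" /dev/sdX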
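# Prompt: "grub rescue>" (GRUB could not find its prefix or load normal.mod)
# Note: "Minimal BASH-like line editing is supported" is the banner of the normal "grub>" prompt,
# which appears when GRUB itself loaded but found no usable grub.cfg
# Manual recovery from either prompt:
ls # List disks and partitions, e.g. (hd0) (hd0,gpt1) (hd0,gpt2)
ls (hd0,gpt1)/boot/grub # Find the partition that holds the GRUB directory
set prefix=(hd0,gpt1)/boot/grub
set root=(hd0,gpt1)
insmod normal
normal
# After booting, fix GRUB permanently
grub-install /dev/sda
update-grub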
# Error: "GRUB cannot find kernel"
# Check kernel files
ls -la /boot/vmlinuz*
ls -la /boot/initrd*
# Regenerate initramfs
update-initramfs -c -k $(uname -r)
update-grub
# GRUB Configuration Issues
# Check GRUB configuration
head -50 /boot/grub/grub.cfg
# Generate a candidate configuration and check its syntax
grub-mkconfig -o /boot/grub/grub.cfg.test
grub-script-check /boot/grub/grub.cfg.test
# If the check passes, replace the config
mv /boot/grub/grub.cfg.test /boot/grub/grub.cfg
# Advanced GRUB Repair Commands
# Create emergency GRUB USB (BIOS): mount the USB partition, then install GRUB onto the stick
mkdir -p /mnt/usb
mount /dev/sdX1 /mnt/usb
grub-install --target=i386-pc --boot-directory=/mnt/usb/boot /dev/sdX
# grub-install copies the required modules into /mnt/usb/boot/grub/i386-pc/
# GRUB Environment Block Issues
# Reset GRUB environment
grub-editenv /boot/grub/grubenv create
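# Set default boot entry (takes effect only with GRUB_DEFAULT=saved in /etc/default/grub)
grub-set-default 0
# Dual-boot GRUB Issues
# Detect other OS installations (recent GRUB releases also need
# GRUB_DISABLE_OS_PROBER=false in /etc/default/grub before grub-mkconfig will use the results)
os-prober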
# Update GRUB to include other OS
grub-mkconfig -o /boot/grub/grub.cfg
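The BIOS commands above install GRUB to a whole disk (/dev/sdX), not to a partition, and deriving the disk name from a partition name is easy to get wrong on NVMe and MMC devices. The following sketch shows one way to do it by name alone; `disk_from_partition` is a hypothetical helper, and device-mapper paths such as LVM volumes are returned unchanged because they cannot be resolved from the name (use lsblk for those).

```shell
#!/usr/bin/env bash
# disk_from_partition: hypothetical helper that derives the parent disk from
# a partition device name using naming conventions only.
disk_from_partition() {
    local part="$1"
    case "$part" in
        /dev/nvme*n*p[0-9]*|/dev/mmcblk*p[0-9]*|/dev/loop*p[0-9]*)
            # These device families separate the partition number with "p"
            echo "${part%p[0-9]*}" ;;
        /dev/sd*[0-9]|/dev/vd*[0-9]|/dev/hd*[0-9]|/dev/xvd*[0-9])
            # Classic names: strip the trailing partition number
            echo "${part%%[0-9]*}" ;;
        *)
            # Whole disks and unrecognised names are returned as given
            echo "$part" ;;
    esac
}
```

With this, `grub-install "$(disk_from_partition /dev/nvme0n1p2)"` targets /dev/nvme0n1 rather than the mangled /dev/nvme0n1p that plain digit stripping would produce.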

GRUB Boot Process Deep Dive

grub-recovery-guide.sh - Comprehensive GRUB Recovery Script
#!/bin/bash
# grub-recovery-guide.sh - Comprehensive GRUB boot loader recovery

set -euo pipefail

# Configuration
LOG_FILE="/tmp/grub-recovery-$(date +%Y%m%d-%H%M%S).log"
BACKUP_DIR="/boot/grub-backup-$(date +%Y%m%d)"
RECOVERY_MODE="${1:-auto}"  # auto, uefi, bios, minimal

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

error() {
    echo "[ERROR] $1" >&2
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" >> "$LOG_FILE"
    exit 1
}

warn() {
    echo "[WARNING] $1" >&2
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] WARNING: $1" >> "$LOG_FILE"
}

backup_grub_config() {
    log "Backing up existing GRUB configuration..."
    
    mkdir -p "$BACKUP_DIR"
    
    # Backup critical GRUB files
    cp -a /boot/grub "$BACKUP_DIR/grub" 2>/dev/null || true
    cp -a /boot/grub2 "$BACKUP_DIR/grub2" 2>/dev/null || true
    cp -a /etc/default/grub "$BACKUP_DIR/grub-default" 2>/dev/null || true
    cp -a /etc/grub.d "$BACKUP_DIR/grub.d" 2>/dev/null || true
    
    # Backup MBR and partition table (falls back to /dev/sda when the disk has not been detected yet)
    dd if="${DISK_DEV:-/dev/sda}" of="$BACKUP_DIR/mbr-backup.bin" bs=512 count=1 2>/dev/null || true
    fdisk -l > "$BACKUP_DIR/partition-table.txt" 2>/dev/null || true
    
    log "GRUB configuration backed up to: $BACKUP_DIR"
}

detect_boot_mode() {
    log "Detecting boot mode..."
    
    # Check for EFI system partition
    if [[ -d /sys/firmware/efi ]]; then
        log "System is booted in UEFI mode"
        BOOT_MODE="uefi"
    else
        log "System is booted in BIOS/Legacy mode"
        BOOT_MODE="bios"
    fi
    
    # Detect disk layout
    detect_disk_layout
    
    echo "$BOOT_MODE"
}

detect_disk_layout() {
    log "Detecting disk layout..."
    
    # Get root partition
    ROOT_PART=$(findmnt -n -o SOURCE / 2>/dev/null || echo "")
    if [[ -z "$ROOT_PART" ]]; then
        ROOT_PART=$(mount | grep " / " | awk '{print $1}' 2>/dev/null || echo "")
    fi
    
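    # Get disk device. Walking the lsblk dependency tree handles NVMe, MMC and
    # LVM names, which simple digit stripping would mangle.
    DISK_DEV=""
    if [[ -n "$ROOT_PART" ]]; then
        DISK_DEV=$(lsblk -nrso NAME,TYPE "$ROOT_PART" 2>/dev/null | \
                   awk '$2 == "disk" {print "/dev/" $1}' | head -n1 || true)
    fi
    
    # Detect EFI partition: use the mounted one, otherwise look for a partition
    # carrying the EFI System Partition type GUID
    EFI_PART=$(findmnt -n -o SOURCE /boot/efi 2>/dev/null || true)
    if [[ -z "$EFI_PART" ]]; then
        EFI_PART=$(lsblk -rno PATH,PARTTYPE 2>/dev/null | \
                   awk 'tolower($2) == "c12a7328-f81f-11d2-ba4b-00a0c93ec93b" {print $1}' | head -n1 || true)
    fi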
    
    # Detect /boot partition
    BOOT_PART=$(findmnt -n -o SOURCE /boot 2>/dev/null || echo "")
    
    log "Detected:"
    log "  Root partition: $ROOT_PART"
    log "  Disk device: $DISK_DEV"
    log "  EFI partition: $EFI_PART"
    log "  Boot partition: $BOOT_PART"
    
    export ROOT_PART DISK_DEV EFI_PART BOOT_PART
}

check_grub_installation() {
    log "Checking GRUB installation status..."
    
    local issues=0
    
    # Check GRUB binaries
    if ! command -v grub-install &> /dev/null; then
        warn "grub-install command not found"
        issues=$((issues + 1))
    fi
    
    if ! command -v update-grub &> /dev/null && ! command -v grub-mkconfig &> /dev/null; then
        warn "GRUB configuration tools not found"
        issues=$((issues + 1))
    fi
    
    # Check GRUB files
    if [[ ! -d /boot/grub ]] && [[ ! -d /boot/grub2 ]]; then
        warn "GRUB directory not found in /boot"
        issues=$((issues + 1))
    fi
    
    # Check kernel images
    if ! ls /boot/vmlinuz-* &> /dev/null; then
        warn "No kernel images found in /boot"
        issues=$((issues + 1))
    fi
    
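    # Check initramfs (initrd.img-* on Debian/Ubuntu, initramfs-* on RHEL/Fedora/Arch)
    if ! ls /boot/initrd.img-* &> /dev/null && ! ls /boot/initrd-* &> /dev/null && ! ls /boot/initramfs-* &> /dev/null; then
        warn "No initramfs images found"
        issues=$((issues + 1))
    fi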
    
    if [[ $issues -eq 0 ]]; then
        log "GRUB installation check passed"
    else
        warn "Found $issues potential issues with GRUB installation"
    fi
    
    return $issues
}

repair_bios_grub() {
    log "Repairing BIOS/Legacy GRUB installation..."
    
    local disk="$1"
    
    # Validate disk
    if [[ ! -b "$disk" ]]; then
        error "Invalid disk device: $disk"
    fi
    
    log "Installing GRUB to $disk (BIOS mode)"
    
    # Install GRUB to MBR
    if grub-install --target=i386-pc --recheck "$disk" 2>&1 | tee -a "$LOG_FILE"; then
        log "GRUB installed successfully to MBR of $disk"
    else
        error "Failed to install GRUB to $disk"
    fi
    
    # A separate /boot partition needs no second install: the MBR install above
    # already records where /boot lives. Installing to a partition boot sector
    # relies on blocklists, which grub-install refuses unless forced, and a
    # failure here would abort the script under set -e.
}

repair_uefi_grub() {
    log "Repairing UEFI GRUB installation..."
    
    local disk="$1"
    local efi_part="$2"
    
    # Validate inputs
    if [[ ! -b "$disk" ]]; then
        error "Invalid disk device: $disk"
    fi
    
    if [[ ! -b "$efi_part" ]]; then
        error "Invalid EFI partition: $efi_part"
    fi
    
    # Mount EFI partition if not mounted
    if ! mountpoint -q /boot/efi; then
        log "Mounting EFI partition: $efi_part"
        mkdir -p /boot/efi
        mount "$efi_part" /boot/efi 2>&1 | tee -a "$LOG_FILE"
        EFI_MOUNTED=1
    else
        EFI_MOUNTED=0
    fi
    
    # Install GRUB for UEFI
    log "Installing GRUB to EFI system partition"
    
    if grub-install --target=x86_64-efi \
        --efi-directory=/boot/efi \
        --bootloader-id=GRUB \
        --recheck 2>&1 | tee -a "$LOG_FILE"; then
        log "GRUB installed successfully to EFI system"
    else
        # Try with different target
        warn "Standard UEFI installation failed, trying alternative..."
        grub-install --target=x86_64-efi \
            --efi-directory=/boot/efi \
            --bootloader-id=GRUB_LINUX \
            --recheck 2>&1 | tee -a "$LOG_FILE"
    fi
    
    # Create fallback boot entry at the removable-media path, which firmware
    # tries when its NVRAM boot entries are lost
    log "Creating UEFI fallback boot entry"
    mkdir -p /boot/efi/EFI/BOOT
    cp /boot/efi/EFI/GRUB/grubx64.efi /boot/efi/EFI/BOOT/BOOTX64.EFI 2>/dev/null || true
    
    # Unmount if we mounted it
    if [[ $EFI_MOUNTED -eq 1 ]]; then
        umount /boot/efi 2>/dev/null || true
    fi
}

regenerate_grub_config() {
    log "Regenerating GRUB configuration..."
    
    # Update GRUB configuration
    if command -v update-grub &> /dev/null; then
        log "Running update-grub..."
        update-grub 2>&1 | tee -a "$LOG_FILE"
    elif command -v grub-mkconfig &> /dev/null; then
        log "Running grub-mkconfig..."
        grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tee -a "$LOG_FILE"
    else
        error "No GRUB configuration tool found"
    fi
    
    # Verify generated configuration
    if [[ -f /boot/grub/grub.cfg ]]; then
        local line_count
        line_count=$(wc -l < /boot/grub/grub.cfg)
        log "GRUB configuration generated with $line_count lines"
        
        # Count kernel entries by their "linux" lines; menu titles do not always
        # contain the word "Linux". grep -c already prints 0 on no match, so
        # "|| true" only absorbs its non-zero exit status.
        local kernel_entries
        kernel_entries=$(grep -cE '^[[:space:]]*linux(efi|16)?[[:space:]]' /boot/grub/grub.cfg || true)
        log "Found $kernel_entries Linux kernel entries"
        
        if [[ ${kernel_entries:-0} -eq 0 ]]; then
            warn "No Linux kernel entries found in grub.cfg"
        fi
    else
        error "GRUB configuration file not created"
    fi
}

fix_initramfs() {
    log "Checking and fixing initramfs..."
    
    # Get current kernel version
    local current_kernel=$(uname -r)
    
    # Check if initramfs exists for current kernel
    local initramfs_found=0
    for initramfs in /boot/initrd.img-* /boot/initramfs-*; do
        if [[ -f "$initramfs" ]] && echo "$initramfs" | grep -q "$current_kernel"; then
            initramfs_found=1
            break
        fi
    done
    
    if [[ $initramfs_found -eq 0 ]]; then
        warn "No initramfs found for current kernel ($current_kernel)"
        log "Regenerating initramfs..."
        
        # Regenerate initramfs
        if command -v update-initramfs &> /dev/null; then
            update-initramfs -c -k "$current_kernel" 2>&1 | tee -a "$LOG_FILE"
        elif command -v mkinitcpio &> /dev/null; then
            mkinitcpio -p linux 2>&1 | tee -a "$LOG_FILE"
        elif command -v dracut &> /dev/null; then
            dracut --force 2>&1 | tee -a "$LOG_FILE"
        else
            warn "No initramfs generation tool found"
        fi
    else
        log "initramfs found for current kernel"
    fi
    
    # Update initramfs for all kernels
    log "Updating initramfs for all installed kernels..."
    
    if command -v update-initramfs &> /dev/null; then
        update-initramfs -u -k all 2>&1 | tee -a "$LOG_FILE"
    fi
}

check_boot_filesystem() {
    log "Checking boot filesystem health..."
    
    # Check filesystem type
    local fs_type=$(findmnt -n -o FSTYPE /boot 2>/dev/null || echo "unknown")
    log "Boot filesystem type: $fs_type"
    
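    # Run a read-only filesystem check. A non-zero exit must not abort the
    # script, and results on a mounted filesystem are advisory only.
    case $fs_type in
        ext2|ext3|ext4)
            log "Running fsck on boot filesystem..."
            fsck -n "$BOOT_PART" 2>&1 | tee -a "$LOG_FILE" || warn "fsck reported problems on $BOOT_PART"
            ;;
        vfat|fat32)
            log "Running dosfsck on boot filesystem..."
            dosfsck -n -v "$BOOT_PART" 2>&1 | tee -a "$LOG_FILE" || warn "dosfsck reported problems on $BOOT_PART"
            ;;
        *)
            warn "Unknown filesystem type for boot: $fs_type"
            ;;
    esac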
    
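    # Check disk space
    local boot_space
    boot_space=$(df --output=pcent /boot | tail -n1 | tr -dc '0-9')
    if [[ ${boot_space:-0} -gt 90 ]]; then
        warn "Boot partition is ${boot_space}% full"
        log "Cleaning old kernels..."
        
        # Remove old kernels (dnf/yum keep the newest 3; apt removes what it marks as unneeded)
        if command -v apt &> /dev/null; then
            apt autoremove --purge 2>&1 | tee -a "$LOG_FILE"
        elif command -v dnf &> /dev/null; then
            # package-cleanup belongs to yum-utils; dnf queries old install-only packages itself
            local old_kernels
            old_kernels=$(dnf repoquery --installonly --latest-limit=-3 -q || true)
            if [[ -n "$old_kernels" ]]; then
                # shellcheck disable=SC2086  # word splitting into package names is intended
                dnf remove $old_kernels 2>&1 | tee -a "$LOG_FILE"
            fi
        elif command -v yum &> /dev/null; then
            package-cleanup --oldkernels --count=3 2>&1 | tee -a "$LOG_FILE"
        fi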
    else
        log "Boot partition has adequate free space"
    fi
}

create_emergency_boot_entry() {
    log "Creating emergency boot entry..."
    
    # Create emergency GRUB entries. The generator below runs at grub-mkconfig
    # time, because GRUB cannot evaluate $(...) inside grub.cfg. It is numbered
    # 45 so its entries sort after the regular ones and never become the default.
    # Assumes Debian-style initrd.img-<version> image names.
    cat > /etc/grub.d/45_emergency << 'EOF'
#!/bin/sh
set -e
root_uuid=$(findmnt -n -o UUID /)
if mountpoint -q /boot; then
    boot_uuid=$(findmnt -n -o UUID /boot); dir=""
else
    boot_uuid=$root_uuid; dir="/boot"
fi
cur=$(uname -r)
prev=$(ls /boot/vmlinuz-* | sort -V | tail -2 | head -1 | sed 's/.*vmlinuz-//')

entry() {  # entry <title> <kernel version> <extra kernel parameters>
    cat << ENTRY
menuentry '$1' {
    search --no-floppy --fs-uuid --set=root $boot_uuid
    linux $dir/vmlinuz-$2 root=UUID=$root_uuid ro $3
    initrd $dir/initrd.img-$2
}
ENTRY
}

entry 'Emergency Boot (Single User Mode)' "$cur" 'single'
entry 'Emergency Boot (Recovery Mode)' "$cur" 'recovery nomodeset'
entry 'Emergency Boot (Previous Kernel)' "$prev" ''
EOF
    
    chmod +x /etc/grub.d/45_emergency
    
    # Regenerate GRUB config to include emergency entry
    regenerate_grub_config
    
    log "Emergency boot entries created"
}

verify_repair() {
    log "Verifying GRUB repair..."
    
    local verification_passed=1
    
    # Check GRUB installation
    if [[ "$BOOT_MODE" == "uefi" ]]; then
        if [[ -f /boot/efi/EFI/GRUB/grubx64.efi ]] || [[ -f /boot/efi/EFI/ubuntu/grubx64.efi ]]; then
            log "UEFI GRUB binary verified"
        else
            warn "UEFI GRUB binary not found"
            verification_passed=0
        fi
    else
        # Check for BIOS GRUB in MBR
        if dd if="$DISK_DEV" bs=512 count=1 2>/dev/null | grep -q "GRUB"; then
            log "BIOS GRUB in MBR verified"
        else
            warn "GRUB not found in MBR"
            verification_passed=0
        fi
    fi
    
    # Check GRUB configuration
    if [[ -f /boot/grub/grub.cfg ]] && [[ -s /boot/grub/grub.cfg ]]; then
        log "GRUB configuration file verified"
    else
        warn "GRUB configuration file missing or empty"
        verification_passed=0
    fi
    
    # Check kernel entries (count "linux" lines; "|| true" absorbs grep's
    # non-zero exit when nothing matches, since it already prints 0)
    local kernel_count
    kernel_count=$(grep -cE '^[[:space:]]*linux(efi|16)?[[:space:]]' /boot/grub/grub.cfg 2>/dev/null || true)
    if [[ ${kernel_count:-0} -gt 0 ]]; then
        log "Found $kernel_count kernel entries in GRUB configuration"
    else
        warn "No kernel entries found in GRUB configuration"
        verification_passed=0
    fi
    
    # Return 0 on success so that "if verify_repair" in main takes the success branch
    if [[ $verification_passed -eq 1 ]]; then
        log "GRUB repair verification PASSED"
        return 0
    fi
    
    warn "GRUB repair verification FAILED - some issues remain"
    return 1
}

main() {
    log "Starting comprehensive GRUB recovery process..."
    
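    # Initial setup (detect the layout first so the backup targets the right disk)
    detect_boot_mode
    backup_grub_config
    
    # Check current state. The function returns its issue count, so guard the
    # call to keep set -e from aborting the script before any repair runs.
    check_grub_installation || warn "Continuing with repair despite the issues above"
    
    # Determine repair mode
    case "$RECOVERY_MODE" in
        auto)
            if [[ "$BOOT_MODE" == "uefi" ]]; then
                repair_uefi_grub "$DISK_DEV" "$EFI_PART"
            else
                repair_bios_grub "$DISK_DEV"
            fi
            ;;
        uefi)
            repair_uefi_grub "$DISK_DEV" "$EFI_PART"
            ;;
        bios)
            repair_bios_grub "$DISK_DEV"
            ;;
        minimal)
            # Skip reinstalling GRUB; the configuration is regenerated in the common steps below
            ;;
        *)
            error "Invalid recovery mode: $RECOVERY_MODE"
            ;;
    esac
    
    # Common repair steps
    regenerate_grub_config
    fix_initramfs
    check_boot_filesystem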
    create_emergency_boot_entry
    
    # Final verification
    if verify_repair; then
        log "GRUB recovery completed successfully!"
        log ""
        log "Summary of actions taken:"
        log "  1. Backed up existing configuration to: $BACKUP_DIR"
        log "  2. Detected boot mode: $BOOT_MODE"
        log "  3. Repaired GRUB installation"
        log "  4. Regenerated GRUB configuration"
        log "  5. Fixed initramfs images"
        log "  6. Created emergency boot entries"
        log "  7. Verified repair completion"
        log ""
        log "Next steps:"
        log "  1. Reboot the system to test the repair"
        log "  2. If issues persist, check log file: $LOG_FILE"
        log "  3. Consider testing with boot repair disk for complex issues"
    else
        warn "GRUB recovery completed with warnings"
        log "Please check the log file for details: $LOG_FILE"
        log "Consider additional troubleshooting steps or using a boot repair disk"
    fi
}

# Run main function
main "$@"
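One detail of the script worth isolating is how to count boot entries in grub.cfg safely. `grep -c` prints 0 and exits non-zero when nothing matches, so the common `grep -c ... || echo 0` idiom yields two lines of output and breaks numeric comparisons. A minimal sketch, with `count_kernel_entries` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# count_kernel_entries: hypothetical helper that counts the "linux",
# "linuxefi" and "linux16" lines in a GRUB configuration file.
count_kernel_entries() {
    local cfg="$1" n
    # A missing or unreadable file simply has zero entries
    [[ -r "$cfg" ]] || { echo 0; return 0; }
    # "|| true" absorbs grep's exit status 1; the count itself is already printed
    n=$(grep -cE '^[[:space:]]*linux(efi|16)?[[:space:]]' "$cfg" || true)
    echo "${n:-0}"
}
```

Counting `linux` lines rather than menu titles is a deliberate choice here: titles such as 'Ubuntu' do not contain the word "Linux", while every bootable Linux entry has a `linux` line.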

3. Kernel Panic and Initialization Issues

Kernel Error Diagnosis

1 Kernel Panic Symptoms and Causes
# Common Kernel Panic Messages and Diagnostics

# Symptom: "Kernel panic - not syncing: VFS: Unable to mount root fs"
# Possible causes:
# - Missing or corrupted initramfs
# - Wrong root= parameter in kernel command line
# - Missing filesystem drivers in initramfs
# - Corrupted root filesystem

# Diagnostic steps:
# 1. Check kernel command line parameters
cat /proc/cmdline

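# 2. Check initramfs content (lsinitramfs is Debian/Ubuntu; use lsinitrd on dracut-based systems)
lsinitramfs /boot/initrd.img-$(uname -r) | grep -E "(ext4|xfs|btrfs)"

# 3. Verify root device
blkid
ls -la /dev/disk/by-uuid/

# 4. Check filesystem integrity (read-only check of the unmounted root partition)
fsck -n /dev/sdXY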

# Symptom: "Kernel panic - not syncing: Attempted to kill init"
# Possible causes:
# - Corrupted /sbin/init or systemd binary
# - Missing shared libraries
# - SELinux/AppArmor issues
# - Runlevel/target misconfiguration

# Diagnostic steps:
# 1. Check init binary
ls -la /sbin/init
file /sbin/init

# 2. Check library dependencies
ldd /sbin/init

# 3. Check SELinux status
sestatus
getenforce

# 4. Check systemd logs from previous boot
journalctl -b -1 | grep -i "systemd\|init"

# Symptom: "Kernel panic - not syncing: No working init found"
# Possible causes:
# - Missing initramfs
# - Wrong init= parameter
# - Corrupted init binary

# Diagnostic steps:
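# 1. Check kernel parameters (no output means the kernel tries its default init paths)
grep -o "init=[^ ]*" /proc/cmdline

# 2. Verify init binary exists (falls back to /sbin/init when no init= is set)
init_path=$(grep -o "init=[^ ]*" /proc/cmdline | cut -d= -f2)
ls -la "${init_path:-/sbin/init}"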

# 3. Check if running in container/chroot
cat /proc/1/cgroup

# Symptom: "Kernel panic - not syncing: Fatal exception"
# Possible causes:
# - Hardware failure (RAM, CPU)
# - Kernel bug
# - Driver conflict

# Diagnostic steps:
# 1. Check kernel oops messages
dmesg | grep -i "oops\|panic\|BUG"

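# 2. Review the hardware inventory for failing or unexpected devices
dmidecode
lspci

# 3. Test memory
# memtest86+ runs from the boot menu or bootable media, not from a shell

# 4. Check whether the kernel is tainted (non-zero indicates, for example,
# proprietary or out-of-tree modules, or an earlier oops)
cat /proc/sys/kernel/tainted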
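Several of the diagnostics above come down to reading a single parameter (root=, init=) out of the kernel command line. A small sketch of that lookup follows; `get_cmdline_param` is a hypothetical helper, it reads /proc/cmdline when no string is passed, and it prints the last occurrence on the assumption that a later value overrides an earlier one.

```shell
#!/usr/bin/env bash
# get_cmdline_param: hypothetical helper that prints the value of NAME=VALUE
# from a kernel command line. Returns 1 when the parameter is absent.
get_cmdline_param() {
    local name="$1" cmdline="${2-}" word value="" found=1
    [[ -n "$cmdline" ]] || cmdline=$(cat /proc/cmdline)
    local -a words
    read -ra words <<< "$cmdline"      # split on whitespace without globbing
    for word in "${words[@]}"; do
        if [[ "$word" == "$name="* ]]; then
            value="${word#*=}"         # keep everything after the first "="
            found=0
        fi
    done
    [[ $found -eq 0 ]] && echo "$value"
    return "$found"
}
```

For example, `get_cmdline_param root` on a running system shows what the kernel was told to mount, which can then be compared against the `blkid` output. Because the match is anchored at the start of each word, `init` does not accidentally match `rdinit=`.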
2 Kernel Parameter Recovery
# Recovering from Kernel Parameter Issues

# Boot into recovery mode or single user mode
# Add these parameters at GRUB prompt:
# - single or 1 (single user mode)
# - init=/bin/bash (drop to shell)
# - rescue (rescue mode)
# - emergency (emergency mode)

# Common recovery kernel parameters:
# rw                  # Mount root read-write
# ro                  # Mount root read-only
# init=/bin/bash      # Use bash as init
# systemd.unit=rescue.target  # Systemd rescue target
# systemd.unit=emergency.target # Systemd emergency target
# nomodeset           # Disable kernel mode setting
# noapic              # Disable APIC
# nolapic             # Disable local APIC
# irqpoll             # Force IRQ polling
# acpi=off            # Disable ACPI
# pci=noacpi          # Disable ACPI for PCI
# debug               # Enable kernel debug
# earlyprintk=vga     # Early console output

# Editing kernel parameters in GRUB:
# 1. At GRUB menu, press 'e' to edit
# 2. Find the line starting with 'linux' or 'linuxefi'
# 3. Append the parameters to the end of that line
# 4. Press Ctrl+X or F10 to boot

# Permanent kernel parameter changes:
# Edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""

# Add parameters to GRUB_CMDLINE_LINUX (do not put root= here; grub-mkconfig adds it itself):
GRUB_CMDLINE_LINUX="nomodeset"

# Update GRUB configuration:
update-grub
# or
grub-mkconfig -o /boot/grub/grub.cfg

# Testing kernel parameters without permanent change:
# 1. Boot with temporary parameters
# 2. If successful, make them permanent
# 3. If not, reboot without them

# Emergency kernel parameter examples:

# For graphics issues:
# Add: nomodeset i915.modeset=0 nouveau.modeset=0

# For filesystem mounting issues:
# Add: rootdelay=10 rootfstype=ext4

# For ACPI issues:
# Add: acpi=off pci=noacpi

# For USB boot issues:
# Add: usbcore.autosuspend=-1

# For RAID/LVM issues (these rd.* options are read by dracut-based initramfs images):
# Add: rd.auto rd.lvm=1 rd.md=1

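# Creating emergency boot entry with safe parameters.
# GRUB cannot expand $(uname -r) inside grub.cfg, so the kernel version and root UUID
# are resolved now, while the file is written. This assumes /boot lives on the root
# filesystem; with a separate /boot partition, drop the /boot prefix and search for
# that partition's UUID instead. Note that this overwrites any existing 40_custom.
KVER=$(uname -r)
ROOT_UUID=$(findmnt -n -o UUID /)
cat > /etc/grub.d/40_custom << EOF
#!/bin/sh
exec tail -n +3 \$0

menuentry 'Linux (Safe Mode)' {
    search --no-floppy --fs-uuid --set=root ${ROOT_UUID}
    linux /boot/vmlinuz-${KVER} root=UUID=${ROOT_UUID} ro nomodeset noapic nolapic
    initrd /boot/initrd.img-${KVER}
}

menuentry 'Linux (Recovery Mode)' {
    search --no-floppy --fs-uuid --set=root ${ROOT_UUID}
    linux /boot/vmlinuz-${KVER} root=UUID=${ROOT_UUID} ro single
    initrd /boot/initrd.img-${KVER}
}
EOF

chmod +x /etc/grub.d/40_custom
update-grub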
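Making a tested parameter permanent means editing GRUB_CMDLINE_LINUX in /etc/default/grub, which is easy to script incorrectly (duplicated parameters, or accidentally touching GRUB_CMDLINE_LINUX_DEFAULT). The sketch below shows one idempotent approach; `add_grub_param` is a hypothetical helper, it relies on GNU sed, and it assumes the file already contains a double-quoted GRUB_CMDLINE_LINUX="..." line and that the parameter contains no `|` or `"` characters.

```shell
#!/usr/bin/env bash
# add_grub_param: hypothetical helper that appends one kernel parameter to
# GRUB_CMDLINE_LINUX in a grub defaults file, unless it is already present.
add_grub_param() {
    local param="$1" file="${2:-/etc/default/grub}"

    # Nothing to do when the parameter is already a whole word in the value
    if grep -Eq "^GRUB_CMDLINE_LINUX=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi

    # Append the parameter, then remove the leading space left behind when
    # the value was empty. A backup copy is kept as <file>.bak.
    sed -i.bak -E \
        -e "s|^GRUB_CMDLINE_LINUX=\"([^\"]*)\"|GRUB_CMDLINE_LINUX=\"\1 ${param}\"|" \
        -e 's|^(GRUB_CMDLINE_LINUX=") |\1|' \
        "$file"
}
```

After `add_grub_param nomodeset`, regenerate the configuration with update-grub or grub-mkconfig as shown above. Trying the function against a copy of the file first is a cheap way to confirm it behaves as expected on a given system.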

4. Filesystem and initramfs Issues

Filesystem Corruption and Recovery

Filesystem Type | Common Corruption Symptoms | Recovery Commands | Risk Level | Preventive Measures
ext2/ext3/ext4 | Superblock corruption, journal errors, inode issues | fsck -y /dev/sdXY, e2fsck -p -f -v /dev/sdXY | Medium | Regular fsck, journaling enabled, backups
XFS | Metadata corruption, log mount failures | xfs_repair -n (dry run), xfs_repair -L (last resort: discards the log) | High | Periodic xfs_repair -n on the unmounted filesystem, avoid sudden power loss
Btrfs | Checksum errors, extent tree corruption | btrfs scrub start, btrfs check (read-only); btrfs check --repair only as a last resort | Medium | Regular scrubs, snapshots, RAID1 for metadata
ZFS | Pool corruption, checksum errors | zpool scrub, zpool clear, zfs rollback | Low | Regular scrubs, redundancy, snapshots
FAT32/NTFS | Boot sector corruption, file allocation errors | dosfsck -a -v, ntfsfix | High | Regular defragmentation, avoid unsafe removal
LVM | Volume group metadata corruption | vgcfgrestore, vgck, lvchange -an | High | Back up /etc/lvm/archive/, regular vgck
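Before running any of the repair commands in the table, it is safer to start with a check that cannot modify the filesystem. The sketch below maps a filesystem type to such a command; `readonly_check_cmd` is a hypothetical helper that only prints the command so it can be reviewed before being run against an unmounted device.

```shell
#!/usr/bin/env bash
# readonly_check_cmd: hypothetical helper that prints a non-destructive check
# command for the given filesystem type and device. Returns 1 for types it
# does not know.
readonly_check_cmd() {
    local fstype="$1" dev="$2"
    case "$fstype" in
        ext2|ext3|ext4) echo "e2fsck -n $dev" ;;               # answer "no" to every fix
        xfs)            echo "xfs_repair -n $dev" ;;           # no-modify mode
        btrfs)          echo "btrfs check --readonly $dev" ;;
        vfat)           echo "fsck.vfat -n $dev" ;;
        ntfs)           echo "ntfsfix --no-action $dev" ;;
        *)              return 1 ;;
    esac
}
```

A typical use is `readonly_check_cmd "$(blkid -o value -s TYPE /dev/sdXY)" /dev/sdXY`, which prints the matching command for whatever filesystem blkid detects on that partition.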
# initramfs Recovery and Reconstruction
# Common initramfs error: "ALERT! /dev/sda1 does not exist. Dropping to shell!"
# Recovering from a broken initramfs:
# 1. At GRUB, edit kernel line and add: break=premount (initramfs-tools; on dracut use rd.break=pre-mount)
# 2. You'll drop to initramfs shell
# 3. Manual recovery steps:
# Identify root device
blkid
ls -la /dev/disk/by-uuid/
# Load necessary modules
modprobe ext4
modprobe dm_mod # For LVM
modprobe raid1 # For RAID
# Mount root filesystem
mount /dev/sda1 /root
mount --bind /dev /root/dev
mount --bind /proc /root/proc
mount --bind /sys /root/sys
# Chroot and rebuild initramfs
chroot /root
# Rebuild initramfs for current kernel
update-initramfs -c -k $(uname -r)
# Or for all kernels
update-initramfs -u -k all
# Alternative initramfs tools by distribution:
# Debian/Ubuntu:
mkinitramfs -o /boot/initrd.img-$(uname -r) $(uname -r)
# RHEL/CentOS/Fedora:
dracut --force /boot/initramfs-$(uname -r).img $(uname -r)
# Arch Linux:
mkinitcpio -p linux
# Verify initramfs contents:
lsinitramfs /boot/initrd.img-$(uname -r) | head -20
lsinitramfs /boot/initrd.img-$(uname -r) | grep -E "(ext4|xfs|btrfs|lvm|raid)"
# Check initramfs for missing drivers:
# List modules in initramfs (modules are often compressed as .ko.gz, .ko.xz or .ko.zst):
lsinitramfs /boot/initrd.img-$(uname -r) | grep -E '\.ko(\.(gz|xz|zst))?$'
# Add specific modules to initramfs:
# Edit /etc/initramfs-tools/modules (Debian/Ubuntu)
echo "dm_mod" >> /etc/initramfs-tools/modules
echo "ext4" >> /etc/initramfs-tools/modules
echo "nvme" >> /etc/initramfs-tools/modules
# For dracut (RHEL/Fedora):
echo 'add_drivers+=" dm_mod ext4 nvme "' > /etc/dracut.conf.d/local.conf
# Force rebuild with new modules:
update-initramfs -u
# or
dracut --force
# Emergency boot without initramfs (if kernel has all drivers built-in):
# Edit GRUB kernel line and remove initrd entry
# Add: root=/dev/sda1 rootfstype=ext4
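# Create minimal initramfs for emergency (assumes a statically linked busybox):
mkdir -p /tmp/mini-initramfs
cd /tmp/mini-initramfs
mkdir -p bin dev etc lib lib64 proc sys sbin usr/bin usr/sbin
cp /bin/busybox bin/
ln -s busybox bin/sh
# /init must be an executable that sets up /proc, /sys and /dev before starting a shell
printf '%s\n' '#!/bin/sh' \
    '/bin/busybox mount -t proc proc /proc' \
    '/bin/busybox mount -t sysfs sysfs /sys' \
    '/bin/busybox mount -t devtmpfs devtmpfs /dev' \
    'exec /bin/sh' > init
chmod +x init
find . | cpio -H newc -o | gzip > /boot/mini-initrd.img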
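The "missing drivers" check above can be made repeatable by comparing the initramfs file listing against the modules the root device needs. The sketch below is a hypothetical helper, `missing_initramfs_modules`, which reads a listing on stdin (from lsinitramfs or lsinitrd) and prints the requested modules that are absent. It treats `_` and `-` as interchangeable, since module file names use either, and it accepts compressed module files.

```shell
#!/usr/bin/env bash
# missing_initramfs_modules: hypothetical helper. Reads an initramfs file
# listing on stdin and prints each module named as an argument that does not
# appear in it as a .ko file (optionally compressed).
missing_initramfs_modules() {
    local listing module pattern
    listing="$(cat)"
    for module in "$@"; do
        # dm_mod is stored as dm-mod.ko, so match either separator
        pattern="${module//[_-]/[_-]}"
        if ! grep -Eq "/${pattern}\.ko(\.(gz|xz|zst))?$" <<< "$listing"; then
            echo "$module"
        fi
    done
}
```

For example, `lsinitramfs /boot/initrd.img-$(uname -r) | missing_initramfs_modules ext4 dm_mod nvme` prints only the modules that would need to be added. An empty result does not prove the image is complete, because drivers built into the kernel never appear as .ko files, and a module reported missing may simply be built in.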

5. Systemd and Init System Failures

Systemd Boot Failure Diagnostics

systemd-boot-recovery.sh - Systemd Boot Failure Recovery
#!/bin/bash
# systemd-boot-recovery.sh - Systemd initialization failure recovery

set -euo pipefail

LOG_FILE="/tmp/systemd-recovery-$(date +%Y%m%d-%H%M%S).log"
RECOVERY_TARGET="${1:-emergency.target}"

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

error() {
    echo "[ERROR] $1" >&2
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" >> "$LOG_FILE"
    exit 1
}

check_systemd_status() {
    log "Checking systemd status..."
    
    # Check if systemd is PID 1
    if [[ $(ps -p 1 -o comm=) != "systemd" ]]; then
        error "systemd is not PID 1. Current init: $(ps -p 1 -o comm=)"
    fi
    
    # Check systemd version
    systemctl --version | head -1
    
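    # Check system state. is-system-running exits non-zero for any state other
    # than "running" while still printing the state, so ignore the exit status
    # and fall back to "unknown" only when nothing was printed.
    local state
    state=$(systemctl is-system-running 2>/dev/null || true)
    state="${state:-unknown}"
    log "System state: $state"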
    
    echo "$state"
}

analyze_failed_units() {
    log "Analyzing failed systemd units..."
    
    # List failed units (--plain drops the leading status glyph so the unit
    # name is always the first field)
    local failed_count
    failed_count=$(systemctl --failed --no-legend --plain | wc -l)
    
    if [[ $failed_count -gt 0 ]]; then
        log "Found $failed_count failed units:"
        systemctl --failed --no-pager
        
        # Get details for each failed unit
        systemctl --failed --no-legend --plain | while read -r unit rest; do
            log "Analyzing failed unit: $unit"
            # systemctl status exits non-zero for failed units; do not let that abort the loop
            systemctl status "$unit" --no-pager -l | tail -50 >> "$LOG_FILE" || true
            
            # Check unit dependencies
            log "Dependencies for $unit:"
            systemctl list-dependencies "$unit" --no-pager >> "$LOG_FILE" || true
        done
    else
        log "No failed units found"
    fi
    
    return $failed_count
}

check_journal_errors() {
    log "Checking systemd journal for errors..."
    
    # Check current boot errors (-n keeps the most recent entries without
    # closing the pipe early, which would trip pipefail)
    journalctl -b -p err --no-pager -n 50 >> "$LOG_FILE" || true
    
    # Check previous boot if available
    if journalctl -b -1 -n 1 --no-pager &>/dev/null; then
        log "Errors from previous boot:"
        journalctl -b -1 -p err --no-pager -n 20 >> "$LOG_FILE" || true
    fi
    
    # Check for emergency/panic messages (grep exits 1 when nothing matches)
    journalctl -b --no-pager | grep -i "emergency\|panic\|fail\|error" | tail -30 >> "$LOG_FILE" || true
}

recovery_target_boot() {
    local target="$1"
    
    log "Attempting to boot to recovery target: $target"
    
    # Switch to recovery target
    if systemctl isolate "$target"; then
        log "Successfully switched to $target"
        return 0
    else
        error "Failed to switch to $target"
        return 1
    fi
}

emergency_shell_recovery() {
    log "Starting emergency shell recovery..."
    
    # Check available targets
    log "Available systemd targets:"
    systemctl list-units --type=target --all --no-pager | grep target
    
    # Common recovery targets:
    # - rescue.target: Single-user shell with local filesystems mounted, no network
    # - emergency.target: Minimal shell, root mounted read-only, no services
    # - multi-user.target: Normal multi-user without GUI
    # - graphical.target: Full desktop environment
    
    # Try to reach emergency target
    log "Current default target: $(systemctl get-default)"
    
    # If system won't boot, try emergency kernel parameter:
    # Add: systemd.unit=emergency.target to kernel command line
    
    # Manual emergency boot steps:
    cat << 'EOF'

MANUAL EMERGENCY RECOVERY STEPS:
1. Reboot and edit GRUB kernel line
2. Add: systemd.unit=emergency.target
3. Or: systemd.unit=rescue.target
4. Boot to emergency shell
5. Run: systemctl default to return to normal
6. Or: systemctl reboot to restart

ALTERNATIVELY:
1. Add: init=/bin/bash to kernel line
2. Remount root read-write: mount -o remount,rw /
3. Fix issues manually
4. Continue booting: exec /sbin/init

EOF
}

fix_common_systemd_issues() {
    log "Attempting to fix common systemd issues..."
    
    local fixed_issues=0
    
    # 1. Reset failed units
    log "Resetting failed units..."
    systemctl reset-failed 2>&1 | tee -a "$LOG_FILE"
    
    # 2. Daemon reload
    log "Reloading systemd daemon..."
    systemctl daemon-reload 2>&1 | tee -a "$LOG_FILE"
    
    # 3. Check for masked units
    log "Checking for masked units that shouldn't be..."
    systemctl list-unit-files --state=masked | grep -v "static" >> "$LOG_FILE"
    
    # 4. Check systemd configuration
    log "Checking systemd configuration..."
    systemd-analyze verify --recursive-errors=no /etc/systemd/system/*.service 2>&1 | tee -a "$LOG_FILE"
    
    # 5. Check for broken symlinks
    log "Checking for broken systemd symlinks..."
    find /etc/systemd/system -type l ! -exec test -e {} \; -print 2>/dev/null >> "$LOG_FILE"
    
    # 6. Check critical services
    local critical_services=(
        systemd-journald
        systemd-udevd
        dbus
        network
        getty
    )
    
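    for service in "${critical_services[@]}"; do
        if systemctl is-enabled "$service" &>/dev/null; then
            if ! systemctl is-active "$service" &>/dev/null; then
                log "Attempting to start critical service: $service"
                # Count only successful starts. ((var++)) is avoided because it
                # returns non-zero when the old value is 0 and aborts under set -e
                if systemctl start "$service" >> "$LOG_FILE" 2>&1; then
                    fixed_issues=$((fixed_issues + 1))
                fi
            fi
        fi
    done
    
    log "Fixed $fixed_issues issues"
    # Return success: a non-zero status would be read as a failure by the caller
    return 0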
}

check_filesystem_corruption() {
    log "Checking for filesystem corruption that affects systemd..."
    
    # Check critical system directories
    local critical_dirs=(
        /etc/systemd
        /lib/systemd
        /usr/lib/systemd
        /run/systemd
        /var/lib/systemd
    )
    
    for dir in "${critical_dirs[@]}"; do
        if [[ -d "$dir" ]]; then
            log "Checking directory: $dir"
            
            # Check for permissions
            find "$dir" -type f ! -perm 644 -o -type d ! -perm 755 2>/dev/null | \
                head -5 >> "$LOG_FILE"
            
            # Check for broken symlinks
            find "$dir" -type l ! -exec test -e {} \; -print 2>/dev/null | \
                head -5 >> "$LOG_FILE"
        else
            warn "Missing directory: $dir"
        fi
    done
    
    # Check critical binaries
    local critical_bins=(
        /lib/systemd/systemd
        /usr/lib/systemd/systemd
        /bin/systemctl
        /usr/bin/systemctl
        /sbin/init
    )
    
    for bin in "${critical_bins[@]}"; do
        if [[ -f "$bin" ]]; then
            log "Checking binary: $bin"
            file "$bin" >> "$LOG_FILE"
            ldd "$bin" 2>/dev/null | grep -i "not found" && \
                warn "Missing libraries for $bin"
        else
            warn "Missing binary: $bin"
        fi
    done
}

recover_journal_corruption() {
    log "Checking for journal corruption..."
    
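    # Check journal status once and reuse the result
    local verify_output verify_status=0
    verify_output=$(journalctl --verify 2>&1) || verify_status=$?
    echo "$verify_output" | tee -a "$LOG_FILE"
    
    # journalctl --verify exits non-zero and prints FAIL lines for damaged files
    if [[ $verify_status -ne 0 ]] || grep -qiE "FAIL|corrupt" <<< "$verify_output"; then
        warn "Journal corruption detected"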
        
        # Backup existing journal
        local backup_dir="/var/log/journal-backup-$(date +%Y%m%d)"
        mkdir -p "$backup_dir"
        cp -a /var/log/journal/* "$backup_dir/" 2>/dev/null || true
        
        # Clear corrupted journal
        log "Clearing corrupted journal..."
        journalctl --rotate
        journalctl --vacuum-time=1s
        
        # Restart journal daemon
        systemctl restart systemd-journald 2>&1 | tee -a "$LOG_FILE"
        
        log "Journal recovery attempted. Backup at: $backup_dir"
    else
        log "Journal integrity check passed"
    fi
}

create_emergency_service() {
    log "Creating emergency recovery service..."
    
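    # Create a oneshot service that runs the recovery script on every boot
    # (pulled in by multi-user.target); if it fails, OnFailure= activates emergency.target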
    cat > /etc/systemd/system/emergency-recovery.service << 'EOF'
[Unit]
Description=Emergency System Recovery Service
DefaultDependencies=no
Conflicts=shutdown.target
Before=shutdown.target
OnFailure=emergency.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/emergency-recovery.sh
TimeoutSec=0
StandardOutput=journal+console
StandardError=journal+console

[Install]
WantedBy=multi-user.target
EOF

    # Create recovery script
    cat > /usr/local/bin/emergency-recovery.sh << 'EOF'
#!/bin/bash
# Emergency recovery script

set -euo pipefail

LOG_FILE="/var/log/emergency-recovery.log"

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}

# Basic system checks
log "Starting emergency recovery..."

# Check disk space
df -h >> "$LOG_FILE"

# Check memory
free -h >> "$LOG_FILE"

# Check for failed units
systemctl --failed >> "$LOG_FILE" 2>&1 || true

# Attempt to fix common issues
systemctl daemon-reload >> "$LOG_FILE" 2>&1 || true
systemctl reset-failed >> "$LOG_FILE" 2>&1 || true

# Start critical services
for service in systemd-journald dbus systemd-udevd; do
    if ! systemctl is-active "$service" &>/dev/null; then
        log "Starting $service"
        systemctl start "$service" >> "$LOG_FILE" 2>&1 || true
    fi
done

log "Emergency recovery completed"
EOF

    chmod +x /usr/local/bin/emergency-recovery.sh
    
    # Enable the service
    systemctl enable emergency-recovery.service 2>&1 | tee -a "$LOG_FILE"
    
    log "Emergency recovery service created and enabled"
}

main() {
    log "Starting systemd boot failure recovery..."
    
    # Initial checks
    local system_state=$(check_systemd_status)
    
    # Analyze current state
    analyze_failed_units
    check_journal_errors
    check_filesystem_corruption
    
    # Attempt recovery
    fix_common_systemd_issues
    recover_journal_corruption
    
    # If still having issues, try recovery target
    if [[ "$system_state" != "running" ]] && [[ "$system_state" != "degraded" ]]; then
        log "System not in running state, attempting recovery target..."
        recovery_target_boot "$RECOVERY_TARGET" || emergency_shell_recovery
    fi
    
    # Create preventive measures
    create_emergency_service
    
    log "Systemd recovery process completed"
    log ""
    log "Summary:"
    log "  - System state: $system_state"
    log "  - Recovery target attempted: $RECOVERY_TARGET"
    log "  - Log file: $LOG_FILE"
    log ""
    log "Next steps:"
    log "  1. Review the log file for detailed analysis"
    log "  2. If problems persist, consider:"
    log "     - Boot with older kernel"
    log "     - Boot to recovery mode"
    log "     - Use live CD to chroot and repair"
    log "     - Restore from backup if available"
    log "  3. Test the emergency recovery service"
    log "  4. Consider setting up automatic backups"
}

# Run main function
main "$@"
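
After booting with systemd.unit=rescue.target or systemd.unit=emergency.target, it can be useful to confirm which target the kernel command line actually requested. The following is a minimal sketch, separate from the script above; the function name requested_target is hypothetical, and it takes the command line as an argument so it can be tried on sample strings first.

```bash
#!/bin/bash
# Hypothetical helper: print the systemd.unit= value from a kernel command line.
# Returns 1 (and prints nothing) when the parameter is absent.
requested_target() {
    local -a words
    local word
    read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        case "$word" in
            systemd.unit=*)
                printf '%s\n' "${word#systemd.unit=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Sample command line; on a running system pass "$(cat /proc/cmdline)" instead
requested_target "BOOT_IMAGE=/boot/vmlinuz root=UUID=abcd ro systemd.unit=rescue.target quiet"
```

If the function prints nothing, no override was given and systemd used its default target.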

6. Hardware-Related Boot Issues

Critical Hardware Boot Failure Symptoms:

1. Memory (RAM) Failures:
  • System hangs during POST
  • Random kernel panics with different error messages
  • Data corruption and filesystem errors
  • Failed memory tests during boot
2. Storage Device Failures:
  • Disk not detected in BIOS/UEFI
  • Slow boot times with disk errors
  • I/O or ATA errors in dmesg, failing SMART attributes in smartctl
  • Filesystem corruption after reboot
3. Power Supply Issues:
  • System won't power on at all
  • Random reboots during boot process
  • Inconsistent boot success
  • Peripheral devices not working
4. Motherboard/CPU Problems:
  • No POST, no beep codes
  • Overheating during boot
  • Inconsistent hardware detection
  • BIOS/UEFI resetting to defaults

Immediate Diagnostic Steps:

1. Run hardware diagnostics from BIOS/UEFI
2. Test with minimal hardware configuration
3. Check all cable connections
4. Monitor temperatures during boot
5. Listen for abnormal sounds (clicking disks, beep codes)
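
Once the system boots far enough to produce a kernel log, the symptom groups above can be roughly tallied from that log. The sketch below is only an illustration: the function name classify_hw_errors is hypothetical and the match patterns are assumptions covering a few well-known kernel messages, not a complete list.

```bash
#!/bin/bash
# Hypothetical helper: read kernel log lines on stdin and count matches per
# hardware category. Each line is counted at most once (first matching pattern).
classify_hw_errors() {
    local line memory=0 storage=0 thermal=0
    while IFS= read -r line; do
        case "${line,,}" in   # ${line,,} lowercases the line (bash 4+)
            *"machine check"*|*"mce:"*|*"edac"*)
                memory=$((memory + 1)) ;;
            *"i/o error"*|*"medium error"*|*"failed command"*)
                storage=$((storage + 1)) ;;
            *"thermal"*|*"temperature above threshold"*)
                thermal=$((thermal + 1)) ;;
        esac
    done
    printf 'memory/cpu=%d storage=%d thermal=%d\n' "$memory" "$storage" "$thermal"
}

# Sample input; on a real system use: dmesg | classify_hw_errors
printf '%s\n' \
    'mce: [Hardware Error]: Machine check events logged' \
    'blk_update_request: I/O error, dev sda, sector 12345' \
    'usb 1-1: new high-speed USB device number 2' | classify_hw_errors
```

A non-zero count only suggests where to look first; the matching lines still need to be read in context.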

Hardware Diagnostics and Recovery

# Comprehensive Hardware Boot Diagnostics
# 1. Memory Testing
# Boot with memtest86+ from GRUB or USB
# In GRUB, add memtest86+ entry:
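# Run the commands that write under /etc or /usr/local as root
cat > /etc/grub.d/20_memtest86+ << 'EOF'
#!/bin/sh
exec tail -n +3 $0
menuentry "Memory test (memtest86+)" {
    # linux16 applies to BIOS (legacy) boot
    linux16 /boot/memtest86+.bin
}
EOF
chmod +x /etc/grub.d/20_memtest86+
update-grub # Debian/Ubuntu; on RHEL-family systems: grub2-mkconfig -o /boot/grub2/grub.cfg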
# Quick memory test without rebooting:
sudo apt-get install memtester # Debian/Ubuntu
sudo yum install memtester # RHEL/CentOS
sudo memtester 500M 1 # Test 500MB for 1 iteration (root is needed to lock the memory)
# 2. Storage Device Diagnostics
# Check SMART status:
sudo apt-get install smartmontools
sudo smartctl -a /dev/sda
sudo smartctl -t short /dev/sda # Run short test
sudo smartctl -t long /dev/sda # Run long test
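# Check for bad sectors (read-only scan; slow on large disks):
sudo badblocks -v /dev/sda > bad-sectors.txt
sudo fsck -c /dev/sda1 # ext2/3/4 only; run on an unmounted filesystem
# Disk performance testing:
sudo hdparm -Tt /dev/sda
# Write test: use a path on the disk under test (/tmp is often RAM-backed tmpfs)
sudo dd if=/dev/zero of=/var/tmp/ddtest bs=1M count=1024 conv=fdatasync
sudo rm -f /var/tmp/ddtest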
# 3. CPU and Temperature Monitoring
sudo apt-get install lm-sensors
sudo sensors-detect
sudo sensors
# Check CPU frequency and throttling:
grep -i mhz /proc/cpuinfo
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu*/thermal_throttle/*
# Stress test CPU:
sudo apt-get install stress
stress --cpu 4 --timeout 30s
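# 4. Power Supply Testing
# Software cannot test a PSU directly. powertop estimates power draw;
# voltage rails, where the board exposes them, appear in the sensors output.
sudo apt-get install powertop
sudo powertop
# Check power states:
cat /sys/power/state
cat /sys/class/power_supply/BAT*/status # Laptop battery state (replaces the removed /proc/acpi/battery)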
# 5. Motherboard and Firmware Diagnostics
# Dump BIOS/UEFI information:
sudo dmidecode -t bios
sudo dmidecode -t baseboard
sudo dmidecode -t chassis
# Check kernel ring buffer for hardware errors:
dmesg | grep -i "error\|fail\|warn\|hardware"
dmesg | grep -i "mce\|machine check"
# 6. Peripheral Device Diagnostics
# List all PCI devices:
lspci -vvv
lspci -t
# List USB devices:
lsusb -vvv
lsusb -t
# Check kernel modules for hardware:
lsmod | grep -i "ahci\|sata\|nvme\|usb"
# 7. Boot-Time Hardware Diagnostics
# Add kernel parameters for hardware debugging:
# Add to GRUB_CMDLINE_LINUX in /etc/default/grub:
# debug # Enable kernel debugging
# earlyprintk=vga,keep # Early console output
# log_buf_len=16M # Larger log buffer
# ignore_loglevel # Print all messages
# mminit_loglevel=4 # Memory management debug
# 8. Creating Hardware Test Boot Media
# Create bootable diagnostic USB:
# Download SystemRescueCD or Ultimate Boot CD
# Use dd to write ISO to USB:
sudo dd if=systemrescuecd-x86-6.0.3.iso of=/dev/sdX bs=4M status=progress
sync
# 9. Automated Hardware Health Check Script
cat > /usr/local/bin/hardware-check.sh << 'EOF'
#!/bin/bash
echo "=== Hardware Health Check ==="
echo "Date: $(date)"
echo
echo "1. Memory:"
free -h
echo
echo "2. CPU Temperature:"
sensors 2>/dev/null || echo "lm-sensors not installed"
echo
echo "3. Disk SMART Status:"
for disk in /dev/sd[a-z] /dev/nvme[0-9]n[0-9]; do
    [ -b "$disk" ] && sudo smartctl -H "$disk" | grep -i "result\|health status"
done
echo
echo "4. Recent Hardware Errors:"
# Filter first, then keep the 20 most recent matches
dmesg | grep -i "error\|fail" | tail -20
EOF
chmod +x /usr/local/bin/hardware-check.sh
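
For scripted checks, the smartctl -H health line can be reduced to a single verdict. The sketch below assumes the two health-line formats smartctl prints for ATA/NVMe and SCSI devices; the function name smart_verdict is hypothetical, and it reads the tool's output on stdin so it can be tried without a disk.

```bash
#!/bin/bash
# Hypothetical helper: read `smartctl -H` output on stdin and print
# PASSED, FAILED, or UNKNOWN (no recognizable health line).
smart_verdict() {
    local line verdict="UNKNOWN"
    while IFS= read -r line; do
        case "$line" in
            *"test result: PASSED"*|*"SMART Health Status: OK"*)
                verdict="PASSED" ;;
            *"test result: FAILED"*|*"SMART Health Status:"*)
                # Any SCSI health status other than OK is treated as a failure
                verdict="FAILED" ;;
        esac
    done
    printf '%s\n' "$verdict"
}

# Sample input; on a real system: sudo smartctl -H /dev/sda | smart_verdict
echo 'SMART overall-health self-assessment test result: PASSED' | smart_verdict
```

A PASSED verdict does not rule out a failing disk; reallocated or pending sector counts from smartctl -a are still worth reviewing.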

7. Boot Recovery Tools and Techniques

Boot Recovery Tool Comparison

| Recovery Tool | Primary Use Case | Boot Method | Key Features | Complexity |
|---|---|---|---|---|
| SystemRescueCD | General system recovery | Live CD/USB | Filesystem tools, hardware tests, network recovery | Medium |
| Super Grub2 Disk | GRUB repair and boot issues | Live CD/USB | GRUB recovery, boot sector repair, disk editing | Low-Medium |
| Boot-Repair-Disk | Automatic boot repair | Live CD/USB | One-click GRUB repair, UEFI/BIOS support | Low |
| GParted Live | Partition recovery | Live CD/USB | Partition editing, filesystem resizing, data recovery | Medium |
| TRK (Trinity Rescue Kit) | Password reset, virus scan | Live CD/USB | Password reset, virus cleaning, network tools | Medium |
| Clonezilla Live | Disk imaging and cloning | Live CD/USB | Disk backup, restore, cloning, bare metal recovery | Medium-High |
| Linux Mint Live | General troubleshooting | Live CD/USB | Full desktop, driver support, package management | Low |
| Knoppix | Hardware detection | Live CD/USB | Excellent hardware detection, recovery tools | Low-Medium |

Boot Recovery Best Practices:

Before Problems Occur:
1. Regular backups: Maintain system image backups
2. Boot media: Create and test recovery USB drives
3. Documentation: Document system configuration
4. Testing: Test recovery procedures regularly
5. Monitoring: Monitor disk health and system logs
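
Part of the monitoring item can be automated, for example by watching how full /boot is. The following is a minimal sketch under stated assumptions: the helper names usage_percent and is_nearly_full are hypothetical, and the 90% default threshold is an arbitrary choice.

```bash
#!/bin/bash
# Hypothetical helpers for a /boot space check.

# Read `df -P <path>` output on stdin; print the Capacity column as a bare integer.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# $1 = usage percent, $2 = threshold (defaults to 90, an arbitrary choice)
is_nearly_full() {
    [ "$1" -ge "${2:-90}" ]
}

# Sample input in the POSIX df format; on a real system: df -P /boot | usage_percent
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 487652 451234 11234 98% /boot'
pct=$(printf '%s\n' "$sample" | usage_percent)
if is_nearly_full "$pct"; then
    echo "/boot is ${pct}% full"
fi
```

Comparing the number against a threshold avoids text patterns such as "two digits followed by %", which would already match at 10% usage.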

During Recovery:
1. Don't panic: Methodical troubleshooting is key
2. Document changes: Keep notes of all changes made
3. Test incrementally: Make one change at a time and test
4. Use safe mode: Boot with minimal configuration first
5. Back up first: Always back up before making changes
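
The "back up first" and "document changes" items can be combined in a small helper that copies a file before it is edited. This is a sketch only; the function name backup_before_edit and the .bak-<timestamp> suffix are assumptions, not a convention of any tool mentioned here.

```bash
#!/bin/bash
# Hypothetical helper: copy a file to <file>.bak-<timestamp> before editing it
# and print the path of the copy. Fails if the argument is not a regular file.
backup_before_edit() {
    local src="$1" dest
    if [ ! -f "$src" ]; then
        echo "not a regular file: $src" >&2
        return 1
    fi
    dest="${src}.bak-$(date +%Y%m%d-%H%M%S)"
    # -p preserves mode and timestamps so the copy can be restored as-is
    cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

For example, running backup_before_edit /etc/default/grub before changing kernel parameters leaves a timestamped copy next to the original, which also serves as a record of when the change was made.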

After Recovery:
1. Root cause analysis: Identify what caused the problem
2. Preventive measures: Implement fixes to prevent recurrence
3. Update documentation: Update recovery procedures
4. Test backups: Verify backup integrity
5. Monitor closely: Watch for similar issues after recovery
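
One way to verify backup integrity is to store a checksum manifest alongside the backup and check it later. The sketch below assumes GNU coreutils and findutils (sha256sum, sort -z, xargs -r); the function names and the SHA256SUMS file name are hypothetical choices.

```bash
#!/bin/bash
# Hypothetical helpers: create and verify a SHA-256 manifest for a backup directory.

# $1 = backup directory; writes $1/SHA256SUMS covering every other file below it
make_manifest() {
    (
        cd "$1" || exit 1
        find . -type f ! -name SHA256SUMS -print0 | sort -z | xargs -0 -r sha256sum > SHA256SUMS
    )
}

# $1 = backup directory; succeeds only if every listed file still matches
verify_manifest() {
    (
        cd "$1" || exit 1
        sha256sum --quiet -c SHA256SUMS
    )
}
```

A typical use would be make_manifest on the backup directory right after the backup is taken, then verify_manifest before relying on it for a restore.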

Comprehensive Boot Recovery Script

boot-recovery-master.sh - Master Boot Recovery Script
#!/bin/bash
# boot-recovery-master.sh - Master boot recovery and troubleshooting script

set -euo pipefail

# Configuration
RECOVERY_MODE="${1:-diagnose}"
# Kept off /boot, which is often small and may already be full
BACKUP_DIR="/var/backups/boot-recovery-$(date +%Y%m%d-%H%M%S)"
LOG_FILE="/tmp/boot-recovery-$(date +%Y%m%d-%H%M%S).log"
LIVE_MODE=false

# Logging helpers (plain text, no color codes)
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

error() {
    echo "[ERROR] $1" >&2
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" >> "$LOG_FILE"
    exit 1
}

warn() {
    echo "[WARNING] $1" >&2
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] WARNING: $1" >> "$LOG_FILE"
}

check_live_mode() {
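    # Check if running from live media (marker files, or an aufs/overlay/squashfs root)
    local root_fstype
    root_fstype=$(findmnt -n -o FSTYPE / 2>/dev/null || true)
    if [[ -f /etc/live/config.conf ]] || [[ -d /live ]] || [[ "$root_fstype" =~ ^(aufs|overlay|squashfs)$ ]]; then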
        LIVE_MODE=true
        log "Running in Live USB/CD mode"
    else
        LIVE_MODE=false
        log "Running on installed system"
    fi
}

setup_environment() {
    log "Setting up recovery environment..."
    
    # Create backup directory
    mkdir -p "$BACKUP_DIR"
    
    # Create working directories
    mkdir -p /tmp/recovery/{boot,root,efi}
    
    # Check for necessary tools
    local missing_tools=""
    for tool in lsblk fdisk grep awk sed mount umount chroot; do
        if ! command -v "$tool" &> /dev/null; then
            missing_tools="$missing_tools $tool"
        fi
    done
    
    if [[ -n "$missing_tools" ]]; then
        warn "Missing tools:$missing_tools"
    fi
}

backup_system_state() {
    log "Backing up current system state..."
    
    # Backup partition table
    fdisk -l > "$BACKUP_DIR/partition-table.txt" 2>/dev/null || true
    
    # Backup boot sector (assumes the boot disk is /dev/sda; adjust for NVMe or multi-disk systems)
    dd if=/dev/sda of="$BACKUP_DIR/mbr-backup.bin" bs=512 count=1 2>/dev/null || true
    
    # Backup GRUB configuration
    cp -a /boot/grub "$BACKUP_DIR/grub" 2>/dev/null || true
    cp -a /boot/grub2 "$BACKUP_DIR/grub2" 2>/dev/null || true
    cp /etc/default/grub "$BACKUP_DIR/grub-default" 2>/dev/null || true
    
    # Backup kernel and initramfs
    mkdir -p "$BACKUP_DIR/kernels"
    cp /boot/vmlinuz-* "$BACKUP_DIR/kernels/" 2>/dev/null || true
    cp /boot/initrd.img-* "$BACKUP_DIR/kernels/" 2>/dev/null || true
    cp /boot/initramfs-* "$BACKUP_DIR/kernels/" 2>/dev/null || true
    
    # Backup fstab
    cp /etc/fstab "$BACKUP_DIR/fstab" 2>/dev/null || true
    
    log "System state backed up to: $BACKUP_DIR"
}

analyze_boot_environment() {
    log "Analyzing boot environment..."
    
    cat > "$BACKUP_DIR/boot-analysis.txt" << 'EOF'
BOOT ENVIRONMENT ANALYSIS
=========================
EOF

    # 1. Detect boot mode
    if [[ -d /sys/firmware/efi ]]; then
        echo "Boot mode: UEFI" >> "$BACKUP_DIR/boot-analysis.txt"
        BOOT_MODE="uefi"
    else
        echo "Boot mode: BIOS/Legacy" >> "$BACKUP_DIR/boot-analysis.txt"
        BOOT_MODE="bios"
    fi
    
    # 2. Detect disks and partitions
    echo "" >> "$BACKUP_DIR/boot-analysis.txt"
    echo "DISK LAYOUT:" >> "$BACKUP_DIR/boot-analysis.txt"
    lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT,LABEL,UUID >> "$BACKUP_DIR/boot-analysis.txt"
    
    # 3. Check mounted filesystems
    echo "" >> "$BACKUP_DIR/boot-analysis.txt"
    echo "MOUNTED FILESYSTEMS:" >> "$BACKUP_DIR/boot-analysis.txt"
    mount | grep "^/dev/" >> "$BACKUP_DIR/boot-analysis.txt" || true
    
    # 4. Check GRUB installation
    echo "" >> "$BACKUP_DIR/boot-analysis.txt"
    echo "GRUB INSTALLATION:" >> "$BACKUP_DIR/boot-analysis.txt"
    if command -v grub-install &> /dev/null; then
        grub-install --version >> "$BACKUP_DIR/boot-analysis.txt"
    else
        echo "grub-install not found" >> "$BACKUP_DIR/boot-analysis.txt"
    fi
    
    # 5. Check kernel versions
    echo "" >> "$BACKUP_DIR/boot-analysis.txt"
    echo "INSTALLED KERNELS:" >> "$BACKUP_DIR/boot-analysis.txt"
    ls -la /boot/vmlinuz-* 2>/dev/null >> "$BACKUP_DIR/boot-analysis.txt" || true
    
    # 6. Check initramfs
    echo "" >> "$BACKUP_DIR/boot-analysis.txt"
    echo "INITRAMFS IMAGES:" >> "$BACKUP_DIR/boot-analysis.txt"
    ls -la /boot/initrd.img-* /boot/initramfs-* 2>/dev/null >> "$BACKUP_DIR/boot-analysis.txt" || true
    
    log "Boot analysis saved to: $BACKUP_DIR/boot-analysis.txt"
}

diagnose_boot_issues() {
    log "Diagnosing boot issues..."
    
    local issues_found=0
    cat > "$BACKUP_DIR/diagnosis.txt" << 'EOF'
BOOT ISSUE DIAGNOSIS
====================
EOF

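    # Check 1: Disk space on boot partition (flag usage of 90% or more)
    local boot_usage
    boot_usage=$(df -P /boot 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }' || true)
    if [[ "$boot_usage" =~ ^[0-9]+$ ]] && (( boot_usage >= 90 )); then
        echo "ISSUE: Boot partition nearly full (${boot_usage}% used)" >> "$BACKUP_DIR/diagnosis.txt"
        df -h /boot >> "$BACKUP_DIR/diagnosis.txt"
        # $((...)) is used because ((var++)) returns non-zero at 0 and aborts under set -e
        issues_found=$((issues_found + 1))
    fi
    
    # Check 2: Missing kernel images
    if ! compgen -G "/boot/vmlinuz-*" > /dev/null; then
        echo "ISSUE: No kernel images found in /boot" >> "$BACKUP_DIR/diagnosis.txt"
        issues_found=$((issues_found + 1))
    fi
    
    # Check 3: Missing initramfs (the file name differs between distributions)
    if ! compgen -G "/boot/initrd.img-*" > /dev/null && ! compgen -G "/boot/initramfs-*" > /dev/null; then
        echo "ISSUE: No initramfs images found" >> "$BACKUP_DIR/diagnosis.txt"
        issues_found=$((issues_found + 1))
    fi
    
    # Check 4: GRUB configuration issues (the path is /boot/grub2 on RHEL-family systems)
    local grub_cfg="/boot/grub/grub.cfg"
    [[ -f "$grub_cfg" ]] || grub_cfg="/boot/grub2/grub.cfg"
    if [[ -f "$grub_cfg" ]]; then
        if ! grep -qi "menuentry.*linux" "$grub_cfg"; then
            echo "ISSUE: No Linux menuentries in $grub_cfg" >> "$BACKUP_DIR/diagnosis.txt"
            issues_found=$((issues_found + 1))
        fi
    else
        echo "ISSUE: grub.cfg not found" >> "$BACKUP_DIR/diagnosis.txt"
        issues_found=$((issues_found + 1))
    fi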
    
    # Check 5: Filesystem errors
    echo "" >> "$BACKUP_DIR/diagnosis.txt"
    echo "FILESYSTEM CHECKS:" >> "$BACKUP_DIR/diagnosis.txt"
    for mount_point in / /boot; do
        if device=$(findmnt -n -o SOURCE "$mount_point" 2>/dev/null); then
            echo "Checking $mount_point ($device)..." >> "$BACKUP_DIR/diagnosis.txt"
            fsck -n "$device" 2>&1 | grep -i "error\|corrupt" >> "$BACKUP_DIR/diagnosis.txt" || true
        fi
    done
    
    # Check 6: Hardware issues in logs
    echo "" >> "$BACKUP_DIR/diagnosis.txt"
    echo "HARDWARE ERRORS IN LOGS:" >> "$BACKUP_DIR/diagnosis.txt"
    dmesg | grep -i "error\|fail\|timeout" | tail -50 >> "$BACKUP_DIR/diagnosis.txt" || true
    
    if [[ $issues_found -eq 0 ]]; then
        echo "No obvious boot issues detected in automated checks." >> "$BACKUP_DIR/diagnosis.txt"
        echo "Check manual analysis for subtle issues." >> "$BACKUP_DIR/diagnosis.txt"
    else
        echo "" >> "$BACKUP_DIR/diagnosis.txt"
        echo "Total issues detected: $issues_found" >> "$BACKUP_DIR/diagnosis.txt"
    fi
    
    log "Diagnosis complete. Issues found: $issues_found"
    log "Diagnosis report: $BACKUP_DIR/diagnosis.txt"
}

repair_grub() {
    log "Starting GRUB repair..."
    
    local repair_log="$BACKUP_DIR/grub-repair.log"
    
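    # Detect disks and partitions (|| true: /boot and /boot/efi may not be separate mounts)
    local root_part boot_part efi_part disk
    root_part=$(findmnt -n -o SOURCE /)
    boot_part=$(findmnt -n -o SOURCE /boot || true)
    efi_part=$(findmnt -n -o SOURCE /boot/efi || true)
    # Ask lsblk for the parent disk; stripping trailing digits breaks on names like /dev/nvme0n1p2
    disk=$(lsblk -no PKNAME "$root_part" 2>/dev/null | head -n 1 || true)
    if [[ -n "$disk" ]]; then
        disk="/dev/$disk"
    else
        disk=$(echo "$root_part" | sed 's/[0-9]*$//')
    fi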
    
    log "Detected:"
    log "  Root: $root_part"
    log "  Boot: $boot_part"
    log "  EFI: $efi_part"
    log "  Disk: $disk"
    
    # Backup current GRUB (copy the existing file; grub-mkconfig would generate a new one)
    log "Backing up current GRUB configuration..."
    cp /boot/grub/grub.cfg "$BACKUP_DIR/grub.cfg.backup" 2>> "$repair_log" || true
    
    # Repair based on boot mode
    if [[ "$BOOT_MODE" == "uefi" ]]; then
        repair_grub_uefi "$disk" "$efi_part"
    else
        repair_grub_bios "$disk"
    fi
    
    # Regenerate GRUB configuration
    log "Regenerating GRUB configuration..."
    # Redirect stdout first, then 2>&1, so both streams reach the log
    update-grub >> "$repair_log" 2>&1 || grub-mkconfig -o /boot/grub/grub.cfg >> "$repair_log" 2>&1
    
    # Verify repair
    log "Verifying GRUB repair..."
    if [[ -f /boot/grub/grub.cfg ]] && grep -q "menuentry.*Linux" /boot/grub/grub.cfg; then
        log "GRUB repair appears successful"
    else
        warn "GRUB repair may have issues"
    fi
    
    log "GRUB repair log: $repair_log"
}

repair_grub_uefi() {
    local disk="$1"
    local efi_part="$2"
    
    log "Repairing UEFI GRUB on disk $disk, EFI partition $efi_part"
    
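    # Mount EFI partition if needed
    local efi_mounted=false
    if ! mountpoint -q /boot/efi; then
        # An unmounted ESP cannot be found via findmnt, so locate it by its GPT type GUID
        if [[ -z "$efi_part" ]]; then
            efi_part=$(lsblk -rno PATH,PARTTYPE | awk 'tolower($2) == "c12a7328-f81f-11d2-ba4b-00a0c93ec93b" { print $1; exit }' || true)
        fi
        [[ -n "$efi_part" ]] || error "No EFI system partition found"
        mkdir -p /boot/efi
        mount "$efi_part" /boot/efi >> "$repair_log" 2>&1
        efi_mounted=true
    fi
    
    # Install GRUB for UEFI (stdout is redirected first so 2>&1 also lands in the log)
    grub-install --target=x86_64-efi \
        --efi-directory=/boot/efi \
        --bootloader-id=GRUB \
        --recheck >> "$repair_log" 2>&1 || \
    grub-install --target=x86_64-efi \
        --efi-directory=/boot/efi \
        --bootloader-id=GRUB_LINUX \
        --recheck >> "$repair_log" 2>&1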
    
    # Create fallback
    mkdir -p /boot/efi/EFI/BOOT
    cp /boot/efi/EFI/GRUB/grubx64.efi /boot/efi/EFI/BOOT/bootx64.efi 2>/dev/null || true
    
    # Unmount if we mounted it
    if [[ "$efi_mounted" == true ]]; then
        umount /boot/efi 2>/dev/null || true
    fi
}

repair_grub_bios() {
    local disk="$1"
    
    log "Repairing BIOS GRUB on disk $disk"
    
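    # Install GRUB to MBR
    grub-install --target=i386-pc --recheck "$disk" >> "$repair_log" 2>&1
    
    # Also try the partition boot sector if /boot is separate. GRUB discourages
    # this (it relies on blocklists) and refuses without --force, so a failure
    # here is logged and not treated as fatal.
    if [[ -n "$boot_part" ]] && [[ "$boot_part" != "$root_part" ]]; then
        grub-install --target=i386-pc --boot-directory=/boot "$boot_part" >> "$repair_log" 2>&1 || \
            warn "GRUB was not installed to $boot_part; the MBR install is the one that matters"
    fi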
}

repair_initramfs() {
    log "Repairing initramfs..."
    
    local repair_log="$BACKUP_DIR/initramfs-repair.log"
    
    # Get current kernel
    local current_kernel=$(uname -r)
    
    # Check initramfs tools
    local initramfs_tool=""
    if command -v update-initramfs &> /dev/null; then
        initramfs_tool="update-initramfs"
    elif command -v mkinitcpio &> /dev/null; then
        initramfs_tool="mkinitcpio"
    elif command -v dracut &> /dev/null; then
        initramfs_tool="dracut"
    fi
    
    if [[ -z "$initramfs_tool" ]]; then
        warn "No initramfs tool found"
        return 1
    fi
    
    log "Using initramfs tool: $initramfs_tool"
    
    # Repair initramfs for current kernel
    case "$initramfs_tool" in
        update-initramfs)
            # Update the existing image, or create one if none exists yet
            update-initramfs -u -k "$current_kernel" >> "$repair_log" 2>&1 || \
                update-initramfs -c -k "$current_kernel" >> "$repair_log" 2>&1
            update-initramfs -u -k all >> "$repair_log" 2>&1
            ;;
        mkinitcpio)
            mkinitcpio -p linux >> "$repair_log" 2>&1
            ;;
        dracut)
            dracut --force >> "$repair_log" 2>&1
            ;;
    esac
    
    # Verify repair (the image name depends on the distribution)
    if [[ -e "/boot/initrd.img-$current_kernel" ]] || \
       [[ -e "/boot/initramfs-$current_kernel.img" ]] || \
       [[ -e /boot/initramfs-linux.img ]]; then
        log "initramfs repair successful"
    else
        warn "initramfs repair may have failed"
    fi
    
    log "initramfs repair log: $repair_log"
}

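repair_filesystem() {
    log "Checking and repairing filesystems..."
    
    local fsck_log="$BACKUP_DIR/fsck-repair.log"
    
    # Resolve devices here: the locals set in repair_grub are not visible in this function
    local root_part boot_part part
    root_part=$(findmnt -n -o SOURCE / || true)
    boot_part=$(findmnt -n -o SOURCE /boot || true)
    
    for part in "$root_part" "$boot_part"; do
        [[ -n "$part" ]] || continue
        if findmnt -n -S "$part" > /dev/null; then
            # Repairing a mounted filesystem can corrupt it, so only report problems
            warn "$part is mounted; running a read-only check (repair it from live media)"
            fsck -n "$part" >> "$fsck_log" 2>&1 || true
        else
            log "Repairing $part..."
            fsck -y "$part" >> "$fsck_log" 2>&1 || true
        fi
    done
    
    # Check for errors
    if grep -q "FILE SYSTEM WAS MODIFIED" "$fsck_log" 2>/dev/null; then
        log "Filesystem repairs were made"
    else
        log "No filesystem repairs were made"
    fi
    
    log "Filesystem repair log: $fsck_log"
}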

create_recovery_entries() {
    log "Creating recovery boot entries..."
    
    # Create emergency GRUB entries. The generated file is a shell script that
    # grub-mkconfig runs, so the UUID and kernel versions are resolved when
    # grub.cfg is built. GRUB itself cannot expand $(...) at boot time.
    # Assumes /boot lives on the root filesystem (GPT, ext2/3/4) with Debian-style initrd names.
    cat > /etc/grub.d/09_recovery << 'EOF'
#!/bin/sh
set -e

ROOT_UUID=$(findmnt -n -o UUID /)
CURRENT=$(uname -r)
PREVIOUS=$(ls /boot/vmlinuz-* 2>/dev/null | sed 's|.*/vmlinuz-||' | sort -V | tail -n 2 | head -n 1)

emit_entry() {
    # $1 = title, $2 = entry id, $3 = kernel version, $4 = extra kernel parameters
    [ -e "/boot/vmlinuz-$3" ] || return 0
    cat << ENTRY
menuentry '$1' --class gnu-linux --class gnu --class os \$menuentry_id_option 'gnulinux-simple-$2' {
    load_video
    insmod gzio
    insmod part_gpt
    insmod ext2
    search --no-floppy --fs-uuid --set=root $ROOT_UUID
    echo    'Loading Linux ...'
    linux   /boot/vmlinuz-$3 root=UUID=$ROOT_UUID ro $4
    echo    'Loading initial ramdisk ...'
    initrd  /boot/initrd.img-$3
}
ENTRY
}

emit_entry 'Linux (Recovery Mode) -- System Recovery' recovery "$CURRENT" 'recovery nomodeset'
emit_entry 'Linux (Safe Graphics Mode) -- Troubleshooting' safegraphics "$CURRENT" 'quiet splash nomodeset'
emit_entry 'Linux (Previous Kernel) -- Fallback' previous "$PREVIOUS" 'quiet splash'
EOF

    chmod +x /etc/grub.d/09_recovery
    
    # Update GRUB to include recovery entries
    update-grub 2>&1 | tee -a "$LOG_FILE"
    
    log "Recovery boot entries created"
}

generate_recovery_report() {
    log "Generating recovery report..."
    
    cat > "$BACKUP_DIR/recovery-report.txt" << EOF
BOOT RECOVERY REPORT
====================
Recovery performed: $(date)
Recovery mode: $RECOVERY_MODE
Boot mode: $BOOT_MODE

BACKUP INFORMATION:
Backup directory: $BACKUP_DIR
Log file: $LOG_FILE

SYSTEM INFORMATION:
Kernel: $(uname -r)
Distribution: $(lsb_release -d 2>/dev/null | cut -f2 || cat /etc/os-release | grep PRETTY_NAME | cut -d= -f2 | tr -d '"')

RECOVERY ACTIONS (steps 4-7 run only in repair and full modes; minimal mode runs step 4 only):
1. System state backed up
2. Boot environment analyzed
3. Boot issues diagnosed
4. GRUB repaired
5. initramfs repaired
6. Filesystem checked
7. Recovery boot entries created

FILES BACKED UP:
$(find "$BACKUP_DIR" -type f | sed 's|.*/||')

NEXT STEPS:
1. Review the log file: $LOG_FILE
2. Review backup directory: $BACKUP_DIR
3. Reboot to test recovery
4. If issues persist, check diagnosis reports
5. Consider using boot repair live media for complex issues

IMPORTANT FILES:
- Boot analysis: $BACKUP_DIR/boot-analysis.txt
- Diagnosis: $BACKUP_DIR/diagnosis.txt
- GRUB repair log: $BACKUP_DIR/grub-repair.log
- initramfs repair log: $BACKUP_DIR/initramfs-repair.log

RECOVERY TOOLS AVAILABLE:
- SystemRescueCD: https://www.system-rescue.org/
- Super Grub2 Disk: http://www.supergrubdisk.org/
- Boot-Repair-Disk: https://sourceforge.net/projects/boot-repair-cd/

EOF

    log "Recovery report generated: $BACKUP_DIR/recovery-report.txt"
}

main() {
    log "Starting comprehensive boot recovery process..."
    
    # Initial setup
    check_live_mode
    setup_environment
    backup_system_state
    analyze_boot_environment
    
    # Execute based on recovery mode
    case "$RECOVERY_MODE" in
        diagnose)
            diagnose_boot_issues
            ;;
        repair)
            diagnose_boot_issues
            repair_grub
            repair_initramfs
            repair_filesystem
            create_recovery_entries
            ;;
        minimal)
            repair_grub
            ;;
        full)
            diagnose_boot_issues
            repair_grub
            repair_initramfs
            repair_filesystem
            create_recovery_entries
            ;;
        report-only)
            diagnose_boot_issues
            ;;
        *)
            error "Unknown recovery mode: $RECOVERY_MODE"
            ;;
    esac
    
    # Generate final report
    generate_recovery_report
    
    log "Boot recovery process completed!"
    log ""
    log "=== RECOVERY SUMMARY ==="
    log "Mode: $RECOVERY_MODE"
    log "Backup location: $BACKUP_DIR"
    log "Log file: $LOG_FILE"
    log "Report: $BACKUP_DIR/recovery-report.txt"
    log ""
    log "Next steps:"
    log "1. Review the recovery report"
    log "2. Reboot to test the fixes"
    log "3. If problems persist, use the diagnosis reports"
    log "4. Consider professional help for hardware issues"
}

# Error handling
trap 'log "Recovery interrupted by user"; exit 1' INT
trap 'log "Script terminated unexpectedly"; exit 1' TERM

# Run main function
main "$@"
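
Deriving the parent disk from a partition path is a step where device naming matters: NVMe and eMMC devices put a "p" before the partition number, while SCSI/SATA and virtio disks do not. The following is a minimal string-only sketch; the function name parent_disk is hypothetical, and on a running system lsblk -no PKNAME is usually the more robust choice.

```bash
#!/bin/bash
# Hypothetical helper: print the whole-disk device for a partition path.
# String-based only; device-mapper/LVM paths are rejected with status 1.
parent_disk() {
    local part="$1"
    case "$part" in
        /dev/mapper/*|/dev/dm-*)
            return 1 ;;
        /dev/nvme*n*p[0-9]*|/dev/mmcblk*p[0-9]*|/dev/loop*p[0-9]*)
            # Strip the trailing "p<number>" partition suffix
            printf '%s\n' "${part%p[0-9]*}" ;;
        /dev/nvme*|/dev/mmcblk*|/dev/loop*)
            # Already a whole disk
            printf '%s\n' "$part" ;;
        *)
            # sdX/vdX style: strip everything from the first digit onward
            printf '%s\n' "${part%%[0-9]*}" ;;
    esac
}

parent_disk /dev/sda2
parent_disk /dev/nvme0n1p3
```

The two sample calls print /dev/sda and /dev/nvme0n1 respectively.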

Boot Issue Resolution Flowchart

START: System Won't Boot

STEP 1: Initial Assessment (check power, display, BIOS/UEFI POST)
  • BIOS/UEFI Issues: No POST, boot device not found. Check hardware, reset BIOS.
  • GRUB Issues: GRUB prompt, error messages. Boot from live media, repair GRUB.
  • Kernel Panic: Kernel panic messages. Check parameters, hardware.
  • initramfs Issues: Dropping to initramfs shell. Rebuild initramfs, check modules.
  • Filesystem Issues: Mount errors, corruption. Run fsck, check disk health.
  • Systemd/Init Issues: Services fail, no login. Check logs, rescue target.

STEP 2: Recovery Tools (use the tools appropriate to the issue: live USB, recovery mode, single user)
  • Software Repair: GRUB reinstall, initramfs rebuild, fsck repair
  • Hardware Repair: Memory testing, disk replacement, cable checks

STEP 3: Verification & Prevention (test boot, create backups, monitor health; regular maintenance, health checks)

END: System Boots
Boot issue resolution flowchart showing systematic troubleshooting approach
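
The first branch of the flowchart, mapping what appears on screen to a boot stage, can be sketched as a simple lookup. The function name suggest_stage is hypothetical and the message patterns are illustrative assumptions based on commonly seen prompts and errors, not an exhaustive classifier.

```bash
#!/bin/bash
# Hypothetical helper: map on-screen text to the boot stage most likely involved.
suggest_stage() {
    case "${1,,}" in   # compare case-insensitively (bash 4+)
        *"grub rescue"*|*"grub>"*|*"error: unknown filesystem"*)
            echo "GRUB" ;;
        *"kernel panic"*)
            echo "Kernel" ;;
        *"(initramfs)"*|*"dracut"*)
            echo "initramfs" ;;
        *"no bootable device"*|*"boot device not found"*)
            echo "BIOS/UEFI" ;;
        *"emergency mode"*|*"failed to start"*)
            echo "systemd" ;;
        *"fsck"*|*"superblock"*)
            echo "Filesystem" ;;
        *)
            echo "Unknown" ;;
    esac
}

suggest_stage "grub rescue> "
```

The result is a starting point for STEP 1 only; a kernel panic caused by an unmountable root filesystem, for instance, may still trace back to the initramfs or the disk.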

Mastering Boot Issue Troubleshooting

System boot problems can be daunting, but with a systematic approach and proper tools, most issues can be diagnosed and resolved. Understanding the boot process, recognizing symptoms, and applying targeted fixes are key skills for any system administrator.

Key Takeaways: The boot process follows distinct stages: BIOS/UEFI, boot loader, kernel initialization, initramfs, and init system. Each stage has characteristic failure modes and recovery techniques. GRUB issues are common but usually repairable with live media. Kernel panics require parameter adjustments or hardware testing. Filesystem corruption needs careful fsck operations. Systemd failures benefit from target-based recovery. Hardware issues require diagnostic testing and potential replacement.

Next Steps: Practice recovery procedures in a safe environment. Create and test boot recovery media. Implement monitoring for early detection of boot-related issues. Establish backup and recovery procedures. Document system configurations and recovery steps. Stay updated on boot loader and kernel developments. Consider automated health checks and preventive maintenance.