Master Linux boot troubleshooting with this comprehensive guide covering GRUB errors, kernel panics, filesystem corruption, hardware failures, and recovery techniques for diagnosing and fixing system boot problems.
Understanding Boot Problems
System boot issues can occur at various stages of the boot process, each with distinct symptoms and solutions. Understanding where the failure occurs is crucial for effective troubleshooting.
- BIOS/UEFI Level: Hardware detection, boot device selection, firmware issues
- Boot Loader: GRUB errors, missing configuration, kernel selection failures
- Kernel Initialization: Kernel panic, hardware driver issues, parameter errors
- initramfs Stage: Missing modules, filesystem drivers, root device detection
- Root Filesystem: Corruption, mount failures, missing /sbin/init
- Init System: Service failures, dependency issues, configuration errors
- Hardware Related: Disk failures, memory issues, power problems
- Software Related: Updates, configuration changes, dependency breaks
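The stage list above can be turned into a rough first-pass triage. The sketch below is a hypothetical helper (`classify_boot_stage` is not part of any distribution) that maps an on-screen message to the most likely failing stage. The patterns are illustrative assumptions drawn from the messages discussed in this guide, not an exhaustive catalogue.

```bash
#!/bin/bash
# classify_boot_stage MESSAGE
# Hypothetical helper: prints a rough guess at the boot stage that produced MESSAGE.
classify_boot_stage() {
    local msg="${1,,}"   # lowercase for case-insensitive matching (bash 4+)
    case "$msg" in
        *"no bootable device"*|*"invalid system disk"*|*"secure boot"*|*"cmos checksum"*)
            echo "firmware" ;;
        *"grub rescue"*|*"grub>"*|*"error: unknown filesystem"*)
            echo "bootloader" ;;
        *"unable to mount root fs"*)
            echo "kernel-or-initramfs" ;;
        *"no working init found"*|*"attempted to kill init"*)
            echo "root-filesystem-or-init" ;;
        *"kernel panic"*)
            echo "kernel" ;;
        *"failed to start"*|*"dependency failed"*|*"emergency mode"*)
            echo "init-system" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example call:
classify_boot_stage "Kernel panic - not syncing: VFS: Unable to mount root fs"
```

The more specific panic messages are matched before the generic "kernel panic" pattern, so a root-mount failure is not reported as a plain kernel fault.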
1. BIOS/UEFI Boot Issues
BIOS/UEFI Boot Error Symptoms
| Error Symptom | Display Message | Possible Causes | Severity | Immediate Action |
|---|---|---|---|---|
| No POST Beep/Display | Black screen, no beep codes | Power supply, motherboard, CPU, RAM failure | Critical | Check power, reseat components, test with minimal config |
| Boot Device Not Found | "No bootable device" | Disk failure, wrong boot order, cable issues | Critical | Check BIOS boot order, verify disk connections |
| Invalid Boot Disk | "Invalid system disk" | Corrupted MBR, boot sector issues | Critical | Boot from recovery media, repair MBR |
| Secure Boot Violation | "Secure Boot failed" | Unsigned bootloader/kernel, wrong certificates | High | Disable Secure Boot temporarily or enroll keys |
| CMOS Checksum Error | "CMOS checksum error" | Dead CMOS battery, corrupted settings | Medium | Replace CMOS battery, reset BIOS defaults |
| Overclocking Failed | "Overclocking failed" | Unstable overclock settings | Medium | Reset to default BIOS settings |
| USB Boot Issues | USB device not detected | Legacy/USB boot disabled, device format | Medium | Enable USB boot, try different port/device |
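On UEFI systems the firmware boot order can also be inspected from a running or live Linux session with `efibootmgr`. The sketch below assumes the usual `efibootmgr` output layout (a `BootOrder:` line followed by `BootXXXX*` entries); `list_boot_order` is a hypothetical helper name, and the parsing may need adjusting if your version prints a different layout.

```bash
#!/bin/bash
# list_boot_order: reads efibootmgr output on stdin and prints the entries
# in the order the firmware will try them, one "ID: label" pair per line.
list_boot_order() {
    awk '
        /^BootOrder:/ { n = split($2, order, ",") }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            id = substr($1, 5, 4)                 # four hex digits after "Boot"
            label = $0
            sub(/^Boot[0-9A-Fa-f]+\*? */, "", label)
            names[id] = label
        }
        END { for (i = 1; i <= n; i++) print order[i] ": " names[order[i]] }
    '
}

# Typical use on a UEFI system (requires root):
#   efibootmgr | list_boot_order
```

If the disk you expect is missing from the list or sits below a removable device, that points to the "wrong boot order" cause in the table rather than a damaged boot loader.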
2. GRUB Boot Loader Issues
GRUB Error Diagnosis and Repair
GRUB Boot Process Deep Dive
#!/bin/bash
# grub-recovery-guide.sh - Comprehensive GRUB boot loader recovery
set -euo pipefail
# Configuration
LOG_FILE="/tmp/grub-recovery-$(date +%Y%m%d-%H%M%S).log"
BACKUP_DIR="/boot/grub-backup-$(date +%Y%m%d)"
RECOVERY_MODE="${1:-auto}" # auto, uefi, bios, minimal
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
error() {
echo "[ERROR] $1" >&2
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" >> "$LOG_FILE"
exit 1
}
warn() {
echo "[WARNING] $1" >&2
echo "[$(date '+%Y-%m-%d %H:%M:%S')] WARNING: $1" >> "$LOG_FILE"
}
backup_grub_config() {
log "Backing up existing GRUB configuration..."
mkdir -p "$BACKUP_DIR"
# Backup critical GRUB files
cp -a /boot/grub "$BACKUP_DIR/grub" 2>/dev/null || true
cp -a /boot/grub2 "$BACKUP_DIR/grub2" 2>/dev/null || true
cp -a /etc/default/grub "$BACKUP_DIR/grub-default" 2>/dev/null || true
cp -a /etc/grub.d "$BACKUP_DIR/grub.d" 2>/dev/null || true
# Backup MBR and partition table
dd if=/dev/sda of="$BACKUP_DIR/mbr-backup.bin" bs=512 count=1 2>/dev/null || true
fdisk -l > "$BACKUP_DIR/partition-table.txt" 2>/dev/null || true
log "GRUB configuration backed up to: $BACKUP_DIR"
}
detect_boot_mode() {
log "Detecting boot mode..."
# Check for EFI system partition
if [[ -d /sys/firmware/efi ]]; then
log "System is booted in UEFI mode"
BOOT_MODE="uefi"
else
log "System is booted in BIOS/Legacy mode"
BOOT_MODE="bios"
fi
# Detect disk layout
detect_disk_layout
echo "$BOOT_MODE"
}
detect_disk_layout() {
log "Detecting disk layout..."
# Get root partition
ROOT_PART=$(findmnt -n -o SOURCE / 2>/dev/null || echo "")
if [[ -z "$ROOT_PART" ]]; then
ROOT_PART=$(mount | grep " / " | awk '{print $1}' 2>/dev/null || echo "")
fi
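# Get the parent disk (handles /dev/sda1 as well as /dev/nvme0n1p2 and /dev/mmcblk0p1)
DISK_DEV=$(lsblk -no PKNAME "$ROOT_PART" 2>/dev/null | head -n 1 || true)
if [[ -n "$DISK_DEV" ]]; then
DISK_DEV="/dev/$DISK_DEV"
else
DISK_DEV=$(echo "$ROOT_PART" | sed -E 's/([0-9])p[0-9]+$/\1/; t; s/[0-9]+$//')
fi
# Detect EFI partition: prefer the mounted one, else look for the ESP partition type GUID
EFI_PART=$(findmnt -n -o SOURCE /boot/efi 2>/dev/null || \
lsblk -lnpo NAME,PARTTYPE 2>/dev/null | \
awk 'tolower($2) == "c12a7328-f81f-11d2-ba4b-00a0c93ec93b" {print $1; exit}' || echo "")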
# Detect /boot partition
BOOT_PART=$(findmnt -n -o SOURCE /boot 2>/dev/null || echo "")
log "Detected:"
log " Root partition: $ROOT_PART"
log " Disk device: $DISK_DEV"
log " EFI partition: $EFI_PART"
log " Boot partition: $BOOT_PART"
export ROOT_PART DISK_DEV EFI_PART BOOT_PART
}
check_grub_installation() {
log "Checking GRUB installation status..."
local issues=0
# Check GRUB binaries
if ! command -v grub-install &> /dev/null; then
warn "grub-install command not found"
issues=$((issues + 1))
fi
if ! command -v update-grub &> /dev/null && ! command -v grub-mkconfig &> /dev/null; then
warn "GRUB configuration tools not found"
issues=$((issues + 1))
fi
# Check GRUB files
if [[ ! -d /boot/grub ]] && [[ ! -d /boot/grub2 ]]; then
warn "GRUB directory not found in /boot"
issues=$((issues + 1))
fi
# Check kernel images
if ! ls /boot/vmlinuz-* &> /dev/null; then
warn "No kernel images found in /boot"
issues=$((issues + 1))
fi
# Check initramfs
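# Debian/Ubuntu use initrd.img-*, older RHEL/SUSE initrd-*, Fedora/Arch initramfs-*
if ! ls /boot/initrd.img-* &> /dev/null && ! ls /boot/initrd-* &> /dev/null && ! ls /boot/initramfs-* &> /dev/null; then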
warn "No initramfs images found"
issues=$((issues + 1))
fi
if [[ $issues -eq 0 ]]; then
log "GRUB installation check passed"
else
warn "Found $issues potential issues with GRUB installation"
fi
return $issues
}
repair_bios_grub() {
log "Repairing BIOS/Legacy GRUB installation..."
local disk="$1"
# Validate disk
if [[ ! -b "$disk" ]]; then
error "Invalid disk device: $disk"
fi
log "Installing GRUB to $disk (BIOS mode)"
# Install GRUB to MBR
if grub-install --target=i386-pc --recheck "$disk" 2>&1 | tee -a "$LOG_FILE"; then
log "GRUB installed successfully to MBR of $disk"
else
error "Failed to install GRUB to $disk"
fi
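# Installing to a partition boot sector is only useful when chainloading from another
# boot loader; GRUB discourages it, so a failure here is logged rather than fatal
if [[ -n "$BOOT_PART" ]] && [[ "$BOOT_PART" != "$ROOT_PART" ]]; then
log "Installing GRUB to boot partition: $BOOT_PART"
grub-install --target=i386-pc --boot-directory=/boot "$BOOT_PART" 2>&1 | tee -a "$LOG_FILE" || \
warn "Could not install GRUB to $BOOT_PART; the MBR installation above is normally sufficient"
fi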
}
repair_uefi_grub() {
log "Repairing UEFI GRUB installation..."
local disk="$1"
local efi_part="$2"
# Validate inputs
if [[ ! -b "$disk" ]]; then
error "Invalid disk device: $disk"
fi
if [[ ! -b "$efi_part" ]]; then
error "Invalid EFI partition: $efi_part"
fi
# Mount EFI partition if not mounted
if ! mountpoint -q /boot/efi; then
log "Mounting EFI partition: $efi_part"
mkdir -p /boot/efi
mount "$efi_part" /boot/efi 2>&1 | tee -a "$LOG_FILE"
EFI_MOUNTED=1
else
EFI_MOUNTED=0
fi
# Install GRUB for UEFI
log "Installing GRUB to EFI system partition"
if grub-install --target=x86_64-efi \
--efi-directory=/boot/efi \
--bootloader-id=GRUB \
--recheck 2>&1 | tee -a "$LOG_FILE"; then
log "GRUB installed successfully to EFI system"
else
# Retry with a different bootloader ID (the target stays x86_64-efi)
warn "Standard UEFI installation failed, trying alternative..."
grub-install --target=x86_64-efi \
--efi-directory=/boot/efi \
--bootloader-id=GRUB_LINUX \
--recheck 2>&1 | tee -a "$LOG_FILE"
fi
# Create fallback boot entry at the removable-media path (EFI/BOOT/BOOTX64.EFI)
log "Creating UEFI fallback boot entry"
mkdir -p /boot/efi/EFI/BOOT
cp /boot/efi/EFI/GRUB/grubx64.efi /boot/efi/EFI/BOOT/BOOTX64.EFI 2>/dev/null || true
# Unmount if we mounted it
if [[ $EFI_MOUNTED -eq 1 ]]; then
umount /boot/efi 2>/dev/null || true
fi
}
regenerate_grub_config() {
log "Regenerating GRUB configuration..."
# Update GRUB configuration
if command -v update-grub &> /dev/null; then
log "Running update-grub..."
update-grub 2>&1 | tee -a "$LOG_FILE"
elif command -v grub-mkconfig &> /dev/null; then
log "Running grub-mkconfig..."
grub-mkconfig -o /boot/grub/grub.cfg 2>&1 | tee -a "$LOG_FILE"
else
error "No GRUB configuration tool found"
fi
# Verify generated configuration
if [[ -f /boot/grub/grub.cfg ]]; then
local line_count=$(wc -l < /boot/grub/grub.cfg)
log "GRUB configuration generated with $line_count lines"
# Check for kernel entries
# grep -c already prints 0 when nothing matches; "|| echo 0" would append a second line
local kernel_entries=$(grep -c "menuentry.*Linux" /boot/grub/grub.cfg || true)
log "Found $kernel_entries Linux kernel entries"
if [[ $kernel_entries -eq 0 ]]; then
warn "No Linux kernel entries found in grub.cfg"
fi
else
error "GRUB configuration file not created"
fi
}
fix_initramfs() {
log "Checking and fixing initramfs..."
# Get current kernel version
local current_kernel=$(uname -r)
# Check if initramfs exists for current kernel
local initramfs_found=0
for initramfs in /boot/initrd.img-* /boot/initramfs-*; do
if [[ -f "$initramfs" ]] && echo "$initramfs" | grep -q "$current_kernel"; then
initramfs_found=1
break
fi
done
if [[ $initramfs_found -eq 0 ]]; then
warn "No initramfs found for current kernel ($current_kernel)"
log "Regenerating initramfs..."
# Regenerate initramfs
if command -v update-initramfs &> /dev/null; then
update-initramfs -c -k "$current_kernel" 2>&1 | tee -a "$LOG_FILE"
elif command -v mkinitcpio &> /dev/null; then
mkinitcpio -p linux 2>&1 | tee -a "$LOG_FILE"
elif command -v dracut &> /dev/null; then
dracut --force 2>&1 | tee -a "$LOG_FILE"
else
warn "No initramfs generation tool found"
fi
else
log "initramfs found for current kernel"
fi
# Update initramfs for all kernels
log "Updating initramfs for all installed kernels..."
if command -v update-initramfs &> /dev/null; then
update-initramfs -u -k all 2>&1 | tee -a "$LOG_FILE"
fi
}
check_boot_filesystem() {
log "Checking boot filesystem health..."
# Check filesystem type
local fs_type=$(findmnt -n -o FSTYPE /boot 2>/dev/null || echo "unknown")
log "Boot filesystem type: $fs_type"
# Run filesystem check
case $fs_type in
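ext2|ext3|ext4)
log "Running fsck on boot filesystem..."
# fsck exits non-zero when it finds problems; report them instead of aborting under set -e
fsck -n "$BOOT_PART" 2>&1 | tee -a "$LOG_FILE" || warn "fsck reported problems on $BOOT_PART"
;;
vfat|fat32)
log "Running dosfsck on boot filesystem..."
dosfsck -n -v "$BOOT_PART" 2>&1 | tee -a "$LOG_FILE" || warn "dosfsck reported problems on $BOOT_PART"
;;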
*)
warn "Unknown filesystem type for boot: $fs_type"
;;
esac
# Check disk space
local boot_space=$(df -h /boot | awk 'NR==2 {print $5}' | sed 's/%//')
if [[ $boot_space -gt 90 ]]; then
warn "Boot partition is ${boot_space}% full"
log "Cleaning old kernels..."
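# Remove old kernels (on dnf/yum systems, keep the newest 3)
if command -v apt &> /dev/null; then
apt autoremove --purge 2>&1 | tee -a "$LOG_FILE"
elif command -v dnf &> /dev/null; then
# package-cleanup belongs to yum-utils; with dnf, select old install-only packages via repoquery
local old_kernels=$(dnf repoquery --installonly --latest-limit=-3 -q 2>/dev/null || true)
if [[ -n "$old_kernels" ]]; then
dnf remove $old_kernels 2>&1 | tee -a "$LOG_FILE"
fi
elif command -v yum &> /dev/null; then
package-cleanup --oldkernels --count=3 2>&1 | tee -a "$LOG_FILE"
fi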
else
log "Boot partition has adequate free space"
fi
}
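create_emergency_boot_entry() {
log "Creating emergency boot entry..."
# GRUB cannot run shell command substitutions, so kernel versions and UUIDs are
# resolved here and written into the entries as literal values
local kver prev_kver root_uuid boot_uuid prefix
kver=$(uname -r)
prev_kver=$(ls /boot/vmlinuz-* | sort -V | tail -n 2 | sed -n 1p | sed 's/.*vmlinuz-//')
root_uuid=$(findmnt -n -o UUID /)
boot_uuid=$(findmnt -n -o UUID -T /boot)
# Kernel paths are relative to the filesystem that holds them
if mountpoint -q /boot; then prefix=""; else prefix="/boot"; fi
# The initrd.img-* naming below is the Debian/Ubuntu convention; adjust for other distributions
cat > /etc/grub.d/09_emergency << EOF
#!/bin/sh
exec tail -n +3 \$0
menuentry 'Emergency Boot (Single User Mode)' {
search --no-floppy --fs-uuid --set=root $boot_uuid
linux $prefix/vmlinuz-$kver root=UUID=$root_uuid single
initrd $prefix/initrd.img-$kver
}
menuentry 'Emergency Boot (Recovery Mode)' {
search --no-floppy --fs-uuid --set=root $boot_uuid
linux $prefix/vmlinuz-$kver root=UUID=$root_uuid ro recovery nomodeset
initrd $prefix/initrd.img-$kver
}
menuentry 'Emergency Boot (Previous Kernel)' {
search --no-floppy --fs-uuid --set=root $boot_uuid
linux $prefix/vmlinuz-$prev_kver root=UUID=$root_uuid ro
initrd $prefix/initrd.img-$prev_kver
}
EOF
chmod +x /etc/grub.d/09_emergency
# Regenerate GRUB config to include emergency entry
regenerate_grub_config
log "Emergency boot entries created"
}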
verify_repair() {
log "Verifying GRUB repair..."
local verification_passed=1
# Check GRUB installation
if [[ "$BOOT_MODE" == "uefi" ]]; then
if [[ -f /boot/efi/EFI/GRUB/grubx64.efi ]] || [[ -f /boot/efi/EFI/ubuntu/grubx64.efi ]]; then
log "UEFI GRUB binary verified"
else
warn "UEFI GRUB binary not found"
verification_passed=0
fi
else
# Check for BIOS GRUB in MBR
if dd if="$DISK_DEV" bs=512 count=1 2>/dev/null | grep -q "GRUB"; then
log "BIOS GRUB in MBR verified"
else
warn "GRUB not found in MBR"
verification_passed=0
fi
fi
# Check GRUB configuration
if [[ -f /boot/grub/grub.cfg ]] && [[ -s /boot/grub/grub.cfg ]]; then
log "GRUB configuration file verified"
else
warn "GRUB configuration file missing or empty"
verification_passed=0
fi
# Check kernel entries
local kernel_count=$(grep -c "menuentry.*Linux" /boot/grub/grub.cfg 2>/dev/null || true)
kernel_count=${kernel_count:-0}
if [[ $kernel_count -gt 0 ]]; then
log "Found $kernel_count kernel entries in GRUB configuration"
else
warn "No kernel entries found in GRUB configuration"
verification_passed=0
fi
if [[ $verification_passed -eq 1 ]]; then
log "GRUB repair verification PASSED"
else
warn "GRUB repair verification FAILED - some issues remain"
fi
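# Shell convention: exit status 0 means success, so "if verify_repair" in main works as intended
[[ $verification_passed -eq 1 ]]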
}
main() {
log "Starting comprehensive GRUB recovery process..."
# Initial setup
backup_grub_config
detect_boot_mode
# Check current state
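# The function returns its issue count; do not let a non-zero count abort the script under set -e
check_grub_installation || warn "Proceeding with repair despite the issues listed above"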
# Determine repair mode
case "$RECOVERY_MODE" in
auto)
if [[ "$BOOT_MODE" == "uefi" ]]; then
repair_uefi_grub "$DISK_DEV" "$EFI_PART"
else
repair_bios_grub "$DISK_DEV"
fi
;;
uefi)
repair_uefi_grub "$DISK_DEV" "$EFI_PART"
;;
bios)
repair_bios_grub "$DISK_DEV"
;;
minimal)
# Only regenerate config
regenerate_grub_config
;;
*)
error "Invalid recovery mode: $RECOVERY_MODE"
;;
esac
# Common repair steps
regenerate_grub_config
fix_initramfs
check_boot_filesystem
create_emergency_boot_entry
# Final verification
if verify_repair; then
log "GRUB recovery completed successfully!"
log ""
log "Summary of actions taken:"
log " 1. Backed up existing configuration to: $BACKUP_DIR"
log " 2. Detected boot mode: $BOOT_MODE"
log " 3. Repaired GRUB installation"
log " 4. Regenerated GRUB configuration"
log " 5. Fixed initramfs images"
log " 6. Created emergency boot entries"
log " 7. Verified repair completion"
log ""
log "Next steps:"
log " 1. Reboot the system to test the repair"
log " 2. If issues persist, check log file: $LOG_FILE"
log " 3. Consider testing with boot repair disk for complex issues"
else
warn "GRUB recovery completed with warnings"
log "Please check the log file for details: $LOG_FILE"
log "Consider additional troubleshooting steps or using a boot repair disk"
fi
}
# Run main function
main "$@"
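When the installed system cannot boot at all, a script like the one above is usually run from live media inside a chroot of the broken installation. The following sketch only prints the commands for preparing such a chroot instead of executing them, so it can be reviewed first. `chroot_plan`, the `/mnt/recovery` default, and the example device names are assumptions for illustration; on some UEFI systems `/sys/firmware/efi/efivars` may need to be mounted separately inside the chroot.

```bash
#!/bin/bash
# chroot_plan ROOT_PART [EFI_PART] [TARGET]
# Prints (does not run) the usual sequence for entering an installed system
# from live media: mount root, optionally the ESP, bind the virtual
# filesystems, then chroot.
chroot_plan() {
    local root_part="$1" efi_part="${2:-}" target="${3:-/mnt/recovery}"
    local fs

    echo "mkdir -p $target"
    echo "mount $root_part $target"
    if [[ -n "$efi_part" ]]; then
        echo "mount $efi_part $target/boot/efi"
    fi
    for fs in dev dev/pts proc sys run; do
        echo "mount --bind /$fs $target/$fs"
    done
    echo "chroot $target /bin/bash"
}

# Example with placeholder devices; review the output, then run it as root:
chroot_plan /dev/sda2 /dev/sda1
```

Printing the plan keeps the sketch side-effect free; piping the reviewed output to `sh` as root is one way to apply it.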
3. Kernel Panic and Initialization Issues
Kernel Error Diagnosis
# Common Kernel Panic Messages and Diagnostics
# Symptom: "Kernel panic - not syncing: VFS: Unable to mount root fs"
# Possible causes:
# - Missing or corrupted initramfs
# - Wrong root= parameter in kernel command line
# - Missing filesystem drivers in initramfs
# - Corrupted root filesystem
# Diagnostic steps:
# 1. Check kernel command line parameters
cat /proc/cmdline
# 2. Check initramfs content
lsinitramfs /boot/initrd.img-$(uname -r) | grep -E "(ext4|xfs|btrfs)"
# 3. Verify root device
blkid
ls -la /dev/disk/by-uuid/
# 4. Check filesystem integrity
fsck -n /dev/sdX
# Symptom: "Kernel panic - not syncing: Attempted to kill init"
# Possible causes:
# - Corrupted /sbin/init or systemd binary
# - Missing shared libraries
# - SELinux/AppArmor issues
# - Runlevel/target misconfiguration
# Diagnostic steps:
# 1. Check init binary
ls -la /sbin/init
file /sbin/init
# 2. Check library dependencies
ldd /sbin/init
# 3. Check SELinux status
sestatus
getenforce
# 4. Check systemd logs from previous boot
journalctl -b -1 | grep -i "systemd\|init"
# Symptom: "Kernel panic - not syncing: No working init found"
# Possible causes:
# - Missing initramfs
# - Wrong init= parameter
# - Corrupted init binary
# Diagnostic steps:
# 1. Check kernel parameters
grep -o "init=[^ ]*" /proc/cmdline
# 2. Verify init binary exists (falls back to /sbin/init when no init= is set)
init_path=$(grep -o "init=[^ ]*" /proc/cmdline | cut -d= -f2)
ls -la "${init_path:-/sbin/init}"
# 3. Check if running in container/chroot
cat /proc/1/cgroup
# Symptom: "Kernel panic - not syncing: Fatal exception"
# Possible causes:
# - Hardware failure (RAM, CPU)
# - Kernel bug
# - Driver conflict
# Diagnostic steps:
# 1. Check kernel oops messages
dmesg | grep -i "oops\|panic\|BUG"
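# 2. Review hardware inventory (firmware version, installed DIMMs, PCI devices)
dmidecode
lspci
# 3. Test memory: Memtest86+ is not a shell command; boot it from the GRUB menu or from rescue media
# 4. Check for kernel memory leaks (requires CONFIG_DEBUG_KMEMLEAK and a mounted debugfs)
cat /sys/kernel/debug/kmemleak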
# Recovering from Kernel Parameter Issues
# Boot into recovery mode or single user mode
# Add these parameters at GRUB prompt:
# - single or 1 (single user mode)
# - init=/bin/bash (drop to shell)
# - rescue (rescue mode)
# - emergency (emergency mode)
# Common recovery kernel parameters:
# rw # Mount root read-write
# ro # Mount root read-only
# init=/bin/bash # Use bash as init
# systemd.unit=rescue.target # Systemd rescue target
# systemd.unit=emergency.target # Systemd emergency target
# nomodeset # Disable kernel mode setting
# noapic # Disable APIC
# nolapic # Disable local APIC
# irqpoll # Force IRQ polling
# acpi=off # Disable ACPI
# pci=noacpi # Disable ACPI for PCI
# debug # Enable kernel debug
# earlyprintk=vga # Early console output
# Editing kernel parameters in GRUB:
# 1. At GRUB menu, press 'e' to edit
# 2. Find the line starting with 'linux' or 'linuxefi'
# 3. Add parameters after the root= parameter
# 4. Press Ctrl+X or F10 to boot
# Permanent kernel parameter changes:
# Edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""
# Add parameters to GRUB_CMDLINE_LINUX (grub-mkconfig supplies root= itself, so it is normally omitted):
GRUB_CMDLINE_LINUX="nomodeset"
# Update GRUB configuration:
update-grub
# or
grub-mkconfig -o /boot/grub/grub.cfg
# Testing kernel parameters without permanent change:
# 1. Boot with temporary parameters
# 2. If successful, make them permanent
# 3. If not, reboot without them
# Emergency kernel parameter examples:
# For graphics issues:
# Add: nomodeset i915.modeset=0 nouveau.modeset=0
# For filesystem mounting issues:
# Add: rootdelay=10 rootfstype=ext4
# For ACPI issues:
# Add: acpi=off pci=noacpi
# For USB boot issues:
# Add: usbcore.autosuspend=-1
# For RAID/LVM issues:
# Add: rd.auto rd.lvm=1 rd.md=1
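# Creating emergency boot entry with safe parameters:
# GRUB does not evaluate shell command substitutions, so the kernel version is expanded
# when this file is written. hd0,gpt2 and /dev/sda2 are examples; adjust them to your layout.
# Note that this overwrites any existing entries in /etc/grub.d/40_custom.
KVER=$(uname -r)
cat > /etc/grub.d/40_custom << EOF
#!/bin/sh
exec tail -n +3 \$0
menuentry 'Linux (Safe Mode)' {
set root='hd0,gpt2'
linux /boot/vmlinuz-${KVER} root=/dev/sda2 ro nomodeset noapic nolapic
initrd /boot/initrd.img-${KVER}
}
menuentry 'Linux (Recovery Mode)' {
set root='hd0,gpt2'
linux /boot/vmlinuz-${KVER} root=/dev/sda2 ro single
initrd /boot/initrd.img-${KVER}
}
EOF
chmod +x /etc/grub.d/40_custom
update-grub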
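Several of the panics above trace back to a wrong `root=` or `init=` value, so it helps to extract individual parameters from the command line rather than reading it by eye. The sketch below uses two hypothetical helpers, `get_cmdline_param` and `root_spec_to_path`. It returns the last occurrence of a parameter, on the assumption that later values override earlier ones, and it only translates the common `UUID=`, `PARTUUID=` and `LABEL=` forms.

```bash
#!/bin/bash
# get_cmdline_param NAME [CMDLINE]
# Prints the value of NAME=... from CMDLINE (default: /proc/cmdline).
# Returns 1 if the parameter is absent.
get_cmdline_param() {
    local name="$1" cmdline="${2:-$(cat /proc/cmdline)}"
    local word value="" found=1
    for word in $cmdline; do          # intentional word splitting on whitespace
        if [[ "$word" == "$name="* ]]; then
            value="${word#*=}"        # strip everything up to the first "="
            found=0
        fi
    done
    if [[ $found -eq 0 ]]; then
        echo "$value"
    fi
    return $found
}

# root_spec_to_path SPEC
# Translates root=UUID=... style values into the matching /dev/disk/by-* path.
root_spec_to_path() {
    case "$1" in
        UUID=*)     echo "/dev/disk/by-uuid/${1#UUID=}" ;;
        PARTUUID=*) echo "/dev/disk/by-partuuid/${1#PARTUUID=}" ;;
        LABEL=*)    echo "/dev/disk/by-label/${1#LABEL=}" ;;
        *)          echo "$1" ;;
    esac
}

# Example: check that the configured root device actually exists
#   root_dev=$(root_spec_to_path "$(get_cmdline_param root)")
#   [[ -e "$root_dev" ]] || echo "root device $root_dev not found"
```

If the resolved path does not exist, compare the value against the output of `blkid` to see whether the UUID changed, for example after a partition was recreated.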
4. Filesystem and initramfs Issues
Filesystem Corruption and Recovery
| Filesystem Type | Common Corruption Symptoms | Recovery Commands | Risk Level | Preventive Measures |
|---|---|---|---|---|
| ext2/ext3/ext4 | Superblock corruption, journal errors, inode issues | fsck -y /dev/sdX, e2fsck -p -f -v | Medium | Regular fsck, journaling enabled, backups |
| XFS | Metadata corruption, log mount failures | xfs_repair -n, xfs_repair -L (zeroes the log; last resort) | High | Run xfs_repair -n periodically, avoid sudden power loss |
| Btrfs | Checksum errors, extent tree corruption | btrfs scrub, btrfs check --repair (last resort) | Medium | Regular scrubs, snapshots, RAID1 for metadata |
| ZFS | Pool corruption, checksum errors | zpool scrub, zpool clear, zfs rollback | Low | Regular scrubs, redundancy, snapshots |
| FAT32/NTFS | Boot sector corruption, file allocation errors | dosfsck -a -v, ntfsfix | High | Regular defragmentation, avoid unsafe removal |
| LVM | Volume group metadata corruption | vgcfgrestore, vgck, lvchange -an | High | Backup /etc/lvm/archive/, regular vgck |
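Before running any of the repair commands in the table, it is usually safer to start with a check that does not write to the device. The sketch below is a hypothetical helper, `readonly_check_command`, that prints the non-destructive check for a given filesystem type. It covers only the block-device filesystems from the table (ZFS and LVM are managed through their own tools) and `/dev/sdX` remains a placeholder.

```bash
#!/bin/bash
# readonly_check_command FSTYPE DEVICE
# Prints a check command that inspects DEVICE without modifying it.
# Returns 1 for filesystem types this sketch does not know about.
readonly_check_command() {
    local fstype="$1" device="$2"
    case "$fstype" in
        ext2|ext3|ext4) echo "e2fsck -n $device" ;;
        xfs)            echo "xfs_repair -n $device" ;;
        btrfs)          echo "btrfs check --readonly $device" ;;
        vfat)           echo "fsck.vfat -n $device" ;;
        ntfs)           echo "ntfsfix --no-action $device" ;;
        *)              echo "unsupported filesystem type: $fstype" >&2
                        return 1 ;;
    esac
}

# Example (the filesystem should be unmounted first):
#   readonly_check_command "$(lsblk -no FSTYPE /dev/sdX)" /dev/sdX
```

Only move on to the repairing variants once the read-only pass has shown what is wrong and a backup or disk image exists.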
5. Systemd and Init System Failures
Systemd Boot Failure Diagnostics
#!/bin/bash
# systemd-boot-recovery.sh - Systemd initialization failure recovery
set -euo pipefail
LOG_FILE="/tmp/systemd-recovery-$(date +%Y%m%d-%H%M%S).log"
RECOVERY_TARGET="${1:-emergency.target}"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
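error() {
echo "[ERROR] $1" >&2
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" >> "$LOG_FILE"
exit 1
}
# warn is called by the check functions below, so it must be defined in this script
warn() {
echo "[WARNING] $1" >&2
echo "[$(date '+%Y-%m-%d %H:%M:%S')] WARNING: $1" >> "$LOG_FILE"
}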
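check_systemd_status() {
# main captures this function's stdout, so only the final state is printed there;
# all diagnostics go to stderr
log "Checking systemd status..." >&2
# Check if systemd is PID 1
if [[ $(ps -p 1 -o comm=) != "systemd" ]]; then
error "systemd is not PID 1. Current init: $(ps -p 1 -o comm=)"
fi
# Check systemd version
systemctl --version | head -1 >&2
# Check system state (is-system-running exits non-zero for every state except "running")
local state
state=$(systemctl is-system-running 2>/dev/null || true)
state="${state:-unknown}"
log "System state: $state" >&2
echo "$state"
}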
analyze_failed_units() {
log "Analyzing failed systemd units..."
# List failed units (--plain drops the status bullet so the unit name is the first field)
local failed_count=$(systemctl --failed --no-legend --plain | wc -l)
if [[ $failed_count -gt 0 ]]; then
log "Found $failed_count failed units:"
systemctl --failed --no-pager
# Get details for each failed unit
systemctl --failed --no-legend --plain | while read -r unit rest; do
log "Analyzing failed unit: $unit"
# systemctl status exits non-zero for failed units, so do not let it abort the script
systemctl status "$unit" --no-pager -l | tail -50 >> "$LOG_FILE" || true
# Check unit dependencies
log "Dependencies for $unit:"
systemctl list-dependencies "$unit" --no-pager >> "$LOG_FILE" || true
done
else
log "No failed units found"
fi
# Always return 0: under set -e a non-zero count would abort the caller
return 0
}
check_journal_errors() {
log "Checking systemd journal for errors..."
# Check current boot errors (|| true: head may close the pipe early, grep may match nothing)
journalctl -b -p err --no-pager | head -50 >> "$LOG_FILE" || true
# Check previous boot if available
if journalctl -b -1 --no-pager &>/dev/null; then
log "Errors from previous boot:"
journalctl -b -1 -p err --no-pager | head -20 >> "$LOG_FILE" || true
fi
# Check for emergency/panic messages
journalctl -b --no-pager | grep -i "emergency\|panic\|fail\|error" | tail -30 >> "$LOG_FILE" || true
}
recovery_target_boot() {
local target="$1"
log "Attempting to boot to recovery target: $target"
# Switch to recovery target
if systemctl isolate "$target"; then
log "Successfully switched to $target"
return 0
else
# Log instead of calling error(): error exits the script before the caller's fallback can run
log "ERROR: Failed to switch to $target"
return 1
fi
}
emergency_shell_recovery() {
log "Starting emergency shell recovery..."
# Check available targets
log "Available systemd targets:"
systemctl list-units --type=target --all --no-pager | grep target
# Common recovery targets:
# - rescue.target: Single-user shell with local filesystems mounted, no network
# - emergency.target: Minimal shell, root mounted read-only, no services
# - multi-user.target: Normal multi-user without GUI
# - graphical.target: Full desktop environment
# Try to reach emergency target
log "Current default target: $(systemctl get-default)"
# If system won't boot, try emergency kernel parameter:
# Add: systemd.unit=emergency.target to kernel command line
# Manual emergency boot steps:
cat << 'EOF'
MANUAL EMERGENCY RECOVERY STEPS:
1. Reboot and edit GRUB kernel line
2. Add: systemd.unit=emergency.target
3. Or: systemd.unit=rescue.target
4. Boot to emergency shell
5. Run: systemctl default to return to normal
6. Or: systemctl reboot to restart
ALTERNATIVELY:
1. Add: init=/bin/bash to kernel line
2. Mount root: mount -o remount,rw /
3. Fix issues manually
4. Continue booting: exec /sbin/init
EOF
}
fix_common_systemd_issues() {
log "Attempting to fix common systemd issues..."
local fixed_issues=0
# 1. Reset failed units
log "Resetting failed units..."
systemctl reset-failed 2>&1 | tee -a "$LOG_FILE"
# 2. Daemon reload
log "Reloading systemd daemon..."
systemctl daemon-reload 2>&1 | tee -a "$LOG_FILE"
# 3. Check for masked units
log "Checking for masked units that shouldn't be..."
systemctl list-unit-files --state=masked | grep -v "static" >> "$LOG_FILE"
# 4. Check systemd configuration
log "Checking systemd configuration..."
# verify exits non-zero when it finds problems; log them without aborting the script
systemd-analyze verify --recursive-errors=no /etc/systemd/system/*.service 2>&1 | tee -a "$LOG_FILE" || true
# 5. Check for broken symlinks
log "Checking for broken systemd symlinks..."
find /etc/systemd/system -type l ! -exec test -e {} \; -print 2>/dev/null >> "$LOG_FILE"
# 6. Check critical services
local critical_services=(
systemd-journald
systemd-udevd
dbus
network
getty
)
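for service in "${critical_services[@]}"; do
if systemctl is-enabled "$service" &>/dev/null; then
if ! systemctl is-active "$service" &>/dev/null; then
log "Attempting to start critical service: $service"
# Avoid ((fixed_issues++)): it returns status 1 when the old value is 0, which trips set -e
if systemctl start "$service" 2>&1 | tee -a "$LOG_FILE"; then
fixed_issues=$((fixed_issues + 1))
fi
fi
fi
done
log "Fixed $fixed_issues issues"
# The count is logged above; return 0 so set -e does not treat successful fixes as a failure
return 0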
}
check_filesystem_corruption() {
log "Checking for filesystem corruption that affects systemd..."
# Check critical system directories
local critical_dirs=(
/etc/systemd
/lib/systemd
/usr/lib/systemd
/run/systemd
/var/lib/systemd
)
for dir in "${critical_dirs[@]}"; do
if [[ -d "$dir" ]]; then
log "Checking directory: $dir"
# List entries whose modes differ from 644/755 (executables will also show up here)
find "$dir" \( -type f ! -perm 644 -o -type d ! -perm 755 \) 2>/dev/null | \
head -5 >> "$LOG_FILE" || true
# Check for broken symlinks
find "$dir" -type l ! -exec test -e {} \; -print 2>/dev/null | \
head -5 >> "$LOG_FILE" || true
else
warn "Missing directory: $dir"
fi
done
# Check critical binaries
local critical_bins=(
/lib/systemd/systemd
/usr/lib/systemd/systemd
/bin/systemctl
/usr/bin/systemctl
/sbin/init
)
for bin in "${critical_bins[@]}"; do
if [[ -f "$bin" ]]; then
log "Checking binary: $bin"
file "$bin" >> "$LOG_FILE"
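# Use if rather than "grep && warn": a non-matching grep as the last command would make the function fail
if ldd "$bin" 2>/dev/null | grep -i "not found"; then
warn "Missing libraries for $bin"
fi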
else
warn "Missing binary: $bin"
fi
done
}
recover_journal_corruption() {
log "Checking for journal corruption..."
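# Check journal status once and reuse the output (journalctl --verify exits non-zero on damage)
local verify_output
verify_output=$(journalctl --verify 2>&1 || true)
echo "$verify_output" | tee -a "$LOG_FILE"
# If journal is corrupted
if echo "$verify_output" | grep -qi "corrupt"; then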
warn "Journal corruption detected"
# Backup existing journal
local backup_dir="/var/log/journal-backup-$(date +%Y%m%d)"
mkdir -p "$backup_dir"
cp -a /var/log/journal/* "$backup_dir/" 2>/dev/null || true
# Clear corrupted journal
log "Clearing corrupted journal..."
journalctl --rotate
journalctl --vacuum-time=1s
# Restart journal daemon
systemctl restart systemd-journald 2>&1 | tee -a "$LOG_FILE"
log "Journal recovery attempted. Backup at: $backup_dir"
else
log "Journal integrity check passed"
fi
}
create_emergency_service() {
log "Creating emergency recovery service..."
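# Create a oneshot service that runs the recovery script on every boot (WantedBy=multi-user.target)
# and activates emergency.target if the script fails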
cat > /etc/systemd/system/emergency-recovery.service << 'EOF'
[Unit]
Description=Emergency System Recovery Service
DefaultDependencies=no
Conflicts=shutdown.target
Before=shutdown.target
OnFailure=emergency.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/emergency-recovery.sh
TimeoutSec=0
StandardOutput=journal+console
StandardError=journal+console
[Install]
WantedBy=multi-user.target
EOF
# Create recovery script
cat > /usr/local/bin/emergency-recovery.sh << 'EOF'
#!/bin/bash
# Emergency recovery script
set -euo pipefail
LOG_FILE="/var/log/emergency-recovery.log"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}
# Basic system checks
log "Starting emergency recovery..."
# Check disk space
df -h >> "$LOG_FILE"
# Check memory
free -h >> "$LOG_FILE"
# Check for failed units
systemctl --failed >> "$LOG_FILE" 2>&1 || true
# Attempt to fix common issues
systemctl daemon-reload >> "$LOG_FILE" 2>&1
systemctl reset-failed >> "$LOG_FILE" 2>&1
# Start critical services
for service in systemd-journald dbus systemd-udevd; do
if ! systemctl is-active "$service" &>/dev/null; then
log "Starting $service"
systemctl start "$service" >> "$LOG_FILE" 2>&1 || true
fi
done
log "Emergency recovery completed"
EOF
chmod +x /usr/local/bin/emergency-recovery.sh
# Enable the service
systemctl enable emergency-recovery.service 2>&1 | tee -a "$LOG_FILE"
log "Emergency recovery service created and enabled"
}
main() {
log "Starting systemd boot failure recovery..."
# Initial checks
local system_state=$(check_systemd_status)
# Analyze current state
analyze_failed_units
check_journal_errors
check_filesystem_corruption
# Attempt recovery
fix_common_systemd_issues
recover_journal_corruption
# If still having issues, try recovery target
if [[ "$system_state" != "running" ]] && [[ "$system_state" != "degraded" ]]; then
log "System not in running state, attempting recovery target..."
recovery_target_boot "$RECOVERY_TARGET" || emergency_shell_recovery
fi
# Create preventive measures
create_emergency_service
log "Systemd recovery process completed"
log ""
log "Summary:"
log " - System state: $system_state"
log " - Recovery target attempted: $RECOVERY_TARGET"
log " - Log file: $LOG_FILE"
log ""
log "Next steps:"
log " 1. Review the log file for detailed analysis"
log " 2. If problems persist, consider:"
log " - Boot with older kernel"
log " - Boot to recovery mode"
log " - Use live CD to chroot and repair"
log " - Restore from backup if available"
log " 3. Test the emergency recovery service"
log " 4. Consider setting up automatic backups"
}
# Run main function
main "$@"
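When the output of `systemctl --failed` is parsed in a script, note that some systemd versions prefix each row with a status bullet, which shifts the unit name out of the first column. The sketch below shows one way to tolerate both layouts; `failed_unit_names` is a hypothetical helper, and passing `--plain` to `systemctl` is the simpler option where it is available.

```bash
#!/bin/bash
# failed_unit_names: reads "systemctl --failed --no-legend" output on stdin
# and prints one unit name per line, with or without the leading bullet.
failed_unit_names() {
    awk '$1 == "●" { print $2; next }   # bullet in column 1: name is column 2
         NF        { print $1 }'        # otherwise the name is column 1
}

# Typical use:
#   systemctl --failed --no-legend | failed_unit_names | while read -r unit; do
#       systemctl status "$unit" --no-pager || true
#   done
```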
6. Hardware-Related Boot Issues
1. Memory (RAM) Failures:
- System hangs during POST
- Random kernel panics with different error messages
- Data corruption and filesystem errors
- Failed memory tests during boot
2. Disk Failures:
- Disk not detected in BIOS/UEFI
- Slow boot times with disk errors
- SMART errors showing in dmesg
- Filesystem corruption after reboot
3. Power Supply Issues:
- System won't power on at all
- Random reboots during boot process
- Inconsistent boot success
- Peripheral devices not working
4. Motherboard/CPU Issues:
- No POST, no beep codes
- Overheating during boot
- Inconsistent hardware detection
- BIOS/UEFI resetting to defaults
2. Test with minimal hardware configuration
3. Check all cable connections
4. Monitor temperatures during boot
5. Listen for abnormal sounds (clicking disks, beep codes)
Hardware Diagnostics and Recovery
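A quick first pass at hardware diagnosis is to scan the kernel log for messages that usually accompany failing disks, memory, or cooling. The sketch below is a hypothetical filter, `scan_hw_errors`. The patterns are assumptions based on commonly seen kernel messages and will not catch everything, so treat a clean result as inconclusive.

```bash
#!/bin/bash
# scan_hw_errors: reads kernel log lines on stdin and prints those that look
# hardware related, prefixed with a rough category.
scan_hw_errors() {
    local line
    while IFS= read -r line; do
        case "${line,,}" in          # lowercase copy for matching (bash 4+)
            *"i/o error"*|*"medium error"*|*"failed command"*)
                echo "disk: $line" ;;
            *"machine check"*|*"hardware error"*|*"edac"*)
                echo "memory-or-cpu: $line" ;;
            *"temperature above threshold"*|*"thermal"*)
                echo "thermal: $line" ;;
        esac
    done
}

# Typical use (reading the kernel log may require root):
#   dmesg | scan_hw_errors
#   journalctl -k -b -1 --no-pager | scan_hw_errors   # previous boot
```

Disk hits are worth following up with `smartctl -H` on the affected device, and memory hits with a full Memtest86+ run from boot media.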
7. Boot Recovery Tools and Techniques
Boot Recovery Tool Comparison
| Recovery Tool | Primary Use Case | Boot Method | Key Features | Complexity |
|---|---|---|---|---|
| SystemRescueCD | General system recovery | Live CD/USB | Filesystem tools, hardware tests, network recovery | Medium |
| Super Grub2 Disk | GRUB repair and boot issues | Live CD/USB | GRUB recovery, boot sector repair, disk editing | Low-Medium |
| Boot-Repair-Disk | Automatic boot repair | Live CD/USB | One-click GRUB repair, UEFI/BIOS support | Low |
| GParted Live | Partition recovery | Live CD/USB | Partition editing, filesystem resizing, data recovery | Medium |
| TRK (Trinity Rescue Kit) | Password reset, virus scan | Live CD/USB | Password reset, virus cleaning, network tools | Medium |
| Clonezilla Live | Disk imaging and cloning | Live CD/USB | Disk backup, restore, cloning, bare metal recovery | Medium-High |
| Linux Mint Live | General troubleshooting | Live CD/USB | Full desktop, driver support, package management | Low |
| Knoppix | Hardware detection | Live CD/USB | Excellent hardware detection, recovery tools | Low-Medium |
Before Problems Occur:
1. Regular backups: Maintain system image backups
2. Boot media: Create and test recovery USB drives
3. Documentation: Document system configuration
4. Testing: Test recovery procedures regularly
5. Monitoring: Monitor disk health and system logs
During Recovery:
1. Don't panic: Methodical troubleshooting is key
2. Document changes: Keep notes of all changes made
3. Test incrementally: Make one change at a time and test
4. Use safe mode: Boot with minimal configuration first
5. Backup first: Always backup before making changes
After Recovery:
1. Root cause analysis: Identify what caused the problem
2. Preventive measures: Implement fixes to prevent recurrence
3. Update documentation: Update recovery procedures
4. Test backups: Verify backup integrity
5. Monitor closely: Watch for similar issues after recovery
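Verifying backup integrity can start with something as small as checking the boot sector image that the scripts in this guide save with `dd`. A valid MBR, including the protective MBR on GPT disks, ends with the signature bytes 0x55 0xAA at offsets 510 and 511. The sketch below checks for that signature; `has_boot_signature` is a hypothetical helper, and a passing check only shows the image is plausibly a boot sector, not that its contents are intact.

```bash
#!/bin/bash
# has_boot_signature FILE
# Succeeds if bytes 510-511 of FILE are 0x55 0xAA.
has_boot_signature() {
    local sig
    # od prints the two bytes as hex; strip whitespace to get e.g. "55aa"
    sig=$(od -An -tx1 -j510 -N2 "$1" 2>/dev/null | tr -d ' \n')
    [[ "$sig" == "55aa" ]]
}

# Example against a backup made with: dd if=/dev/sda of=mbr-backup.bin bs=512 count=1
#   if has_boot_signature mbr-backup.bin; then
#       echo "boot signature present"
#   else
#       echo "backup does not look like a boot sector"
#   fi
```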
Comprehensive Boot Recovery Script
#!/bin/bash
# boot-recovery-master.sh - Master boot recovery and troubleshooting script
set -euo pipefail
# Configuration
RECOVERY_MODE="${1:-diagnose}"
BACKUP_DIR="/boot/recovery-backup-$(date +%Y%m%d-%H%M%S)"
LOG_FILE="/tmp/boot-recovery-$(date +%Y%m%d-%H%M%S).log"
LIVE_MODE=false
# Colors for output (removed for professional look)
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
error() {
echo "[ERROR] $1" >&2
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $1" >> "$LOG_FILE"
exit 1
}
warn() {
echo "[WARNING] $1" >&2
echo "[$(date '+%Y-%m-%d %H:%M:%S')] WARNING: $1" >> "$LOG_FILE"
}
check_live_mode() {
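# Check if running from live media (live systems typically run from an aufs, overlay, or squashfs root)
if [[ -f /etc/live/config.conf ]] || [[ -d /live ]] || findmnt -n -o FSTYPE / 2>/dev/null | grep -qE "^(aufs|overlay|squashfs)$"; then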
LIVE_MODE=true
log "Running in Live USB/CD mode"
else
LIVE_MODE=false
log "Running on installed system"
fi
}
setup_environment() {
log "Setting up recovery environment..."
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Create working directories
mkdir -p /tmp/recovery/{boot,root,efi}
# Check for necessary tools
local missing_tools=""
for tool in lsblk fdisk grep awk sed mount umount chroot; do
if ! command -v "$tool" &> /dev/null; then
missing_tools="$missing_tools $tool"
fi
done
if [[ -n "$missing_tools" ]]; then
warn "Missing tools:$missing_tools"
fi
}
backup_system_state() {
log "Backing up current system state..."
# Backup partition table
fdisk -l > "$BACKUP_DIR/partition-table.txt" 2>/dev/null || true
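# Backup the boot sector of the disk holding the root filesystem (falls back to /dev/sda)
local boot_disk
boot_disk=$(lsblk -nrspo NAME,TYPE "$(findmnt -n -o SOURCE /)" 2>/dev/null | awk '$2 == "disk" {d = $1} END {print d}') || true
dd if="${boot_disk:-/dev/sda}" of="$BACKUP_DIR/mbr-backup.bin" bs=512 count=1 2>/dev/null || true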
# Backup GRUB configuration
cp -a /boot/grub "$BACKUP_DIR/grub" 2>/dev/null || true
cp -a /boot/grub2 "$BACKUP_DIR/grub2" 2>/dev/null || true
cp /etc/default/grub "$BACKUP_DIR/grub-default" 2>/dev/null || true
# Backup kernel and initramfs
mkdir -p "$BACKUP_DIR/kernels"
cp /boot/vmlinuz-* "$BACKUP_DIR/kernels/" 2>/dev/null || true
cp /boot/initrd.img-* "$BACKUP_DIR/kernels/" 2>/dev/null || true
cp /boot/initramfs-* "$BACKUP_DIR/kernels/" 2>/dev/null || true
# Backup fstab
cp /etc/fstab "$BACKUP_DIR/fstab" 2>/dev/null || true
log "System state backed up to: $BACKUP_DIR"
}
analyze_boot_environment() {
log "Analyzing boot environment..."
cat > "$BACKUP_DIR/boot-analysis.txt" << 'EOF'
BOOT ENVIRONMENT ANALYSIS
=========================
EOF
# 1. Detect boot mode
if [[ -d /sys/firmware/efi ]]; then
echo "Boot mode: UEFI" >> "$BACKUP_DIR/boot-analysis.txt"
BOOT_MODE="uefi"
else
echo "Boot mode: BIOS/Legacy" >> "$BACKUP_DIR/boot-analysis.txt"
BOOT_MODE="bios"
fi
# 2. Detect disks and partitions
echo "" >> "$BACKUP_DIR/boot-analysis.txt"
echo "DISK LAYOUT:" >> "$BACKUP_DIR/boot-analysis.txt"
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT,LABEL,UUID >> "$BACKUP_DIR/boot-analysis.txt"
# 3. Check mounted filesystems
echo "" >> "$BACKUP_DIR/boot-analysis.txt"
echo "MOUNTED FILESYSTEMS:" >> "$BACKUP_DIR/boot-analysis.txt"
mount | grep "^/dev/" >> "$BACKUP_DIR/boot-analysis.txt" || true
# 4. Check GRUB installation
echo "" >> "$BACKUP_DIR/boot-analysis.txt"
echo "GRUB INSTALLATION:" >> "$BACKUP_DIR/boot-analysis.txt"
if command -v grub-install &> /dev/null; then
grub-install --version >> "$BACKUP_DIR/boot-analysis.txt"
else
echo "grub-install not found" >> "$BACKUP_DIR/boot-analysis.txt"
fi
# 5. Check kernel versions
echo "" >> "$BACKUP_DIR/boot-analysis.txt"
echo "INSTALLED KERNELS:" >> "$BACKUP_DIR/boot-analysis.txt"
ls -la /boot/vmlinuz-* 2>/dev/null >> "$BACKUP_DIR/boot-analysis.txt" || true
# 6. Check initramfs
echo "" >> "$BACKUP_DIR/boot-analysis.txt"
echo "INITRAMFS IMAGES:" >> "$BACKUP_DIR/boot-analysis.txt"
ls -la /boot/initrd.img-* /boot/initramfs-* 2>/dev/null >> "$BACKUP_DIR/boot-analysis.txt" || true
log "Boot analysis saved to: $BACKUP_DIR/boot-analysis.txt"
}
diagnose_boot_issues() {
log "Diagnosing boot issues..."
local issues_found=0
cat > "$BACKUP_DIR/diagnosis.txt" << 'EOF'
BOOT ISSUE DIAGNOSIS
====================
EOF
# Check 1: Disk space on boot partition (flag usage of 90% or more)
local boot_usage
boot_usage=$(df -P /boot 2>/dev/null | awk 'NR==2 {gsub("%", "", $5); print $5}') || true
if [[ "$boot_usage" =~ ^[0-9]+$ ]] && (( boot_usage >= 90 )); then
echo "ISSUE: Boot partition nearly full" >> "$BACKUP_DIR/diagnosis.txt"
df -h /boot >> "$BACKUP_DIR/diagnosis.txt"
# Plain assignment: ((issues_found++)) returns non-zero when the old value is 0, which aborts the script under set -e
issues_found=$((issues_found + 1))
fi
# Check 2: Missing kernel images
if ! ls /boot/vmlinuz-* &> /dev/null; then
echo "ISSUE: No kernel images found in /boot" >> "$BACKUP_DIR/diagnosis.txt"
issues_found=$((issues_found + 1))
fi
# Check 3: Missing initramfs (either naming convention is acceptable)
if ! compgen -G "/boot/initrd.img-*" > /dev/null && ! compgen -G "/boot/initramfs-*" > /dev/null; then
echo "ISSUE: No initramfs images found" >> "$BACKUP_DIR/diagnosis.txt"
issues_found=$((issues_found + 1))
fi
# Check 4: GRUB configuration issues
if [[ -f /boot/grub/grub.cfg ]]; then
if ! grep -q "menuentry.*Linux" /boot/grub/grub.cfg; then
echo "ISSUE: No Linux menuentries in grub.cfg" >> "$BACKUP_DIR/diagnosis.txt"
issues_found=$((issues_found + 1))
fi
else
echo "ISSUE: grub.cfg not found" >> "$BACKUP_DIR/diagnosis.txt"
issues_found=$((issues_found + 1))
fi
# Check 5: Filesystem errors
echo "" >> "$BACKUP_DIR/diagnosis.txt"
echo "FILESYSTEM CHECKS:" >> "$BACKUP_DIR/diagnosis.txt"
for mount_point in / /boot; do
if device=$(findmnt -n -o SOURCE "$mount_point" 2>/dev/null); then
echo "Checking $mount_point ($device)..." >> "$BACKUP_DIR/diagnosis.txt"
fsck -n "$device" 2>&1 | grep -i "error\|corrupt" >> "$BACKUP_DIR/diagnosis.txt" || true
fi
done
# Check 6: Hardware issues in logs
echo "" >> "$BACKUP_DIR/diagnosis.txt"
echo "HARDWARE ERRORS IN LOGS:" >> "$BACKUP_DIR/diagnosis.txt"
dmesg | tail -50 | grep -i "error\|fail\|timeout" >> "$BACKUP_DIR/diagnosis.txt" || true
if [[ $issues_found -eq 0 ]]; then
echo "No obvious boot issues detected in automated checks." >> "$BACKUP_DIR/diagnosis.txt"
echo "Check manual analysis for subtle issues." >> "$BACKUP_DIR/diagnosis.txt"
else
echo "" >> "$BACKUP_DIR/diagnosis.txt"
echo "Total issues detected: $issues_found" >> "$BACKUP_DIR/diagnosis.txt"
fi
log "Diagnosis complete. Issues found: $issues_found"
log "Diagnosis report: $BACKUP_DIR/diagnosis.txt"
}
repair_grub() {
log "Starting GRUB repair..."
local repair_log="$BACKUP_DIR/grub-repair.log"
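# Detect disks and partitions (declared separately so 'local' does not mask command failures)
local root_part boot_part efi_part disk
root_part=$(findmnt -n -o SOURCE /)
boot_part=$(findmnt -n -o SOURCE /boot || true)
efi_part=$(findmnt -n -o SOURCE /boot/efi || true)
# Walk up the device tree to the parent disk; this also handles names such as nvme0n1p2
disk=$(lsblk -nrspo NAME,TYPE "$root_part" 2>/dev/null | awk '$2 == "disk" {d = $1} END {print d}') || true
[[ -n "$disk" ]] || error "Could not determine the disk that holds $root_part"
log "Detected:"
log " Root: $root_part"
log " Boot: $boot_part"
log " EFI: $efi_part"
log " Disk: $disk"
# Backup current GRUB (copy the existing file; grub-mkconfig would generate a new one instead)
log "Backing up current GRUB configuration..."
cp -a /boot/grub/grub.cfg "$BACKUP_DIR/grub.cfg.backup" >> "$repair_log" 2>&1 || true
# Repair based on boot mode
if [[ "$BOOT_MODE" == "uefi" ]]; then
repair_grub_uefi "$disk" "$efi_part"
else
repair_grub_bios "$disk"
fi
# Regenerate GRUB configuration (redirect stdout first, then stderr, so both reach the log)
log "Regenerating GRUB configuration..."
update-grub >> "$repair_log" 2>&1 || grub-mkconfig -o /boot/grub/grub.cfg >> "$repair_log" 2>&1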
# Verify repair
log "Verifying GRUB repair..."
if [[ -f /boot/grub/grub.cfg ]] && grep -q "menuentry.*Linux" /boot/grub/grub.cfg; then
log "GRUB repair appears successful"
else
warn "GRUB repair may have issues"
fi
log "GRUB repair log: $repair_log"
}
repair_grub_uefi() {
local disk="$1"
local efi_part="$2"
log "Repairing UEFI GRUB on disk $disk, EFI partition $efi_part"
# Mount EFI partition if needed
local efi_mounted=false
if ! mountpoint -q /boot/efi; then
[[ -n "$efi_part" ]] || error "EFI partition not detected; mount it on /boot/efi and retry"
mkdir -p /boot/efi
mount "$efi_part" /boot/efi >> "$repair_log" 2>&1
efi_mounted=true
fi
# Install GRUB for UEFI (retry with a different bootloader ID if the first attempt fails)
grub-install --target=x86_64-efi \
--efi-directory=/boot/efi \
--bootloader-id=GRUB \
--recheck >> "$repair_log" 2>&1 || \
grub-install --target=x86_64-efi \
--efi-directory=/boot/efi \
--bootloader-id=GRUB_LINUX \
--recheck >> "$repair_log" 2>&1
# Create fallback
mkdir -p /boot/efi/EFI/BOOT
cp /boot/efi/EFI/GRUB/grubx64.efi /boot/efi/EFI/BOOT/bootx64.efi 2>/dev/null || true
# Unmount if we mounted it
if [[ "$efi_mounted" == true ]]; then
umount /boot/efi 2>/dev/null || true
fi
}
repair_grub_bios() {
local disk="$1"
log "Repairing BIOS GRUB on disk $disk"
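# Install GRUB to MBR
grub-install --target=i386-pc --recheck "$disk" >> "$repair_log" 2>&1
# Also install to partition if /boot is separate (GRUB discourages partition installs, so failure here is not fatal)
if [[ -n "$boot_part" ]] && [[ "$boot_part" != "$root_part" ]]; then
grub-install --target=i386-pc --boot-directory=/boot "$boot_part" >> "$repair_log" 2>&1 || warn "GRUB install to $boot_part failed; the MBR install is usually sufficient"
fi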
}
repair_initramfs() {
log "Repairing initramfs..."
local repair_log="$BACKUP_DIR/initramfs-repair.log"
# Get current kernel
local current_kernel=$(uname -r)
# Check initramfs tools
local initramfs_tool=""
if command -v update-initramfs &> /dev/null; then
initramfs_tool="update-initramfs"
elif command -v mkinitcpio &> /dev/null; then
initramfs_tool="mkinitcpio"
elif command -v dracut &> /dev/null; then
initramfs_tool="dracut"
fi
if [[ -z "$initramfs_tool" ]]; then
warn "No initramfs tool found"
return 1
fi
log "Using initramfs tool: $initramfs_tool"
# Repair initramfs for current kernel
case "$initramfs_tool" in
update-initramfs)
# Creating can fail if an image already exists; the update below still refreshes it
update-initramfs -c -k "$current_kernel" >> "$repair_log" 2>&1 || true
update-initramfs -u -k all >> "$repair_log" 2>&1
;;
mkinitcpio)
mkinitcpio -p linux >> "$repair_log" 2>&1
;;
dracut)
dracut --force >> "$repair_log" 2>&1
;;
esac
# Verify repair (test each naming convention separately; 'ls' with several paths fails if any one is absent)
if [[ -e "/boot/initrd.img-$current_kernel" || -e "/boot/initramfs-$current_kernel.img" || -e /boot/initramfs-linux.img ]]; then
log "initramfs repair successful"
else
warn "initramfs repair may have failed"
fi
log "initramfs repair log: $repair_log"
}
repair_filesystem() {
log "Checking and repairing filesystems..."
local fsck_log="$BACKUP_DIR/fsck-repair.log"
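# Resolve the devices here; the variables set in repair_grub are local to that function
local root_part boot_part part
root_part=$(findmnt -n -o SOURCE / || true)
boot_part=$(findmnt -n -o SOURCE /boot || true)
touch "$fsck_log"
for part in "$root_part" "$boot_part"; do
[[ -n "$part" ]] || continue
# A repairing fsck on a mounted filesystem can corrupt it, so only unmounted devices are checked
if findmnt -n -S "$part" > /dev/null 2>&1; then
warn "$part is mounted; skipping fsck -y (unmount it or run fsck from live media)"
continue
fi
log "Checking $part..."
fsck -y "$part" >> "$fsck_log" 2>&1 || true
done
# Check for errors
if grep -q "FILE SYSTEM WAS MODIFIED" "$fsck_log"; then
log "Filesystem repairs were made"
else
log "No filesystem repairs were made"
fi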
log "Filesystem repair log: $fsck_log"
}
create_recovery_entries() {
log "Creating recovery boot entries..."
# Resolve values now: the generated file is copied verbatim into grub.cfg, so shell expansions must happen here
local ROOT_UUID current_kernel previous_kernel
ROOT_UUID=$(findmnt -n -o UUID / || true)
[[ -n "$ROOT_UUID" ]] || error "Could not determine the UUID of the root filesystem"
current_kernel=$(uname -r)
previous_kernel=$(ls /boot/vmlinuz-* 2>/dev/null | sort -V | tail -2 | head -1 | sed 's/.*vmlinuz-//') || true
# Create emergency GRUB entries (GRUB's own variables are escaped as \$ so they stay literal)
cat > /etc/grub.d/09_recovery << EOF
#!/bin/sh
exec tail -n +3 \$0
menuentry 'Linux (Recovery Mode) -- System Recovery' --class ubuntu --class gnu-linux --class gnu --class os \$menuentry_id_option 'gnulinux-simple-recovery' {
recordfail
load_video
insmod gzio
if [ x\$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
insmod part_gpt
insmod ext2
set root='hd0,gpt2'
if [ x\$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt2 --hint-efi=hd0,gpt2 --hint-baremetal=ahci0,gpt2 $ROOT_UUID
else
search --no-floppy --fs-uuid --set=root $ROOT_UUID
fi
echo 'Loading Linux ...'
linux /boot/vmlinuz-$current_kernel root=UUID=$ROOT_UUID ro recovery nomodeset
echo 'Loading initial ramdisk ...'
initrd /boot/initrd.img-$current_kernel
}
menuentry 'Linux (Safe Graphics Mode) -- Troubleshooting' --class ubuntu --class gnu-linux --class gnu --class os \$menuentry_id_option 'gnulinux-simple-safegraphics' {
recordfail
load_video
insmod gzio
if [ x\$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
insmod part_gpt
insmod ext2
set root='hd0,gpt2'
if [ x\$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt2 --hint-efi=hd0,gpt2 --hint-baremetal=ahci0,gpt2 $ROOT_UUID
else
search --no-floppy --fs-uuid --set=root $ROOT_UUID
fi
echo 'Loading Linux ...'
linux /boot/vmlinuz-$current_kernel root=UUID=$ROOT_UUID ro quiet splash nomodeset
echo 'Loading initial ramdisk ...'
initrd /boot/initrd.img-$current_kernel
}
menuentry 'Linux (Previous Kernel) -- Fallback' --class ubuntu --class gnu-linux --class gnu --class os \$menuentry_id_option 'gnulinux-simple-previous' {
recordfail
load_video
insmod gzio
if [ x\$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
insmod part_gpt
insmod ext2
set root='hd0,gpt2'
if [ x\$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt2 --hint-efi=hd0,gpt2 --hint-baremetal=ahci0,gpt2 $ROOT_UUID
else
search --no-floppy --fs-uuid --set=root $ROOT_UUID
fi
echo 'Loading Linux ...'
linux /boot/vmlinuz-$previous_kernel root=UUID=$ROOT_UUID ro quiet splash
echo 'Loading initial ramdisk ...'
initrd /boot/initrd.img-$previous_kernel
}
EOF
chmod +x /etc/grub.d/09_recovery
# Update GRUB to include recovery entries
update-grub 2>&1 | tee -a "$LOG_FILE"
log "Recovery boot entries created"
}
generate_recovery_report() {
log "Generating recovery report..."
cat > "$BACKUP_DIR/recovery-report.txt" << EOF
BOOT RECOVERY REPORT
====================
Recovery performed: $(date)
Recovery mode: $RECOVERY_MODE
Boot mode: $BOOT_MODE
BACKUP INFORMATION:
Backup directory: $BACKUP_DIR
Log file: $LOG_FILE
SYSTEM INFORMATION:
Kernel: $(uname -r)
Distribution: $(lsb_release -d 2>/dev/null | cut -f2 || cat /etc/os-release | grep PRETTY_NAME | cut -d= -f2 | tr -d '"')
RECOVERY ACTIONS (steps 4-7 run only if the selected mode includes them):
1. System state backed up
2. Boot environment analyzed
3. Boot issues diagnosed
4. GRUB repaired
5. initramfs repaired
6. Filesystem checked
7. Recovery boot entries created
FILES BACKED UP:
$(find "$BACKUP_DIR" -type f | sed 's|.*/||')
NEXT STEPS:
1. Review the log file: $LOG_FILE
2. Review backup directory: $BACKUP_DIR
3. Reboot to test recovery
4. If issues persist, check diagnosis reports
5. Consider using boot repair live media for complex issues
IMPORTANT FILES:
- Boot analysis: $BACKUP_DIR/boot-analysis.txt
- Diagnosis: $BACKUP_DIR/diagnosis.txt
- GRUB repair log: $BACKUP_DIR/grub-repair.log
- initramfs repair log: $BACKUP_DIR/initramfs-repair.log
RECOVERY TOOLS AVAILABLE:
- SystemRescueCD: https://www.system-rescue.org/
- Super Grub2 Disk: http://www.supergrubdisk.org/
- Boot-Repair-Disk: https://sourceforge.net/projects/boot-repair-cd/
EOF
log "Recovery report generated: $BACKUP_DIR/recovery-report.txt"
}
main() {
log "Starting comprehensive boot recovery process..."
# Initial setup
check_live_mode
setup_environment
backup_system_state
analyze_boot_environment
# Execute based on recovery mode
case "$RECOVERY_MODE" in
diagnose|report-only)
diagnose_boot_issues
;;
repair|full)
diagnose_boot_issues
repair_grub
repair_initramfs
repair_filesystem
create_recovery_entries
;;
minimal)
repair_grub
;;
*)
error "Unknown recovery mode: $RECOVERY_MODE"
;;
esac
# Generate final report
generate_recovery_report
log "Boot recovery process completed!"
log ""
log "=== RECOVERY SUMMARY ==="
log "Mode: $RECOVERY_MODE"
log "Backup location: $BACKUP_DIR"
log "Log file: $LOG_FILE"
log "Report: $BACKUP_DIR/recovery-report.txt"
log ""
log "Next steps:"
log "1. Review the recovery report"
log "2. Reboot to test the fixes"
log "3. If problems persist, use the diagnosis reports"
log "4. Consider professional help for hardware issues"
}
# Error handling
trap 'log "Recovery interrupted by user"; exit 1' INT
trap 'log "Script terminated unexpectedly"; exit 1' TERM
# Run main function
main "$@"
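The "boot partition nearly full" check depends on reading a usage percentage from `df` and comparing it as a number rather than as a text pattern. The following is a minimal sketch of that comparison in isolation. The names `percent_at_least` and `boot_usage` are hypothetical, the 90% threshold in the usage note is an assumed value, and `df --output=pcent` assumes GNU coreutils.

```shell
#!/bin/bash
# percent_at_least VALUE THRESHOLD
# Succeeds if VALUE (for example " 93%") is numerically >= THRESHOLD.
# Returns 2 if VALUE is not a percentage.
percent_at_least() {
    local value="${1%\%}"              # strip the trailing percent sign
    value="${value//[[:space:]]/}"     # strip padding that df may add
    [[ "$value" =~ ^[0-9]+$ ]] || return 2
    (( 10#$value >= $2 ))
}

# boot_usage [MOUNTPOINT]: print the usage column for the mount point
boot_usage() {
    df --output=pcent "${1:-/boot}" 2>/dev/null | tail -n 1
}
```

For example, `percent_at_least "$(boot_usage /boot)" 90 && echo "boot partition nearly full"` prints the warning only at 90% usage or above, so a value such as 10% is not flagged.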
Boot Issue Resolution Flowchart
Mastering Boot Issue Troubleshooting
System boot problems can be daunting, but with a systematic approach and proper tools, most issues can be diagnosed and resolved. Understanding the boot process, recognizing symptoms, and applying targeted fixes are key skills for any system administrator.
Key Takeaways: The boot process follows distinct stages: BIOS/UEFI, boot loader, kernel initialization, initramfs, and init system. Each stage has characteristic failure modes and recovery techniques. GRUB issues are common but usually repairable with live media. Kernel panics require parameter adjustments or hardware testing. Filesystem corruption needs careful fsck operations. Systemd failures benefit from target-based recovery. Hardware issues require diagnostic testing and potential replacement.
Next Steps: Practice recovery procedures in a safe environment. Create and test boot recovery media. Implement monitoring for early detection of boot-related issues. Establish backup and recovery procedures. Document system configurations and recovery steps. Stay updated on boot loader and kernel developments. Consider automated health checks and preventive maintenance.
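As one example of an automated health check, the sketch below lists installed kernels that have no matching initramfs image, a condition that typically surfaces only at the next reboot. The function name `missing_initramfs` is hypothetical, and the check assumes the `initrd.img-VERSION` and `initramfs-VERSION.img` naming conventions; distributions that name images differently would need another pattern.

```shell
#!/bin/bash
# missing_initramfs [BOOT_DIR]
# Prints the version of every kernel in BOOT_DIR that lacks an initramfs.
missing_initramfs() {
    local boot_dir="${1:-/boot}" kernel version
    for kernel in "$boot_dir"/vmlinuz-*; do
        [[ -e "$kernel" ]] || continue      # the glob matched nothing
        version="${kernel##*/vmlinuz-}"
        if [[ ! -e "$boot_dir/initrd.img-$version" &&
              ! -e "$boot_dir/initramfs-$version.img" ]]; then
            echo "$version"
        fi
    done
}
```

Run from a scheduled job, any output from `missing_initramfs` is a signal to regenerate the image before the system is restarted.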