Complete Log Analysis Guide: What, Why & How Explained

This guide explains not just what commands to run, but also why they work, where to use them, and how to interpret the results for effective Linux system troubleshooting.

1. Understanding Linux Logging System

Before diving into commands, understand how Linux logging works. Logs are categorized by source and stored in specific locations with standardized formats.

Locate System Logs

Beginner
ls -la /var/log/

What does this do?

Lists all log files in the main log directory with detailed information (permissions, size, date).

Why use this?

To understand what log files are available on your system and identify which ones might contain relevant information for your troubleshooting task.

Where to find common logs?

System logs: /var/log/syslog, /var/log/messages
Authentication: /var/log/auth.log, /var/log/secure
Kernel: /var/log/kern.log
Boot: /var/log/boot.log
Application: /var/log/nginx/, /var/log/apache2/

Real Use Case: Disk Full Warning

When you get a "disk full" warning, check log directory size first:

du -sh /var/log/

If it's huge (>2GB), you need to investigate which logs are growing rapidly.
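To see which files are actually eating the space, combine du with sort. A self-contained sketch on a throwaway directory (the /tmp paths and file names are made up for the demo; in practice point the final pipeline at /var/log/):

```shell
# Build a throwaway directory with files of known sizes
mkdir -p /tmp/logdemo
head -c 1048576 /dev/zero > /tmp/logdemo/big.log   # 1 MiB
head -c 1024 /dev/zero > /tmp/logdemo/small.log    # 1 KiB

# -a lists individual files, -h prints human-readable sizes,
# and sort -rh orders those sizes largest-first
du -ah /tmp/logdemo/*.log | sort -rh | head -1
```

The top line names the biggest file (big.log here); the same pipeline against /var/log/ points you at the runaway log.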

Understand Log Rotation

Intermediate
cat /etc/logrotate.conf

What is log rotation?

Log rotation is the process of archiving old log files and creating new ones to prevent disk space exhaustion.

How does it work?

When a log file reaches a certain size or age, it's compressed (e.g., .gz) and a new empty log file is created. Old logs are eventually deleted based on retention policies.

Configuration explained:

weekly: Rotate logs weekly
rotate 4: Keep 4 weeks of logs
create: Create new empty log after rotation
compress: Compress old logs with gzip
size 100M: Rotate when log reaches 100MB
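Put together, a per-application drop-in under /etc/logrotate.d/ might look like the following sketch (the app name and paths are hypothetical; note that a plain `size` directive replaces the time schedule, while `maxsize` combines with it):

```
# /etc/logrotate.d/myapp  (hypothetical application)
/var/log/myapp/*.log {
    weekly                  # rotate once a week...
    maxsize 100M            # ...or sooner if the file passes 100MB
    rotate 4                # keep 4 rotated generations
    compress                # gzip rotated logs
    missingok               # no error if the log is absent
    notifempty              # skip empty logs
    create 0640 root adm    # recreate the log with these permissions
}
```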

Important Note

Never delete log files manually while services are running. A process that still holds the file open keeps writing to the deleted inode, so the disk space is not freed until the process restarts, and some services misbehave when their log file disappears. Always use proper rotation or stop the service first.

2. Master journalctl - The Systemd Journal

journalctl is the modern way to view logs on systemd-based systems. Unlike traditional log files, it stores logs in a binary format with metadata, allowing powerful filtering.

Basic Log Viewing

Beginner
journalctl -f

What does -f do?

The -f flag stands for "follow". It shows new log entries in real-time as they are written, similar to tail -f for traditional logs.

When to use this?

Use when you're troubleshooting an issue that's happening right now. For example, when a service is failing to start, run this in one terminal while you restart the service in another.

Dec 16 10:30:15 server sshd[1234]: Accepted password for user from 192.168.1.100
Dec 16 10:30:20 server systemd[1]: Started Daily apt upgrade and clean activities.
Dec 16 10:31:05 server kernel: USB disconnect, device number 2

Time-Based Filtering

Intermediate
journalctl --since "2 hours ago" --until "1 hour ago"

What time format?

journalctl understands human-readable time formats: "yesterday", "2 hours ago", "2025-12-16 09:00", "today", "tomorrow".

Practical scenario:

Your application crashed at 2:30 PM. Use this command to see what happened in the hour before the crash (1:30-2:30 PM).

Troubleshooting Example

Problem: Server became slow around 3 PM.
Solution: Check logs from that period:

journalctl --since "14:45" --until "15:15" -p warning

Service-Specific Logs

Intermediate
journalctl -u nginx.service --since today

What does -u mean?

-u stands for "unit". It filters logs for a specific systemd service unit. You need to know the exact service name.

Finding service names:

Use systemctl list-unit-files | grep service to find all service names. Common ones: nginx.service, sshd.service, docker.service, mysql.service.

Priority/Error Level Filtering

Intermediate
journalctl -p err -b

What are priority levels?

Logs have severity levels: 0=emerg, 1=alert, 2=crit, 3=err, 4=warning, 5=notice, 6=info, 7=debug. -p err shows errors and more severe messages.

When to check error logs?

Always start troubleshooting by checking errors first. They're usually the root cause of problems. Info logs are for monitoring, error logs are for debugging.

Dec 16 09:15:22 server kernel: [ 123.456] EXT4-fs (sda1): warning: mounting fs with errors
Dec 16 09:20:10 server nginx: [error] 1234#0: *123 connect() failed (111: Connection refused)
Dec 16 09:25:45 server sshd[5678]: error: Could not load host key

3. grep - Find What You Need

grep (Global Regular Expression Print) is your primary tool for searching text patterns in log files. Mastering its options is crucial for efficient log analysis.

Basic Pattern Search

Beginner
grep -i "error" /var/log/syslog

What does -i do?

Makes the search case-insensitive. It will match "Error", "ERROR", "error", "ErRor", etc. Very useful because applications use inconsistent capitalization.

Common patterns to search:

error|fail|critical - General problems
timeout|slow|lag - Performance issues
denied|forbidden|unauthorized - Permission problems
panic|crash|abort - Application crashes
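Alternation patterns like these need grep's -E (extended regex) flag, or the | must be escaped. A self-contained sketch on sample data (the /tmp file stands in for a real syslog):

```shell
# Sample log lines standing in for /var/log/syslog
cat > /tmp/sample.log <<'EOF'
Dec 16 10:00:01 server app[100]: Connection ERROR on port 8080
Dec 16 10:00:02 server app[100]: request served in 12ms
Dec 16 10:00:03 server app[100]: backend Timeout after 30s
Dec 16 10:00:04 server sshd[200]: access denied for user bob
EOF

# -E: extended regex so | means "or"; -i: any capitalization; -c: count
grep -icE "error|fail|critical|timeout|denied" /tmp/sample.log   # → 3
```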

Context Around Matches

Intermediate
grep -B 5 -A 5 "segmentation fault" /var/log/syslog

What do -B and -A do?

-B 5 shows 5 lines Before the match. -A 5 shows 5 lines After the match. -C 5 shows 5 lines before AND after (Context).

Why context matters:

An error message alone often doesn't tell the full story. The lines before show what led to the error, the lines after show the consequences.

[Before context - what happened before error]
Memory allocation request: 1024MB
Available memory: 512MB
Segmentation fault at address 0x12345678
[After context - what happened after]
Process terminated with signal 11
Core dump generated at /var/crash/

Multiple File Search

Intermediate
grep -r "connection refused" /var/log/

What does -r do?

Recursively searches through directories. It will search all files in /var/log/ and all its subdirectories.

When problems are scattered:

Network issues often appear in multiple logs: system logs, application logs, firewall logs. Use recursive search to find all occurrences.

Common grep mistakes to avoid:

1. Not escaping special characters: grep "error.log" (dot means "any character") vs grep "error\.log"
2. Searching binary files: Use grep -a or strings | grep
3. Missing -i for case-insensitive: You might miss important matches
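Mistake 1 is easy to demonstrate on two sample lines (the /tmp file is just for the demo):

```shell
# One line contains a literal "error.log"; the other has a different
# character where the dot sits
printf 'rotate errorXlog now\nwriting error.log now\n' > /tmp/dot.log

grep -c "error.log" /tmp/dot.log    # unescaped: dot matches any char → 2
grep -c "error\.log" /tmp/dot.log   # escaped: matches only "error.log" → 1
```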

4. Real-World Troubleshooting Workflows

Learn systematic approaches to common problems. Each scenario shows a step-by-step debugging process.

Scenario: Website Down (Nginx/Apache)

System

Step-by-step diagnosis:

Step 1: Check if service is running
systemctl status nginx.service

Step 2: If stopped, check why
journalctl -u nginx.service --since "5 minutes ago" -p err

Step 3: Check configuration errors
nginx -t

Step 4: Check port binding
netstat -tlnp | grep :80

Step 5: Check recent access/error logs
tail -50 /var/log/nginx/error.log

Common Error Patterns

"Address already in use": Port 80 already occupied
"Permission denied": Nginx can't read files
"No such file or directory": Missing config/file
"connect() failed": Backend service down
"upstream timed out": Backend too slow

Scenario: SSH Connection Failed

Security

Debugging SSH issues:

Step 1: Check SSH service status
systemctl status sshd

Step 2: Check if SSH is listening
ss -tlnp | grep :22

Step 3: Check authentication logs
tail -f /var/log/auth.log | grep sshd

Step 4: Check for failed attempts (security)
grep "Failed password" /var/log/auth.log | tail -20

Step 5: Check firewall rules (iptables -L shows ports as "dpt:22", not ":22")
iptables -L -n | grep "dpt:22"
ufw status

Security Alert Signs

• Multiple "Failed password" from same IP = Brute force attack
• "Invalid user" attempts = Username enumeration
• "Connection closed by" before auth = Possible DoS
• Successful login from unusual location = Possible breach
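To spot brute-force sources, count failed attempts per source IP. A self-contained sketch on sample auth.log lines (a real run reads /var/log/auth.log; the awk loop finds the field after "from", so it works for both normal and "invalid user" lines):

```shell
cat > /tmp/auth-sample.log <<'EOF'
Dec 16 09:00:01 server sshd[1]: Failed password for root from 203.0.113.5 port 4000 ssh2
Dec 16 09:00:02 server sshd[2]: Failed password for invalid user admin from 203.0.113.5 port 4001 ssh2
Dec 16 09:00:03 server sshd[3]: Failed password for alice from 198.51.100.7 port 4002 ssh2
EOF

# Count failed attempts per source IP, worst offender first
grep "Failed password" /tmp/auth-sample.log \
  | awk '{for (i = 1; i < NF; i++) if ($i == "from") print $(i+1)}' \
  | sort | uniq -c | sort -rn
```

The top of the output is the IP to investigate (and perhaps block).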

Scenario: High Server Load

System

Finding load source:

Step 1: Check current load
uptime
top

Step 2: Check for memory issues
dmesg | grep -i "oom\|out of memory"

Step 3: Check disk I/O
iostat -x 2 5

Step 4: Check for stuck processes (match the STAT column, not the whole line)
ps aux | awk '$8 ~ /^[DZ]/'
D=Uninterruptible sleep, Z=Zombie process

Step 5: Check application logs for errors
journalctl --since "10 minutes ago" -p err | tail -30

Load Analysis Commands

# Show load average: 1min, 5min, 15min
cat /proc/loadavg

# Show running processes sorted by CPU
ps aux --sort=-%cpu | head -20

# Show memory usage sorted
ps aux --sort=-%mem | head -20

5. Network Issue Diagnosis

Network problems have specific log patterns. Learn where to look and what to search for.

Connection Issues

Network
grep -i "connection refused\|timeout\|no route" /var/log/syslog

What these errors mean:

Connection refused: Service not running on port
Connection timeout: Firewall blocking or network issue
No route to host: Network unreachable
Network is unreachable: Local network configuration issue

DNS Resolution Problems

Network
grep -i "name resolution\|host not found" /var/log/syslog

DNS troubleshooting steps:

1. Check /etc/resolv.conf
2. Test with: nslookup google.com
3. Check DNS service: systemctl status systemd-resolved
4. Check network manager logs: journalctl -u NetworkManager
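Step 1 can be scripted. A self-contained sketch on a sample resolver file (the /tmp copy stands in for /etc/resolv.conf):

```shell
cat > /tmp/resolv-sample.conf <<'EOF'
# generated by systemd-resolved
nameserver 127.0.0.53
options edns0 trust-ad
search example.internal
EOF

# List the configured DNS servers
awk '/^nameserver/ {print $2}' /tmp/resolv-sample.conf   # → 127.0.0.53
```

A lone 127.0.0.53 entry means the systemd-resolved stub resolver is in use, which makes step 3 the natural next check.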

Firewall Log Analysis

Security

Where are firewall logs?

iptables: Use logging rule or check kernel messages
ufw (Ubuntu): /var/log/ufw.log
firewalld (RHEL): journalctl -u firewalld
General: Check /var/log/kern.log for netfilter messages

Common Firewall Log Patterns

# Blocked connections (UFW)
grep "\[UFW BLOCK\]" /var/log/ufw.log

# Dropped packets (iptables)
dmesg | grep -i "DROP"

# Rate limiting hits
grep -i "limit" /var/log/ufw.log

6. Storage Problem Diagnosis

Disk and filesystem issues have clear signatures in logs. Learn to recognize them early.

Disk Error Detection

System

Where to look for disk errors:

1. Kernel messages (most important):
dmesg | grep -i "error\|fail\|I/O\|sector"

2. System logs:
grep -i "disk full\|no space left" /var/log/syslog

3. SMART errors:
sudo smartctl -a /dev/sda | grep -i "error\|fail"

4. Filesystem errors:
journalctl -k | grep -i "ext4\|xfs\|filesystem"

Critical Disk Errors

"I/O error" - Physical disk problem
"Buffer I/O error" - Data corruption
"Read-only filesystem" - Disk in read-only mode due to errors
"Bad sector" - Physical disk damage
"Filesystem corruption" - Needs fsck immediately

Disk Space Monitoring

Beginner

Proactive monitoring commands:

Check overall usage (-h = human readable, GB/MB):
df -h

Find large directories:
du -sh /* 2>/dev/null | sort -rh | head -20

Find large files (-exec handles filenames with spaces, unlike bare xargs):
find / -type f -size +100M -exec ls -lh {} + 2>/dev/null

Check log directory specifically:
du -sh /var/log/* | sort -rh

Automated Cleanup Script

#!/bin/bash
# Clean old log files
find /var/log -name "*.log" -type f -mtime +30 -delete
find /var/log -name "*.gz" -type f -mtime +90 -delete

# Clean temporary files
find /tmp -type f -mtime +7 -delete
find /var/tmp -type f -mtime +30 -delete

# Report disk usage after cleanup
df -h /

7. Application Debugging

Different applications have different log locations and error patterns. Know where to look.

Web Servers (Nginx/Apache)

System

Log locations:

Nginx:
• Access: /var/log/nginx/access.log
• Error: /var/log/nginx/error.log
• Site-specific: /var/log/nginx/site_error.log

Apache:
• Access: /var/log/apache2/access.log
• Error: /var/log/apache2/error.log
• Other: /var/log/apache2/other_vhosts_access.log

Databases

System

Log locations:

MySQL/MariaDB:
• Error: /var/log/mysql/error.log
• Slow queries: /var/log/mysql/slow.log
• General: /var/log/mysql/mysql.log

PostgreSQL:
• Main: /var/log/postgresql/postgresql-*.log
• Config: Check data_directory in config

Containers (Docker)

Advanced

Docker logging:

Docker daemon:
journalctl -u docker.service

Container logs:
docker logs container_name

Follow container logs:
docker logs -f container_name

All containers at once:
docker ps -q | xargs -L 1 docker logs

Custom Application Logs

Intermediate

Finding application logs:

1. Check application configuration:
Look for logfile settings in config files (usually in /etc/appname/)

2. Check systemd unit file:
systemctl cat application.service
Look for StandardOutput and StandardError directives

3. Check where process writes:
ls -la /proc/$(pidof appname)/fd/
Look for file descriptors pointing to log files

4. Search for log mentions:
find /etc -type f -exec grep -l "log" {} \;

8. Automate Log Monitoring

Create scripts to automate common log checks and receive alerts for critical issues.

Simple Monitoring Script

Intermediate

Basic monitoring script:

#!/bin/bash
# log-monitor.sh - Basic log monitoring
LOG_DIR="/var/log"
REPORT_FILE="/tmp/log-report-$(date +%Y%m%d).txt"

echo "=== Log Analysis Report $(date) ===" > "$REPORT_FILE"

# Check for errors in last hour
echo "Errors in last hour:" >> "$REPORT_FILE"
journalctl --since "1 hour ago" -p err | wc -l >> "$REPORT_FILE"

# Check disk space
echo "Log directory size:" >> "$REPORT_FILE"
du -sh "$LOG_DIR" >> "$REPORT_FILE"

# Check for critical patterns
echo "Critical errors found:" >> "$REPORT_FILE"
grep -i "panic\|fatal\|segmentation" "$LOG_DIR/syslog" | tail -5 >> "$REPORT_FILE"

# Email report
mail -s "Daily Log Report" admin@example.com < "$REPORT_FILE"

Cron Job Setup

Add to crontab to run daily at 8 AM:

0 8 * * * /usr/local/bin/log-monitor.sh

Run every hour:

0 * * * * /usr/local/bin/log-monitor.sh

Real-time Alert Script

Advanced

Monitor and alert immediately:

#!/bin/bash
# alert-monitor.sh - Real-time log monitoring with alerts
ALERT_EMAIL="admin@example.com"
CHECK_INTERVAL=60  # seconds

while true; do
    # Check for failed SSH attempts. grep has no --since option, so use
    # journalctl for the time window and count matching lines.
    FAILED_SSH=$(journalctl --since "5 minutes ago" | grep -c "Failed password")
    if [ "$FAILED_SSH" -gt 10 ]; then
        echo "ALERT: $FAILED_SSH failed SSH attempts in 5 minutes" | \
            mail -s "SSH Attack Alert" "$ALERT_EMAIL"
    fi

    # Check for disk space
    DISK_USAGE=$(df / --output=pcent | tail -1 | tr -d '% ')
    if [ "$DISK_USAGE" -gt 90 ]; then
        echo "ALERT: Disk usage at ${DISK_USAGE}%" | \
            mail -s "Disk Space Alert" "$ALERT_EMAIL"
    fi

    sleep "$CHECK_INTERVAL"
done

Master Checklist

When troubleshooting ANY problem:

1. Start with errors: journalctl -p err --since "10 minutes ago"
2. Check service status: systemctl status service_name
3. Look at recent logs: tail -100 /var/log/service.log
4. Search for patterns: grep -i "error\|fail\|timeout" logfile
5. Check system resources: top, df -h, free -h
6. Verify connectivity: ping, netstat, ss
7. Document findings: Keep notes of what you checked and found

Pro Tips

Use tmux/screen: Keep log tails running in separate panes
Create aliases: alias logs='journalctl -f'
Bookmark commands: Save useful one-liners in a text file
Set up log aggregation: Consider ELK stack for multiple servers
Regular health checks: Run monitoring scripts daily

Critical Reminders

1. Don't ignore warnings: They often become errors
2. Check timestamps: Correlate events across different logs
3. Look for patterns: Recurring errors indicate systemic issues
4. Document solutions: What fixed it today will fix it tomorrow
5. Set up monitoring: Prevent problems before users notice