Complete Log Analysis Guide: What, Why & How Explained

This guide explains not just what commands to run, but also why they work, where to use them, and how to interpret the results for effective Linux system troubleshooting.

1. Understanding Linux Logging System

Before diving into commands, understand how Linux logging works. Logs are categorized by source and stored in specific locations with standardized formats.

Locate System Logs

Beginner
ls -la /var/log/

What does this do?

Lists all log files in the main log directory with detailed information (permissions, size, date).

Why use this?

To understand what log files are available on your system and identify which ones might contain relevant information for your troubleshooting task.

Where to find common logs?

System logs: /var/log/syslog, /var/log/messages
Authentication: /var/log/auth.log, /var/log/secure
Kernel: /var/log/kern.log
Boot: /var/log/boot.log
Application: /var/log/nginx/, /var/log/apache2/

Real Use Case: Disk Full Warning

When you get a "disk full" warning, check log directory size first:

du -sh /var/log/

If it's huge (>2GB), you need to investigate which logs are growing rapidly.
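To see which files are actually eating the space, combine du with sort. A self-contained sketch on a throwaway directory (the /tmp paths and file names are made up for the demo; in practice point the final pipeline at /var/log/):

```shell
# Build a throwaway directory with files of known sizes
mkdir -p /tmp/logdemo
head -c 1048576 /dev/zero > /tmp/logdemo/big.log   # 1 MiB
head -c 1024 /dev/zero > /tmp/logdemo/small.log    # 1 KiB

# -a lists individual files, -h prints human-readable sizes,
# and sort -rh orders those sizes largest-first
du -ah /tmp/logdemo/*.log | sort -rh | head -1
```

The top line names the biggest file (big.log here); the same pipeline against /var/log/ points you at the runaway log.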

Understand Log Rotation

Intermediate
cat /etc/logrotate.conf

What is log rotation?

Log rotation is the process of archiving old log files and creating new ones to prevent disk space exhaustion.

How does it work?

When a log file reaches a certain size or age, it's compressed (e.g., .gz) and a new empty log file is created. Old logs are eventually deleted based on retention policies.

Configuration explained:

weekly: Rotate logs weekly
rotate 4: Keep 4 weeks of logs
create: Create new empty log after rotation
compress: Compress old logs with gzip
size 100M: Rotate when log reaches 100MB
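Put together, a per-application drop-in under /etc/logrotate.d/ might look like the following sketch (the app name and paths are hypothetical; note that a plain `size` directive replaces the time schedule, while `maxsize` combines with it):

```
# /etc/logrotate.d/myapp  (hypothetical application)
/var/log/myapp/*.log {
    weekly                  # rotate once a week...
    maxsize 100M            # ...or sooner if the file passes 100MB
    rotate 4                # keep 4 rotated generations
    compress                # gzip rotated logs
    missingok               # no error if the log is absent
    notifempty              # skip empty logs
    create 0640 root adm    # recreate the log with these permissions
}
```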

Important Note

Never delete log files manually while services are running. A process that still holds the file open keeps writing to the deleted inode, so the disk space is not freed until the process restarts, and some services misbehave when their log file disappears. Always use proper rotation or stop the service first.

2. Master journalctl - The Systemd Journal

journalctl is the modern way to view logs on systemd-based systems. Unlike traditional log files, it stores logs in a binary format with metadata, allowing powerful filtering.

Basic Log Viewing

Beginner
journalctl -f

What does -f do?

The -f flag stands for "follow". It shows new log entries in real-time as they are written, similar to tail -f for traditional logs.

When to use this?

Use when you're troubleshooting an issue that's happening right now. For example, when a service is failing to start, run this in one terminal while you restart the service in another.

Dec 16 10:30:15 server sshd[1234]: Accepted password for user from 192.168.1.100
Dec 16 10:30:20 server systemd[1]: Started Daily apt upgrade and clean activities.
Dec 16 10:31:05 server kernel: USB disconnect, device number 2

Time-Based Filtering

Intermediate
journalctl --since "2 hours ago" --until "1 hour ago"

What time format?

journalctl understands human-readable time formats: "yesterday", "2 hours ago", "2025-12-16 09:00", "today", "tomorrow".

Practical scenario:

Your application crashed at 2:30 PM. Use this command to see what happened in the hour before the crash (1:30-2:30 PM).

Troubleshooting Example

Problem: Server became slow around 3 PM.
Solution: Check logs from that period:

journalctl --since "14:45" --until "15:15" -p warning

Service-Specific Logs

Intermediate
journalctl -u nginx.service --since today

What does -u mean?

-u stands for "unit". It filters logs for a specific systemd service unit. You need to know the exact service name.

Finding service names:

Use systemctl list-unit-files | grep service to find all service names. Common ones: nginx.service, sshd.service, docker.service, mysql.service.

Priority/Error Level Filtering

Intermediate
journalctl -p err -b

What are priority levels?

Logs have severity levels: 0=emerg, 1=alert, 2=crit, 3=err, 4=warning, 5=notice, 6=info, 7=debug. -p err shows errors and more severe messages.

When to check error logs?

Always start troubleshooting by checking errors first. They're usually the root cause of problems. Info logs are for monitoring, error logs are for debugging.

Dec 16 09:15:22 server kernel: [ 123.456] EXT4-fs (sda1): warning: mounting fs with errors
Dec 16 09:20:10 server nginx: [error] 1234#0: *123 connect() failed (111: Connection refused)
Dec 16 09:25:45 server sshd[5678]: error: Could not load host key

3. grep - Find What You Need

grep (Global Regular Expression Print) is your primary tool for searching text patterns in log files. Mastering its options is crucial for efficient log analysis.

Basic Pattern Search

Beginner
grep -i "error" /var/log/syslog

What does -i do?

Makes the search case-insensitive. It will match "Error", "ERROR", "error", "ErRor", etc. Very useful because applications use inconsistent capitalization.

Common patterns to search:

error|fail|critical - General problems
timeout|slow|lag - Performance issues
denied|forbidden|unauthorized - Permission problems
panic|crash|abort - Application crashes
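Alternation patterns like these need grep's -E (extended regex) flag, or the | must be escaped. A self-contained sketch on sample data (the /tmp file stands in for a real syslog):

```shell
# Sample log lines standing in for /var/log/syslog
cat > /tmp/sample.log <<'EOF'
Dec 16 10:00:01 server app[100]: Connection ERROR on port 8080
Dec 16 10:00:02 server app[100]: request served in 12ms
Dec 16 10:00:03 server app[100]: backend Timeout after 30s
Dec 16 10:00:04 server sshd[200]: access denied for user bob
EOF

# -E: extended regex so | means "or"; -i: any capitalization; -c: count
grep -icE "error|fail|critical|timeout|denied" /tmp/sample.log   # → 3
```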

Context Around Matches

Intermediate
grep -B 5 -A 5 "segmentation fault" /var/log/syslog

What do -B and -A do?

-B 5 shows 5 lines Before the match. -A 5 shows 5 lines After the match. -C 5 shows 5 lines before AND after (Context).

Why context matters:

An error message alone often doesn't tell the full story. The lines before show what led to the error, the lines after show the consequences.

[Before context - what happened before error]
Memory allocation request: 1024MB
Available memory: 512MB
Segmentation fault at address 0x12345678
[After context - what happened after]
Process terminated with signal 11
Core dump generated at /var/crash/

Multiple File Search

Intermediate
grep -r "connection refused" /var/log/

What does -r do?

Recursively searches through directories. It will search all files in /var/log/ and all its subdirectories.

When problems are scattered:

Network issues often appear in multiple logs: system logs, application logs, firewall logs. Use recursive search to find all occurrences.

Common grep mistakes to avoid:

1. Not escaping special characters: grep "error.log" (dot means "any character") vs grep "error\.log"
2. Searching binary files: Use grep -a or strings | grep
3. Missing -i for case-insensitive: You might miss important matches
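Mistake 1 is easy to demonstrate on two sample lines (the /tmp file is just for the demo):

```shell
# One line contains a literal "error.log"; the other has a different
# character where the dot sits
printf 'rotate errorXlog now\nwriting error.log now\n' > /tmp/dot.log

grep -c "error.log" /tmp/dot.log    # unescaped: dot matches any char → 2
grep -c "error\.log" /tmp/dot.log   # escaped: matches only "error.log" → 1
```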

4. Real-World Troubleshooting Workflows

Learn systematic approaches to common problems. Each scenario shows a step-by-step debugging process.

Scenario: Website Down (Nginx/Apache)

System

Step-by-step diagnosis:

Step 1: Check if service is running
systemctl status nginx.service

Step 2: If stopped, check why
journalctl -u nginx.service --since "5 minutes ago" -p err

Step 3: Check configuration errors
nginx -t

Step 4: Check port binding
netstat -tlnp | grep :80

Step 5: Check recent access/error logs
tail -50 /var/log/nginx/error.log

Common Error Patterns

"Address already in use": Port 80 already occupied
"Permission denied": Nginx can't read files
"No such file or directory": Missing config/file
"connect() failed": Backend service down
"upstream timed out": Backend too slow

Scenario: SSH Connection Failed

Security

Debugging SSH issues:

Step 1: Check SSH service status
systemctl status sshd

Step 2: Check if SSH is listening
ss -tlnp | grep :22

Step 3: Check authentication logs
tail -f /var/log/auth.log | grep sshd

Step 4: Check for failed attempts (security)
grep "Failed password" /var/log/auth.log | tail -20

Step 5: Check firewall rules (iptables -L shows ports as "dpt:22", not ":22")
iptables -L -n | grep "dpt:22"
ufw status

Security Alert Signs

• Multiple "Failed password" from same IP = Brute force attack
• "Invalid user" attempts = Username enumeration
• "Connection closed by" before auth = Possible DoS
• Successful login from unusual location = Possible breach
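To spot brute-force sources, count failed attempts per source IP. A self-contained sketch on sample auth.log lines (a real run reads /var/log/auth.log; the awk loop finds the field after "from", so it works for both normal and "invalid user" lines):

```shell
cat > /tmp/auth-sample.log <<'EOF'
Dec 16 09:00:01 server sshd[1]: Failed password for root from 203.0.113.5 port 4000 ssh2
Dec 16 09:00:02 server sshd[2]: Failed password for invalid user admin from 203.0.113.5 port 4001 ssh2
Dec 16 09:00:03 server sshd[3]: Failed password for alice from 198.51.100.7 port 4002 ssh2
EOF

# Count failed attempts per source IP, worst offender first
grep "Failed password" /tmp/auth-sample.log \
  | awk '{for (i = 1; i < NF; i++) if ($i == "from") print $(i+1)}' \
  | sort | uniq -c | sort -rn
```

The top of the output is the IP to investigate (and perhaps block).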

Scenario: High Server Load

System

Finding load source:

Step 1: Check current load
uptime
top

Step 2: Check for memory issues
dmesg | grep -i "oom\|out of memory"

Step 3: Check disk I/O
iostat -x 2 5

Step 4: Check for stuck processes (match the STAT column, not the whole line)
ps aux | awk '$8 ~ /^[DZ]/'
D=Uninterruptible sleep, Z=Zombie process

Step 5: Check application logs for errors
journalctl --since "10 minutes ago" -p err | tail -30

Load Analysis Commands

# Show load average: 1min, 5min, 15min
cat /proc/loadavg

# Show running processes sorted by CPU
ps aux --sort=-%cpu | head -20

# Show memory usage sorted
ps aux --sort=-%mem | head -20

5. Network Issue Diagnosis

Network problems have specific log patterns. Learn where to look and what to search for.

Connection Issues

Network
grep -i "connection refused\|timeout\|no route" /var/log/syslog

What these errors mean:

Connection refused: Service not running on port
Connection timeout: Firewall blocking or network issue
No route to host: Network unreachable
Network is unreachable: Local network configuration issue

DNS Resolution Problems

Network
grep -i "name resolution\|host not found" /var/log/syslog

DNS troubleshooting steps:

1. Check /etc/resolv.conf
2. Test with: nslookup google.com
3. Check DNS service: systemctl status systemd-resolved
4. Check network manager logs: journalctl -u NetworkManager
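Step 1 can be scripted. A self-contained sketch on a sample resolver file (the /tmp copy stands in for /etc/resolv.conf):

```shell
cat > /tmp/resolv-sample.conf <<'EOF'
# generated by systemd-resolved
nameserver 127.0.0.53
options edns0 trust-ad
search example.internal
EOF

# List the configured DNS servers
awk '/^nameserver/ {print $2}' /tmp/resolv-sample.conf   # → 127.0.0.53
```

A lone 127.0.0.53 entry means the systemd-resolved stub resolver is in use, which makes step 3 the natural next check.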

Firewall Log Analysis

Security

Where are firewall logs?

iptables: Use logging rule or check kernel messages
ufw (Ubuntu): /var/log/ufw.log
firewalld (RHEL): journalctl -u firewalld
General: Check /var/log/kern.log for netfilter messages

Common Firewall Log Patterns

# Blocked connections (UFW)
grep "\[UFW BLOCK\]" /var/log/ufw.log

# Dropped packets (iptables)
dmesg | grep -i "DROP"

# Rate limiting hits
grep -i "limit" /var/log/ufw.log

6. Storage Problem Diagnosis

Disk and filesystem issues have clear signatures in logs. Learn to recognize them early.

Disk Error Detection

System

Where to look for disk errors:

1. Kernel messages (most important):
dmesg | grep -i "error\|fail\|I/O\|sector"

2. System logs:
grep -i "disk full\|no space left" /var/log/syslog

3. SMART errors:
sudo smartctl -a /dev/sda | grep -i "error\|fail"

4. Filesystem errors:
journalctl -k | grep -i "ext4\|xfs\|filesystem"

Critical Disk Errors

"I/O error" - Physical disk problem
"Buffer I/O error" - Data corruption
"Read-only filesystem" - Disk in read-only mode due to errors
"Bad sector" - Physical disk damage
"Filesystem corruption" - Needs fsck immediately

Disk Space Monitoring

Beginner

Proactive monitoring commands:

Check overall usage (-h = human readable, GB/MB):
df -h

Find large directories:
du -sh /* 2>/dev/null | sort -rh | head -20

Find large files (-exec handles filenames with spaces, unlike bare xargs):
find / -type f -size +100M -exec ls -lh {} + 2>/dev/null

Check log directory specifically:
du -sh /var/log/* | sort -rh

Automated Cleanup Script

#!/bin/bash
# Clean old log files
find /var/log -name "*.log" -type f -mtime +30 -delete
find /var/log -name "*.gz" -type f -mtime +90 -delete

# Clean temporary files
find /tmp -type f -mtime +7 -delete
find /var/tmp -type f -mtime +30 -delete

# Report disk usage after cleanup
df -h /

7. Application Debugging

Different applications have different log locations and error patterns. Know where to look.

Web Servers (Nginx/Apache)

System

Log locations:

Nginx:
• Access: /var/log/nginx/access.log
• Error: /var/log/nginx/error.log
• Site-specific: /var/log/nginx/site_error.log

Apache:
• Access: /var/log/apache2/access.log
• Error: /var/log/apache2/error.log
• Other: /var/log/apache2/other_vhosts_access.log

Databases

System

Log locations:

MySQL/MariaDB:
• Error: /var/log/mysql/error.log
• Slow queries: /var/log/mysql/slow.log
• General: /var/log/mysql/mysql.log

PostgreSQL:
• Main: /var/log/postgresql/postgresql-*.log
• Config: Check data_directory in config

Containers (Docker)

Advanced

Docker logging:

Docker daemon:
journalctl -u docker.service

Container logs:
docker logs container_name

Follow container logs:
docker logs -f container_name

All containers at once:
docker ps -q | xargs -L 1 docker logs

Custom Application Logs

Intermediate

Finding application logs:

1. Check application configuration:
Look for logfile settings in config files (usually in /etc/appname/)

2. Check systemd unit file:
systemctl cat application.service
Look for StandardOutput and StandardError directives

3. Check where process writes:
ls -la /proc/$(pidof appname)/fd/
Look for file descriptors pointing to log files

4. Search for log mentions:
find /etc -type f -exec grep -l "log" {} \;

8. Automate Log Monitoring

Create scripts to automate common log checks and receive alerts for critical issues.

Simple Monitoring Script

Intermediate

Basic monitoring script:

#!/bin/bash
# log-monitor.sh - Basic log monitoring
LOG_DIR="/var/log"
REPORT_FILE="/tmp/log-report-$(date +%Y%m%d).txt"

echo "=== Log Analysis Report $(date) ===" > "$REPORT_FILE"

# Check for errors in last hour
echo "Errors in last hour:" >> "$REPORT_FILE"
journalctl --since "1 hour ago" -p err | wc -l >> "$REPORT_FILE"

# Check disk space
echo "Log directory size:" >> "$REPORT_FILE"
du -sh "$LOG_DIR" >> "$REPORT_FILE"

# Check for critical patterns
echo "Critical errors found:" >> "$REPORT_FILE"
grep -i "panic\|fatal\|segmentation" "$LOG_DIR/syslog" | tail -5 >> "$REPORT_FILE"

# Email report
mail -s "Daily Log Report" admin@example.com < "$REPORT_FILE"

Cron Job Setup

Add to crontab to run daily at 8 AM:

0 8 * * * /usr/local/bin/log-monitor.sh

Run every hour:

0 * * * * /usr/local/bin/log-monitor.sh

Real-time Alert Script

Advanced

Monitor and alert immediately:

#!/bin/bash
# alert-monitor.sh - Real-time log monitoring with alerts
ALERT_EMAIL="admin@example.com"
CHECK_INTERVAL=60  # seconds

while true; do
    # Check for failed SSH attempts. grep has no --since option, so use
    # journalctl for the time window and count matching lines.
    FAILED_SSH=$(journalctl --since "5 minutes ago" | grep -c "Failed password")
    if [ "$FAILED_SSH" -gt 10 ]; then
        echo "ALERT: $FAILED_SSH failed SSH attempts in 5 minutes" | \
            mail -s "SSH Attack Alert" "$ALERT_EMAIL"
    fi

    # Check for disk space
    DISK_USAGE=$(df / --output=pcent | tail -1 | tr -d '% ')
    if [ "$DISK_USAGE" -gt 90 ]; then
        echo "ALERT: Disk usage at ${DISK_USAGE}%" | \
            mail -s "Disk Space Alert" "$ALERT_EMAIL"
    fi

    sleep "$CHECK_INTERVAL"
done

Master Checklist

When troubleshooting ANY problem:

1. Start with errors: journalctl -p err --since "10 minutes ago"
2. Check service status: systemctl status service_name
3. Look at recent logs: tail -100 /var/log/service.log
4. Search for patterns: grep -i "error\|fail\|timeout" logfile
5. Check system resources: top, df -h, free -h
6. Verify connectivity: ping, netstat, ss
7. Document findings: Keep notes of what you checked and found

Pro Tips

Use tmux/screen: Keep log tails running in separate panes
Create aliases: alias logs='journalctl -f'
Bookmark commands: Save useful one-liners in a text file
Set up log aggregation: Consider ELK stack for multiple servers
Regular health checks: Run monitoring scripts daily

Critical Reminders

1. Don't ignore warnings: They often become errors
2. Check timestamps: Correlate events across different logs
3. Look for patterns: Recurring errors indicate systemic issues
4. Document solutions: What fixed it today will fix it tomorrow
5. Set up monitoring: Prevent problems before users notice