This guide explains not just what commands to run, but also why they work, where to use them, and how to interpret the results for effective Linux system troubleshooting.
1. Understanding the Linux Logging System
Before diving into commands, understand how Linux logging works. Logs are categorized by source and stored in specific locations with standardized formats.
Locate System Logs
What does this do? (Beginner)
A long listing such as ls -la /var/log/ shows all log files in the main log directory with detailed information (permissions, size, date).
Why use this?
To understand what log files are available on your system and identify which ones might contain relevant information for your troubleshooting task.
Where to find common logs?
• System logs: /var/log/syslog, /var/log/messages
• Authentication: /var/log/auth.log, /var/log/secure
• Kernel: /var/log/kern.log
• Boot: /var/log/boot.log
• Application: /var/log/nginx/, /var/log/apache2/
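Which of these paths exist depends on the distribution, so it can help to check them in one pass. The following is a minimal sketch; the helper name check_logs is made up for illustration and is not a standard tool.

```shell
# Print whether each path given as an argument exists.
# check_logs is a hypothetical helper name, not a standard tool.
check_logs() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            printf 'found    %s\n' "$path"
        else
            printf 'missing  %s\n' "$path"
        fi
    done
}

# Debian/Ubuntu and RHEL-style names are both listed; expect some to be missing.
check_logs /var/log/syslog /var/log/messages \
           /var/log/auth.log /var/log/secure \
           /var/log/kern.log /var/log/boot.log
```

On systems that log only to the systemd journal, most of these files may be absent, and journalctl is the place to look instead.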
Real Use Case: Disk Full Warning
When you get a "disk full" warning, check the size of the log directory first, for example with du -sh /var/log.
If it is very large (more than about 2 GB), investigate which logs are growing rapidly.
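One way to find the files responsible is to sort them by size. This is a sketch under the assumption that find, du, and sort are available; largest_files is a hypothetical helper name.

```shell
# List the N largest files under a directory, sizes in KiB, largest first.
# largest_files is a hypothetical helper name.
largest_files() {
    dir=$1
    count=${2:-10}
    find "$dir" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$count"
}

# Total size of the log directory; unreadable subdirectories are ignored.
du -sh /var/log 2>/dev/null || true

# The ten largest individual files.
largest_files /var/log 10
```

Running it twice a few minutes apart gives a rough idea of which files are still growing.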
Understand Log Rotation
What is log rotation? (Intermediate)
Log rotation is the process of archiving old log files and creating new ones to prevent disk space exhaustion.
How does it work?
When a log file reaches a certain size or age, it's compressed (e.g., .gz) and a new empty log file is created. Old logs are eventually deleted based on retention policies.
Configuration explained:
• weekly: Rotate logs weekly
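• rotate 4: Keep 4 rotated logs (4 weeks' worth with weekly rotation)
• create: Create a new empty log after rotation
• compress: Compress old logs with gzip
• size 100M: Rotate when the log grows beyond 100MB; size is mutually exclusive with time intervals such as weekly, and the option specified last takes precedence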
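Put together, the directives above might form a stanza like the one below. This is only a sketch: the path /var/log/myapp/*.log is a placeholder, and the snippet writes the stanza to a temporary file rather than installing it.

```shell
# Write a sample logrotate stanza to a temporary file and print it.
# /var/log/myapp/*.log is a placeholder path; replace it with a real log path.
conf=$(mktemp)
cat > "$conf" <<'EOF'
/var/log/myapp/*.log {
    weekly
    rotate 4
    create
    compress
    size 100M
}
EOF
cat "$conf"
```

On a system with logrotate installed, logrotate -d followed by the file name performs a dry run that reports what would be rotated without changing anything. Installed stanzas normally live under /etc/logrotate.d/.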
Important Note
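Avoid deleting log files manually while the service that writes them is running. The process keeps writing to the deleted file, so the disk space is not freed until the service restarts, and new entries are no longer visible on disk. Use proper rotation, truncate the file in place, or stop the service first.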
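When space must be reclaimed right away, emptying the file in place is usually safer than removing it. The sketch below demonstrates this on a temporary file that stands in for a real log.

```shell
# Demonstrate emptying a log in place instead of deleting it.
# A temporary file stands in for a real log file here.
log=$(mktemp)
printf 'line one\nline two\n' > "$log"

# Truncating keeps the same file (and inode), so a process that has the file
# open can keep writing to it, and the space is released immediately.
: > "$log"

# Size in bytes after truncation.
wc -c < "$log"
```

The same effect is available with truncate -s 0 followed by the file name on systems that ship GNU coreutils.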
2. Master journalctl - The Systemd Journal
journalctl is the standard tool for viewing logs on systemd-based systems. Unlike traditional text log files, the journal (written by systemd-journald) stores entries in a binary format with metadata, which allows powerful filtering.
Basic Log Viewing
What does -f do? (Beginner)
The -f flag stands for "follow". It shows new log entries in real-time as they are written, similar to tail -f for traditional logs.
When to use this?
Use it when you're troubleshooting an issue that's happening right now. For example, when a service is failing to start, run journalctl -f in one terminal while you restart the service in another.
Time-Based Filtering
What time format? (Intermediate)
journalctl understands human-readable time formats: "yesterday", "2 hours ago", "2025-12-16 09:00", "today", "tomorrow".
Practical scenario:
Your application crashed at 2:30 PM. Running journalctl --since "13:30" --until "14:30" shows what happened in the hour before the crash (1:30-2:30 PM).
Troubleshooting Example
Problem: Server became slow around 3 PM.
Solution: Check logs from that period, for example journalctl --since "14:30" --until "15:30".
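A time window like this can be wrapped in a small function so it is easy to repeat with different bounds. This is a sketch; journal_window is a hypothetical helper name, and the times shown are examples.

```shell
# Show journal entries between two points in time, without a pager.
# journal_window is a hypothetical helper; the times may be in any format
# journalctl accepts, such as "14:30" or "2025-12-16 14:30".
journal_window() {
    journalctl --since "$1" --until "$2" --no-pager
}

# Only run on machines that actually have journalctl.
if command -v journalctl >/dev/null 2>&1; then
    journal_window "14:30" "15:30" | tail -n 50
fi
```

A time given without a date refers to the current day.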
Service-Specific Logs
What does -u mean? (Intermediate)
-u stands for "unit". It filters logs for a specific systemd service unit. You need to know the exact service name.
Finding service names:
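Use systemctl list-units --type=service to list loaded services, or systemctl list-unit-files --type=service to include installed but inactive ones. Common ones: nginx.service, sshd.service (ssh.service on Debian/Ubuntu), docker.service, mysql.service.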
Priority/Error Level Filtering
What are priority levels? (Intermediate)
Logs have severity levels: 0=emerg, 1=alert, 2=crit, 3=err, 4=warning, 5=notice, 6=info, 7=debug. -p err shows errors and more severe messages.
When to check error logs?
Start troubleshooting by checking errors first. They often point directly at the root cause, or at least at the component that failed. Info logs are mainly useful for monitoring normal activity, while error logs are where debugging usually begins.
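Priority and unit filters combine naturally. The sketch below shows one way to ask for the recent errors of a single service; unit_errors is a hypothetical helper name and nginx.service is only an example unit.

```shell
# Show entries of priority err and more severe for one unit.
# unit_errors is a hypothetical helper: unit_errors UNIT [SINCE]
unit_errors() {
    unit=$1
    since=${2:-"1 hour ago"}
    journalctl -u "$unit" -p err --since "$since" --no-pager
}

# Only run on machines that actually have journalctl.
if command -v journalctl >/dev/null 2>&1; then
    unit_errors nginx.service "1 hour ago" || true
fi
```

Replacing err with warning widens the filter to include warnings as well.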
3. grep - Find What You Need
grep takes its name from the ed command g/re/p (globally search for a regular expression and print). It is your primary tool for searching text patterns in log files, and mastering its options is crucial for efficient log analysis.
Basic Pattern Search
What does -i do? (Beginner)
Makes the search case-insensitive. It will match "Error", "ERROR", "error", "ErRor", etc. Very useful because applications use inconsistent capitalization.
Common patterns to search (use grep -E so that | means "or", or write \| with plain GNU grep):
• error|fail|critical - General problems
• timeout|slow|lag - Performance issues
• denied|forbidden|unauthorized - Permission problems
• panic|crash|abort - Application crashes
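The patterns above can be tried against a small sample before pointing them at real logs. In this sketch the log lines are invented for illustration.

```shell
# Build a small sample log (the lines are invented for illustration).
log=$(mktemp)
cat > "$log" <<'EOF'
Dec 16 09:00:01 web1 app[101]: INFO service started
Dec 16 09:00:05 web1 app[101]: ERROR database connection failed
Dec 16 09:00:07 web1 app[101]: Warning: request Timeout after 30s
Dec 16 09:00:09 web1 app[101]: Permission DENIED for user guest
EOF

# -E enables extended regular expressions so | means "or"; -i ignores case.
grep -Ei 'error|fail|critical' "$log"
grep -Ei 'timeout|slow|lag' "$log"
grep -Ei 'denied|forbidden|unauthorized' "$log"

# Without -E, the alternation is written as \| (a GNU grep extension).
grep -i 'panic\|crash\|abort' "$log" || echo "no crash messages found"
```

Short alternatives such as lag can also match inside longer words, so it is worth reviewing the hits rather than only counting them.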
Context Around Matches
What do -B and -A do? (Intermediate)
-B 5 shows 5 lines Before the match. -A 5 shows 5 lines After the match. -C 5 shows 5 lines before AND after (Context).
Why context matters:
An error message alone often doesn't tell the full story. The lines before show what led to the error, the lines after show the consequences.
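A short demonstration of the context options, using an invented five-line log:

```shell
# Build a small sample log (the lines are invented for illustration).
log=$(mktemp)
cat > "$log" <<'EOF'
09:00:01 accepted connection from 10.0.0.5
09:00:02 loading user profile
09:00:03 ERROR profile store unreachable
09:00:04 retrying in 5 seconds
09:00:09 retry succeeded
EOF

# One line before and one line after each match.
grep -i -B 1 -A 1 'error' "$log"

# Same window with -C; -n adds line numbers so the match is easy to locate.
grep -i -n -C 1 'error' "$log"
```

With several matches close together, grep merges overlapping context windows and separates distinct groups with a line containing two dashes.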
Multiple File Search
What does -r do? (Intermediate)
Recursively searches through directories. It will search all files in /var/log/ and all its subdirectories.
When problems are scattered:
Network issues often appear in multiple logs: system logs, application logs, firewall logs. Use recursive search to find all occurrences.
Common grep mistakes to avoid:
1. Not escaping special characters: grep "error.log" (dot means "any character") vs grep "error\.log"
2. Searching binary files: Use grep -a to treat the file as text, or pipe the output of strings into grep
3. Missing -i for case-insensitive: You might miss important matches
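Recursive search and the escaping pitfall can both be seen on a throwaway directory. The file names and contents below are invented for the demonstration.

```shell
# Build a small directory tree to search (contents are invented).
dir=$(mktemp -d)
mkdir -p "$dir/nginx"
printf 'upstream Timeout while reading\n' > "$dir/nginx/error.log"
printf 'rotated error.log\nwrote to error_log\n' > "$dir/syslog"

# -r recurses into subdirectories, -i ignores case, -l prints only file names,
# and -I skips binary files.
grep -rilI 'timeout' "$dir"

# An unescaped dot matches any character, so both lines are counted here.
grep -c 'error.log' "$dir/syslog"

# An escaped dot, or -F for fixed strings, matches only the literal name.
grep -c 'error\.log' "$dir/syslog"
grep -cF 'error.log' "$dir/syslog"
```

For real logs the starting directory would be /var/log, which typically needs root privileges to read completely.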
4. Real-World Troubleshooting Workflows
Learn systematic approaches to common problems. Each scenario shows a step-by-step debugging process.
Scenario: Website Down (Nginx/Apache)
Step-by-step diagnosis (System):
Step 1: Check if the service is running, for example with systemctl status nginx (or apache2/httpd for Apache)
Common Error Patterns
• "Address already in use": Port 80 already occupied
• "Permission denied": Nginx can't read files
• "No such file or directory": Missing config/file
• "connect() failed": Backend service down
• "upstream timed out": Backend too slow
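To see which of these patterns dominates, the error log can be tallied pattern by pattern. This is a sketch: count_error_patterns is a hypothetical helper name, and the sample lines only imitate the style of an Nginx error log.

```shell
# Count how often each pattern from the list above appears in an error log.
# count_error_patterns is a hypothetical helper name.
count_error_patterns() {
    file=$1
    for pattern in 'Address already in use' 'Permission denied' \
                   'No such file or directory' 'connect() failed' \
                   'upstream timed out'; do
        # -F treats the pattern as a fixed string, so "()" needs no escaping.
        printf '%-28s %s\n' "$pattern" "$(grep -cF "$pattern" "$file")"
    done
}

# Illustrative sample lines in the style of an Nginx error log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2025/12/16 14:29:58 [error] 812#812: *41 connect() failed (111: Connection refused) while connecting to upstream
2025/12/16 14:30:02 [error] 812#812: *42 upstream timed out (110: Connection timed out) while reading response header from upstream
2025/12/16 14:30:05 [error] 812#812: *43 connect() failed (111: Connection refused) while connecting to upstream
EOF

count_error_patterns "$sample"
# On a real server: count_error_patterns /var/log/nginx/error.log
```

A high count for the two upstream patterns points at the backend rather than at the web server itself.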
Scenario: SSH Connection Failed
Debugging SSH issues (Security):
Step 1: Check SSH service status, for example with systemctl status sshd (the unit is named ssh on Debian/Ubuntu)
Security Alert Signs
• Multiple "Failed password" from same IP = Brute force attack
• "Invalid user" attempts = Username enumeration
• "Connection closed by" before auth = Possible port scan or DoS attempt
• Successful login from unusual location = Possible breach
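Counting failed attempts per source address makes a brute-force pattern easy to spot. The sketch below assumes the usual sshd message wording; failed_logins_by_ip is a hypothetical helper name, and the sample uses documentation-range IP addresses.

```shell
# Count failed SSH password attempts per source IP, most active first.
# failed_logins_by_ip is a hypothetical helper; it reads an auth log file.
failed_logins_by_ip() {
    awk '/Failed password/ {
             for (i = 1; i < NF; i++)
                 if ($i == "from") print $(i + 1)
         }' "$1" | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Illustrative sample in the style of /var/log/auth.log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Dec 16 03:11:01 host sshd[900]: Failed password for root from 203.0.113.5 port 52144 ssh2
Dec 16 03:11:03 host sshd[900]: Failed password for invalid user admin from 203.0.113.5 port 52150 ssh2
Dec 16 03:11:09 host sshd[901]: Failed password for root from 198.51.100.7 port 40122 ssh2
Dec 16 03:12:30 host sshd[902]: Accepted password for deploy from 192.0.2.10 port 51000 ssh2
EOF

failed_logins_by_ip "$sample"
# On a real server: failed_logins_by_ip /var/log/auth.log   (or /var/log/secure)
```

Each output line is a count followed by the address it came from.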
Scenario: High Server Load
Finding load source (System):
Step 1: Check current load, for example with uptime
Load Analysis Commands
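A load average only means something relative to the number of CPU cores. The following sketch computes that ratio and lists the busiest processes; load_per_core is a hypothetical helper name, and the ps options assume the procps version of ps.

```shell
# Compare a load average with the number of CPU cores.
# load_per_core is a hypothetical helper: load_per_core LOAD CORES
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

if [ -r /proc/loadavg ]; then
    # First field of /proc/loadavg is the 1-minute load average.
    load=$(cut -d ' ' -f 1 /proc/loadavg)
    # Number of online CPU cores; fall back to 1 if getconf is unavailable.
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "load per core: $(load_per_core "$load" "$cores")"
fi

# Top CPU consumers (procps ps; output varies by system).
ps -eo pid,comm,%cpu,%mem --sort=-%cpu 2>/dev/null | head -n 6
```

A sustained value above 1.00 suggests more runnable work than the cores can handle. On Linux the load average also counts tasks waiting on disk I/O, so a high value with idle CPUs usually points at storage.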
5. Network Issue Diagnosis
Network problems have specific log patterns. Learn where to look and what to search for.
Connection Issues
What these errors mean (Network):
• Connection refused: Service not running on port
• Connection timed out: Firewall silently dropping packets or a network issue
• No route to host: Target host unreachable (no route, host down, or a firewall rejecting the traffic)
• Network is unreachable: Local network configuration issue
DNS Resolution Problems
DNS troubleshooting steps (Network):
1. Check /etc/resolv.conf
2. Test with: nslookup google.com
3. Check DNS service: systemctl status systemd-resolved
4. Check network manager logs: journalctl -u NetworkManager
Firewall Log Analysis
Where are firewall logs? (Security)
• iptables: Add a LOG rule; matching packets are then written to the kernel log
• ufw (Ubuntu): /var/log/ufw.log
• firewalld (RHEL): journalctl -u firewalld
• General: Check /var/log/kern.log for netfilter messages
Common Firewall Log Patterns
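Blocked packets are typically logged one per line with key=value fields such as SRC and DPT. The sketch below summarizes them by source address and destination port; blocked_summary is a hypothetical helper name, and the sample lines imitate the UFW format with documentation-range addresses.

```shell
# Summarize blocked packets by source address and destination port.
# blocked_summary is a hypothetical helper that reads a UFW-style log file.
blocked_summary() {
    awk '/UFW BLOCK/ {
             src = ""; dpt = ""
             for (i = 1; i <= NF; i++) {
                 if ($i ~ /^SRC=/) src = substr($i, 5)
                 if ($i ~ /^DPT=/) dpt = substr($i, 5)
             }
             print src, dpt
         }' "$1" | sort | uniq -c | sort -rn | awk '{ print $1, $2, $3 }'
}

# Illustrative sample in the style of /var/log/ufw.log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Dec 16 09:12:01 host kernel: [UFW BLOCK] IN=eth0 OUT= SRC=198.51.100.7 DST=192.0.2.10 PROTO=TCP SPT=44321 DPT=22
Dec 16 09:12:04 host kernel: [UFW BLOCK] IN=eth0 OUT= SRC=198.51.100.7 DST=192.0.2.10 PROTO=TCP SPT=44377 DPT=22
Dec 16 09:13:40 host kernel: [UFW BLOCK] IN=eth0 OUT= SRC=203.0.113.9 DST=192.0.2.10 PROTO=TCP SPT=51022 DPT=3389
EOF

blocked_summary "$sample"
```

Each output line shows a count, the source address, and the destination port. Repeated hits on one port from one address usually indicate scanning or a misconfigured client.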
6. Storage Problem Diagnosis
Disk and filesystem issues have clear signatures in logs. Learn to recognize them early.
Disk Error Detection
Where to look for disk errors (System):
1. Kernel messages (most important), available through dmesg or journalctl -k:
Critical Disk Errors
• "I/O error" - Physical disk problem
• "Buffer I/O error" - Failed read or write on a block device; data may be lost or corrupted
• "Read-only filesystem" - Disk in read-only mode due to errors
• "Bad sector" - Physical disk damage
• "Filesystem corruption" - Needs fsck immediately
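The signatures above can be filtered out of the kernel log with one pattern. This is a sketch: scan_disk_errors is a hypothetical helper name, and the sample lines only imitate typical kernel messages.

```shell
# Filter kernel messages for the disk error signatures listed above.
# scan_disk_errors is a hypothetical helper that reads from standard input.
scan_disk_errors() {
    grep -Ei 'i/o error|read-only|bad sector|corrupt'
}

# Typical use on a live system (may require root):
#   dmesg | scan_disk_errors
#   journalctl -k --no-pager | scan_disk_errors

# Illustrative sample of kernel log lines.
scan_disk_errors <<'EOF'
[  101.5] EXT4-fs (sda1): mounted filesystem with ordered data mode
[  982.1] blk_update_request: I/O error, dev sda, sector 123456
[  982.2] Buffer I/O error on dev sda1, logical block 15432, async page read
[  983.0] EXT4-fs (sda1): Remounting filesystem read-only
EOF
```

Any hit on a real system is a reason to check the drive's SMART data and verify backups before running repairs.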
Disk Space Monitoring
Proactive monitoring commands (Beginner):
Check overall usage: df -h
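For monitoring, it is often more useful to print only the filesystems above a threshold. The sketch below parses the portable df -P format; over_threshold is a hypothetical helper name and 80 percent is an arbitrary example limit.

```shell
# Print mount points whose usage is at or above a percentage threshold.
# over_threshold is a hypothetical helper; it reads `df -P` output on stdin.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6, $5
    }'
}

df -P | over_threshold 80
```

Mount points that contain spaces are not handled by this simple field split.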
Automated Cleanup Script
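A cautious cleanup script first lists what it would remove. The following sketch targets compressed archives older than a given number of days and runs against a temporary directory; old_logs is a hypothetical helper name and 30 days is an example retention period.

```shell
# List compressed logs older than a number of days.
# old_logs is a hypothetical helper: old_logs DIR DAYS
old_logs() {
    find "$1" -type f -name '*.gz' -mtime +"$2"
}

# Dry run against a temporary directory so nothing real is touched.
dir=$(mktemp -d)
touch -t 200001010000 "$dir/app.log.1.gz"   # pretend this archive is very old
touch "$dir/app.log.2.gz"                   # recent archive, kept
touch "$dir/app.log"                        # active log, never matched

old_logs "$dir" 30
# To delete after reviewing the list, append -exec rm -f {} + to the find command.
```

Matching only rotated .gz archives leaves active log files alone, which avoids the open-file problem that comes with deleting logs a service is still writing.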
7. Application Debugging
Different applications have different log locations and error patterns. Know where to look.
Web Servers (Nginx/Apache)
Log locations (System):
Nginx:
• Access: /var/log/nginx/access.log
• Error: /var/log/nginx/error.log
• Site-specific: /var/log/nginx/site_error.log
Apache (Debian/Ubuntu paths shown; RHEL-based systems use /var/log/httpd/ with access_log and error_log):
• Access: /var/log/apache2/access.log
• Error: /var/log/apache2/error.log
• Other: /var/log/apache2/other_vhosts_access.log
Databases
Log locations (System):
MySQL/MariaDB:
• Error: /var/log/mysql/error.log
• Slow queries: /var/log/mysql/slow.log
• General: /var/log/mysql/mysql.log
PostgreSQL:
• Main: /var/log/postgresql/postgresql-*.log
• Config: Check log_directory in postgresql.conf (a relative path is resolved against data_directory)
Containers (Docker)
Docker logging (Advanced):
Docker daemon: journalctl -u docker.service on systemd hosts
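Container output is read with docker logs rather than from files under /var/log. The sketch below filters one container's recent output for error lines; container_errors is a hypothetical helper name and my-container is a placeholder.

```shell
# Show recent error lines from one container's output.
# container_errors is a hypothetical helper: container_errors NAME [SINCE]
container_errors() {
    # docker logs replays the container's stderr on stderr, so merge it in.
    docker logs --since "${2:-1h}" "$1" 2>&1 | grep -Ei 'error|fatal|panic'
}

# List the names of running containers, if Docker is installed and reachable.
if command -v docker >/dev/null 2>&1; then
    docker ps --format '{{.Names}}' 2>/dev/null || true
fi

# Example (my-container is a placeholder): container_errors my-container 30m
```

To watch a container live, docker logs -f with --tail 100 behaves much like tail -f on a file.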
Custom Application Logs
Finding application logs (Intermediate):
1. Check application configuration:
Look for logfile settings in config files (usually in /etc/appname/)
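2. Check the systemd unit file: systemctl cat appname.service shows the ExecStart line and any StandardOutput/StandardError settings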
8. Automate Log Monitoring
Create scripts to automate common log checks and receive alerts for critical issues.
Simple Monitoring Script
Basic monitoring script (Intermediate):
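A basic script might simply count error lines and show the latest ones. This is a minimal sketch rather than a complete monitoring solution; log_report is a hypothetical helper name, and /var/log/syslog is just one common default.

```shell
# Summarize error activity in a log file.
# log_report is a hypothetical helper: log_report FILE
log_report() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    total=$(wc -l < "$file")
    errors=$(grep -Eic 'error|fail|critical' "$file")
    echo "file: $file"
    echo "lines: $total"
    echo "error lines: $errors"
    if [ "$errors" -gt 0 ]; then
        echo "most recent:"
        grep -Ei 'error|fail|critical' "$file" | tail -n 3
    fi
}

# /var/log/syslog is one common default; use /var/log/messages on RHEL-style systems.
log_report /var/log/syslog || true
```

Saved as an executable file, a script like this can be run by hand or on a schedule, with its output mailed or appended to a report.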
Cron Job Setup
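Add the script to your crontab (crontab -e). To run it daily at 8 AM, use the schedule 0 8 * * * followed by the script path.
To run it every hour, use 0 * * * * instead.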
Real-time Alert Script
Monitor and alert immediately (Advanced):
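Real-time alerting comes down to reading a followed log line by line and reacting to matches. In the sketch below, alert_on_critical is a hypothetical helper name, the keyword list is an example, and the print statement stands in for whatever notification channel is in use.

```shell
# Read log lines from standard input and print an alert for critical ones.
# alert_on_critical is a hypothetical helper; replace the print with mail,
# a webhook call, or another notification mechanism.
alert_on_critical() {
    awk 'tolower($0) ~ /critical|panic|out of memory/ {
             print "ALERT: " $0
             fflush()    # emit each alert immediately when reading a live stream
         }'
}

# Live use (runs until interrupted):
#   tail -F /var/log/syslog | alert_on_critical

# Demonstration with two invented lines.
printf '%s\n' 'service started' 'kernel: Out of memory: Killed process 4242' |
    alert_on_critical
```

tail -F keeps following the file by name, so the pipeline survives log rotation.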
Master Checklist
When troubleshooting ANY problem:
1. Start with errors: journalctl -p err --since "10 minutes ago"
2. Check service status: systemctl status service_name
3. Look at recent logs: tail -100 /var/log/service.log
4. Search for patterns: grep -i "error\|fail\|timeout" logfile
5. Check system resources: top, df -h, free -h
6. Verify connectivity: ping, ss (or the older netstat)
7. Document findings: Keep notes of what you checked and found
Pro Tips
• Use tmux/screen: Keep log tails running in separate panes
• Create aliases: alias logs='journalctl -f'
• Bookmark commands: Save useful one-liners in a text file
• Set up log aggregation: Consider ELK stack for multiple servers
• Regular health checks: Run monitoring scripts daily
Critical Reminders
1. Don't ignore warnings: They often become errors
2. Check timestamps: Correlate events across different logs
3. Look for patterns: Recurring errors indicate systemic issues
4. Document solutions: What fixed it today will fix it tomorrow
5. Set up monitoring: Prevent problems before users notice