Linux provides a powerful suite of text processing utilities that can transform, filter, and analyze data with incredible efficiency. Mastering tools like grep, sed, awk, and others enables you to handle complex data manipulation tasks directly from the command line, making system administration and data processing significantly more effective.
Essential Text Processing Tools
| Tool | Primary Purpose | Strengths | Best For | Complexity |
|---|---|---|---|---|
| grep | Pattern searching | Fast text search, regex support | Finding lines matching patterns | Beginner |
| sed | Stream editing | Inline editing, substitutions | Find/replace, text transformations | Intermediate |
| awk | Text processing | Field processing, calculations | Column-based data, reports | Advanced |
| cut | Column extraction | Simple field extraction | CSV/TSV processing | Beginner |
| sort | Sorting lines | Multiple sort keys, unique | Organizing data | Beginner |
| uniq | Duplicate handling | Duplicate removal, counting | Data deduplication | Beginner |
| tr | Character translation | Fast character replacement | Character-level changes | Beginner |
| wc | Word counting | Line/word/character counts | Text statistics | Beginner |
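To get a feel for how these tools chain together before diving into each one, here is a minimal sketch that ranks the most frequent words in a hypothetical words.txt (one word per line); the file name is just an example:
# Normalize case, sort so duplicates are adjacent, count them, then rank the top 5
tr 'A-Z' 'a-z' < words.txt | sort | uniq -c | sort -nr | head -5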
grep - Global Regular Expression Print
Search for patterns in text using regular expressions.
Common Options:
- -i - Case insensitive
- -v - Invert match
- -r - Recursive search
- -n - Show line numbers
- -c - Count matches
- -l - Show filenames only
Examples:
grep "error" logfile.txtgrep -r "TODO" src/ps aux | grep "nginx"
grep Examples
Practical grep usage patterns.
grep "pattern" file.txt
Advanced Patterns:
grep -E "^[A-Z]" file.txt- Lines starting with uppercasegrep -v "^#" config.txt- Exclude comment linesgrep -c "success" *.log- Count successes in log filesgrep -n "error" app.log- Show errors with line numbersgrep -A 2 -B 2 "exception" trace.log- Context around matches
grep Variants
Specialized grep commands for different needs.
Different Greps:
- egrep - Extended regex (same as grep -E)
- fgrep - Fixed strings (same as grep -F)
- pgrep - Process grep (search process names)
- rgrep - Recursive grep
- zgrep - Grep in compressed files
Example: pgrep nginx - Find nginx process IDs
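A quick sketch of where these variants come in handy; the log paths and process name below are examples, not fixed locations:
# zgrep searches rotated, gzip-compressed logs alongside the current one
zgrep -h "Out of memory" /var/log/syslog /var/log/syslog.*.gz
# pgrep -a lists matching PIDs together with their full command lines
pgrep -a nginx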
sed - Stream Editor
Filter and transform text streams using editing commands.
Common Commands:
- s/pattern/replacement/ - Substitute
- p - Print
- d - Delete
- i\text - Insert before
- a\text - Append after
Examples:
sed 's/foo/bar/g' file.txt
sed '/pattern/d' file.txt
sed -n '10,20p' file.txt
sed Substitution
Powerful find and replace capabilities.
Flags:
- g - Global replace
- i - Case insensitive
- p - Print changed lines
- w file - Write to file
Examples:
- sed 's/old/new/g' file.txt - Replace all
- sed 's/^#//' config.txt - Remove # from start
- sed 's/.*/\U&/' names.txt - Convert to uppercase (GNU sed)
- sed -i.bak 's/foo/bar/g' file.txt - In-place edit with backup
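Substitutions become much more powerful with capture groups; a small sketch assuming a names.txt in "Last, First" format and an app.log as in earlier examples:
# Swap "Last, First" into "First Last" using backreferences \1 and \2
sed 's/\([^,]*\), \(.*\)/\2 \1/' names.txt
# Combine -n with the p flag to print only the lines that actually changed
sed -n 's/error/ERROR/p' app.log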
sed Addressing
Target specific lines or patterns for editing.
Addressing Methods:
- 10 - Line number 10
- 10,20 - Lines 10 to 20
- /pattern/ - Lines matching pattern
- $ - Last line
- 1~2 - Every 2nd line starting at line 1 (GNU sed)
Examples:
- sed '5d' file.txt - Delete line 5
- sed '/^$/d' file.txt - Delete empty lines
- sed '1,10s/foo/bar/g' file.txt - Replace in lines 1-10
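Addresses can also be pattern ranges, which is handy for block-structured files; a sketch assuming hypothetical settings.txt and settings.ini files containing the markers shown:
# Delete everything from a start marker through an end marker (inclusive)
sed '/BEGIN CONFIG/,/END CONFIG/d' settings.txt
# Apply a substitution only from the first matching line to the end of the file
sed '/^\[database\]/,$s/host=.*/host=localhost/' settings.ini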
awk - Pattern Scanning and Processing
Programming language for text processing and data extraction.
Built-in Variables:
- NR - Record number
- NF - Number of fields
- $0 - Entire line
- $1, $2... - Field 1, 2, etc.
- FS - Field separator
Examples:
awk '{print $1}' file.txt
awk '/error/' logfile.txt
awk -F: '{print $1}' /etc/passwd
awk Programming
Advanced data processing with variables and functions.
Special Patterns:
- BEGIN - Execute before processing
- END - Execute after processing
- pattern - Execute for matching lines
Examples:
- awk '{sum+=$3} END{print sum}' data.txt - Sum column 3
- awk 'NR%2==0' file.txt - Print even lines
- awk 'length($0) > 80' file.txt - Lines longer than 80 chars
- awk -F, '$3 > 1000 {print $1, $3}' sales.csv - Filter and print
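BEGIN and END pair naturally with awk's associative arrays for small reports; a sketch assuming a hypothetical usage.txt with two whitespace-separated columns, user and bytes:
# Print a header, total bytes per user in MB, then dump the totals at the end
awk 'BEGIN{printf "%-12s %10s\n", "USER", "MB"} {mb[$1]+=$2/1048576} END{for (u in mb) printf "%-12s %10.1f\n", u, mb[u]}' usage.txt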
awk One-liners
Useful awk patterns for common tasks.
Common Tasks:
- awk '!seen[$0]++' file.txt - Remove duplicates
- awk '{print NF}' file.txt - Count fields per line
- awk '{count++} END{print count}' file.txt - Count lines
- awk 'NF>max{max=NF} END{print max}' file.txt - Find max fields per line
- awk '{for(i=NF;i>0;i--) printf "%s ",$i; print ""}' file.txt - Reverse fields
Simple Text Utilities
cut - Column Extraction
Remove sections from each line of files.
Options:
- -f - Fields
- -d - Delimiter
- -c - Characters
Examples:
- cut -d: -f1 /etc/passwd - Get usernames
- cut -c1-10 file.txt - First 10 characters
- cut -d, -f2,4 data.csv - Columns 2 and 4
sort - Sorting Lines
Sort lines of text files.
Options:
- -r - Reverse sort
- -n - Numeric sort
- -u - Unique lines
- -k - Sort by field
Examples:
- sort file.txt - Alphabetical sort
- sort -nr file.txt - Reverse numeric
- sort -k2,2n data.txt - Sort by second field (numeric)
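Sort keys can be stacked and combined with a field delimiter; a sketch assuming a comma-separated data.csv where column 2 is a name and column 3 is a numeric amount:
# Sort by name (column 2), breaking ties by amount (column 3), numeric and descending
sort -t, -k2,2 -k3,3nr data.csv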
uniq - Duplicate Handling
Report or omit repeated lines.
Options:
- -c - Count occurrences
- -d - Only duplicates
- -u - Only uniques
Examples:
- uniq file.txt - Remove adjacent duplicates
- uniq -c file.txt - Count duplicates
- sort file.txt | uniq - Proper deduplication
tr - Character Translation
Translate or delete characters.
Options:
- -d - Delete characters
- -s - Squeeze repeats
- -c - Complement set
Examples:
- tr 'a-z' 'A-Z' < file.txt - To uppercase
- tr -d '\r' < file.txt - Remove carriage returns
- tr -s ' ' < file.txt - Squeeze spaces
wc - Word Count
Print line, word, and byte counts.
Options:
- -l - Lines only
- -w - Words only
- -c - Bytes only
- -m - Characters only
Examples:
- wc file.txt - All counts
- wc -l *.log - Line counts for logs
- find . -name "*.py" | wc -l - Count Python files
Combination Tools
Other useful text processing utilities.
Additional Tools:
- head - Output first part of files
- tail - Output last part of files
- paste - Merge lines of files
- join - Join lines on common field
- column - Columnate lists
Example: paste file1.txt file2.txt
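A short sketch of join and column in action; names.txt, emails.txt, and big.csv are hypothetical files, and join requires both inputs to be sorted on the shared first field:
# Sort both inputs on the join field, then merge rows that share the same key
sort -k1,1 names.txt > names.sorted
sort -k1,1 emails.txt > emails.sorted
join names.sorted emails.sorted
# column -t lines up delimited output into readable columns
head -5 big.csv | column -t -s,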
Practical Workflows
Real-World Text Processing Examples
# 1. Log Analysis
# Find top 10 IP addresses in web logs
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10
# Count HTTP status codes
grep -o 'HTTP/1.[01]" [0-9][0-9][0-9]' access.log | cut -d' ' -f2 | sort | uniq -c
# Extract error messages with timestamps
grep -E "(ERROR|FATAL)" app.log | sed 's/.*\(ERROR\|FATAL\): //'
# 2. System Administration
# Find largest files (human readable)
find /var/log -type f -exec du -h {} + | sort -hr | head -10
# Monitor process memory usage
ps aux --sort=-%mem | awk 'NR<=11 {print $0}'
# Check disk usage by directory
du -sh /* 2>/dev/null | sort -hr
# 3. Data Processing
# Process CSV files
awk -F, '{print $1 "," $3 "," $5}' data.csv > extracted.csv
# Convert TSV to CSV
tr '\t' ',' < data.tsv > data.csv
# Count unique values in column 2
cut -f2 data.txt | sort | uniq -c | sort -nr
# 4. Text Manipulation
# Remove empty lines and trailing whitespace
sed '/^$/d' file.txt | sed 's/[[:space:]]*$//' > clean.txt
# Convert Windows line endings to Unix
tr -d '\r' < windows.txt > unix.txt
# Extract emails from text
grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" file.txt
# 5. File Management
# Find and replace in multiple files
find . -name "*.txt" -exec sed -i 's/old/new/g' {} +
# Count code lines in project
find . -name "*.py" -exec cat {} + | wc -l
# List all unique file extensions
find . -type f | sed 's/.*\.//' | sort -u
Common Use Cases
System Monitoring & Logs
- Error Tracking: grep -c "ERROR" *.log
- Performance Monitoring: ps aux --sort=-%cpu | head -10
- Security Auditing: grep "Failed password" /var/log/auth.log | wc -l
- Resource Analysis: df -h | grep -v tmpfs | sort -hr -k5
Data Analysis & Reporting
- CSV Processing: awk -F, '{sum+=$3} END{print sum}' sales.csv
- Data Cleaning: sed 's/ *$//' data.txt | sed '/^$/d' > clean.txt
- Statistical Summary: awk '{if(min==""){min=max=$1} if($1>max) max=$1; if($1<min) min=$1; sum+=$1} END{print "min:", min, "max:", max, "avg:", sum/NR}' data.txt
File Management
- Batch Renaming: ls *.jpg | sed 's/\(.*\)\.jpg/mv "&" "\1_backup.jpg"/' | sh
- Content Search: grep -r "function_name" src/
- Duplicate Detection: find . -type f -exec md5sum {} + | sort | uniq -w32 -d
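Note that the ls | sed | sh rename above breaks on filenames containing spaces; a safer sketch of the same _backup rename is a plain shell loop, assumed to run in the directory holding the .jpg files:
# Quoted loop handles spaces and unusual characters in filenames
for f in *.jpg; do mv -- "$f" "${f%.jpg}_backup.jpg"; done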
Tips:
• Always test complex pipelines step by step
• Use grep -n to see line numbers for debugging
• Combine tools: awk for complex logic, sed for simple substitutions
• Use tr for character-level operations - it's faster than sed
• Remember sort | uniq for proper deduplication
• Use head and tail to test pipelines with sample data
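One way to put the "test step by step" and "sample with head" tips into practice, reusing the access.log pipeline from the workflow section above:
# Stage 1: sample the raw input
head -n 100 access.log
# Stage 2: confirm the field extraction looks right
head -n 100 access.log | awk '{print $1}'
# Stage 3: add counting and ranking only once the earlier stages behave
head -n 100 access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10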
Common Mistakes:
• Forgetting that uniq requires sorted input
• Not using the -i flag for case-insensitive searches when needed
• Overlooking that awk uses whitespace as the default field separator
• Using complex regex when a simple string search would suffice
• Not testing sed -i commands on backups first
• Forgetting that some tools buffer output, affecting real-time processing
Key Takeaways
Linux text processing utilities form a powerful toolkit for data manipulation and analysis. grep excels at pattern matching, sed at stream editing, and awk at field-based processing and calculations. Combined with simpler tools like cut, sort, uniq, tr, and wc, you can build sophisticated data processing pipelines directly from the command line. Mastering these tools enables efficient log analysis, system monitoring, data cleaning, and automated reporting without needing specialized software.
Next Step: Explore shell scripting to combine these text processing tools into reusable automated workflows and system administration scripts.