Filters & Text Utilities (grep, sed, awk, cut, sort, uniq, tr, wc)

Linux provides a powerful suite of text processing utilities that can transform, filter, and analyze data efficiently. Mastering tools like grep, sed, and awk lets you handle complex data manipulation tasks directly from the command line, making system administration and data processing significantly faster.

Essential Text Processing Tools

Tool | Primary Purpose       | Strengths                       | Best For                           | Complexity
grep | Pattern searching     | Fast text search, regex support | Finding lines matching patterns    | Beginner
sed  | Stream editing        | Inline editing, substitutions   | Find/replace, text transformations | Intermediate
awk  | Text processing       | Field processing, calculations  | Column-based data, reports         | Advanced
cut  | Column extraction     | Simple field extraction         | CSV/TSV processing                 | Beginner
sort | Sorting lines         | Multiple sort keys, unique      | Organizing data                    | Beginner
uniq | Duplicate handling    | Duplicate removal, counting     | Data deduplication                 | Beginner
tr   | Character translation | Fast character replacement      | Character-level changes            | Beginner
wc   | Word counting         | Line/word/character counts      | Text statistics                    | Beginner

grep - Global Regular Expression Print

🔍 grep

Search for patterns in text using regular expressions.

grep [options] pattern [file...]

Common Options:

  • -i - Case insensitive
  • -v - Invert match
  • -r - Recursive search
  • -n - Show line numbers
  • -c - Count matching lines
  • -l - Show filenames only

Examples:

  • grep "error" logfile.txt
  • grep -r "TODO" src/
  • ps aux | grep "nginx"

📝 grep Examples

Practical grep usage patterns.

# Basic search
grep "pattern" file.txt

Advanced Patterns:

  • grep -E "^[A-Z]" file.txt - Lines starting with uppercase
  • grep -v "^#" config.txt - Exclude comment lines
  • grep -c "success" *.log - Count matching lines per log file
  • grep -n "error" app.log - Show errors with line numbers
  • grep -A 2 -B 2 "exception" trace.log - Context around matches
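
To pull out just the matched text instead of whole lines, combine -o with -E. A quick sketch (access.log is a hypothetical web log):

# Extract only the matched IPv4 addresses, deduplicated
grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' access.log | sort -u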

grep Variants

Specialized grep commands for different needs.

egrep | fgrep | pgrep

Different Greps:

  • egrep - Extended regex (deprecated alias for grep -E)
  • fgrep - Fixed strings (deprecated alias for grep -F)
  • pgrep - Process grep (search process names)
  • rgrep - Recursive grep
  • zgrep - Grep in compressed files

Example: pgrep nginx - Find nginx process IDs
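
A quick sketch of the variants in action (the service name and log path are hypothetical):

# PIDs of nginx processes owned by www-data
pgrep -u www-data nginx

# Count matches inside a rotated, compressed log
zgrep -c "disk full" /var/log/syslog.2.gz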

sed - Stream Editor

✂️ sed

Filter and transform text streams using editing commands.

sed [options] 'command' [file...]

Common Commands:

  • s/pattern/replacement/ - Substitute
  • p - Print
  • d - Delete
  • i\text - Insert before
  • a\text - Append after

Examples:

  • sed 's/foo/bar/g' file.txt
  • sed '/pattern/d' file.txt
  • sed -n '10,20p' file.txt

🔄 sed Substitution

Powerful find and replace capabilities.

sed 's/find/replace/flags'

Flags:

  • g - Global replace
  • i - Case insensitive
  • p - Print changed lines
  • w file - Write to file

Examples:

  • sed 's/old/new/g' file.txt - Replace all
  • sed 's/^#//' config.txt - Remove # from start
  • sed 's/.*/\U&/' names.txt - Convert to uppercase (GNU sed)
  • sed -i.bak 's/foo/bar/g' file.txt - In-place edit with backup
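
Flags also compose with occurrence numbers; a small sketch (file.txt is hypothetical):

# Replace only the 2nd occurrence of foo on each line
sed 's/foo/bar/2' file.txt

# Print only the lines where a substitution actually happened
sed -n 's/foo/bar/gp' file.txt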

🎯 sed Addressing

Target specific lines or patterns for editing.

sed '/pattern/command'

Addressing Methods:

  • 10 - Line number 10
  • 10,20 - Lines 10 to 20
  • /pattern/ - Lines matching pattern
  • $ - Last line
  • 1~2 - Every 2nd line, starting at line 1 (GNU sed)

Examples:

  • sed '5d' file.txt - Delete line 5
  • sed '/^$/d' file.txt - Delete empty lines
  • sed '1,10s/foo/bar/g' file.txt - Replace in lines 1-10
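
Addresses can also span a range between two patterns, which is handy for editing a delimited block. A sketch, assuming config.txt contains BEGIN and END marker lines:

# Comment out every line between the BEGIN and END markers, inclusive
sed '/BEGIN/,/END/s/^/# /' config.txt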

awk - Pattern Scanning and Processing

📊 awk

Programming language for text processing and data extraction.

awk 'pattern { action }' [file...]

Built-in Variables:

  • NR - Record number
  • NF - Number of fields
  • $0 - Entire line
  • $1, $2... - Field 1, 2, etc.
  • FS - Field separator

Examples:

  • awk '{print $1}' file.txt
  • awk '/error/' logfile.txt
  • awk -F: '{print $1}' /etc/passwd
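
FS (and its output counterpart OFS) can also be set in a BEGIN block instead of with -F. For example, pulling usernames and login shells from /etc/passwd:

# Print user and shell, tab-separated
awk 'BEGIN{FS=":"; OFS="\t"} {print $1, $7}' /etc/passwd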

🧮 awk Programming

Advanced data processing with variables and functions.

awk 'BEGIN{} pattern{} END{}'

Special Patterns:

  • BEGIN - Execute before processing
  • END - Execute after processing
  • pattern - Execute for matching lines

Examples:

  • awk '{sum+=$3} END{print sum}' data.txt - Sum column 3
  • awk 'NR%2==0' file.txt - Print even-numbered lines
  • awk 'length($0) > 80' file.txt - Lines longer than 80 chars
  • awk -F, '$3 > 1000 {print $1, $3}' sales.csv - Filter and print
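
The BEGIN/body/END structure makes running totals natural. A sketch computing an average, assuming a hypothetical data.txt with a numeric second column:

# Average of column 2; the guard avoids dividing by zero on empty input
awk '{sum += $2; n++} END {if (n) print sum / n}' data.txt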

📋 awk One-liners

Useful awk patterns for common tasks.

# Various useful one-liners

Common Tasks:

  • awk '!seen[$0]++' file.txt - Remove duplicates
  • awk '{print NF}' file.txt - Count fields per line
  • awk '{count++} END{print count}' file.txt - Count lines
  • awk 'NF > max {max = NF} END {print max + 0}' file.txt - Find the maximum field count
  • awk '{for(i=NF;i>0;i--) printf "%s ",$i; print ""}' - Reverse fields
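
The associative-array trick behind !seen[$0]++ also powers per-key aggregation. A sketch, assuming a hypothetical usage.txt with user and byte columns:

# Total bytes per user, largest first
awk '{total[$1] += $2} END {for (u in total) print u, total[u]}' usage.txt | sort -k2,2nr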

Simple Text Utilities

✂️ cut

Remove sections from each line of files.

cut [options] [file...]

Options:

  • -f - Fields
  • -d - Delimiter
  • -c - Characters

Examples:

  • cut -d: -f1 /etc/passwd - Get usernames
  • cut -c1-10 file.txt - First 10 characters
  • cut -d, -f2,4 data.csv - Columns 2 and 4 (cut defaults to tab, so set the delimiter)
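
cut shines in quick pipelines over delimited system files, for example:

# Usernames of accounts whose home directory lives under /home
grep ':/home/' /etc/passwd | cut -d: -f1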

🔠 sort

Sort lines of text files.

sort [options] [file...]

Options:

  • -r - Reverse sort
  • -n - Numeric sort
  • -u - Unique lines
  • -k - Sort by field

Examples:

  • sort file.txt - Alphabetical sort
  • sort -nr file.txt - Reverse numeric
  • sort -k2,2n data.txt - Sort by second field
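
Sorting a file that has a header row takes a small trick, since sort would otherwise mix the header into the data. A sketch (data.csv with a numeric third column is hypothetical):

# Keep the header, numerically sort the body by column 3
(head -n 1 data.csv && tail -n +2 data.csv | sort -t, -k3,3n)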

1️⃣ uniq

Report or omit repeated lines.

uniq [options] [input [output]]

Options:

  • -c - Count occurrences
  • -d - Only duplicates
  • -u - Only uniques

Examples:

  • uniq file.txt - Remove adjacent duplicates
  • uniq -c file.txt - Count duplicates
  • sort file.txt | uniq - Proper deduplication
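
Because uniq only compares adjacent lines, sort first; -d then reveals true duplicates. A sketch over a hypothetical emails.txt:

# Show each line that appears more than once, printed once
sort emails.txt | uniq -d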

🔄 tr

Translate or delete characters.

tr [options] set1 set2

Options:

  • -d - Delete characters
  • -s - Squeeze repeats
  • -c - Complement set

Examples:

  • tr 'a-z' 'A-Z' < file.txt - To uppercase
  • tr -d '\r' < file.txt - Remove carriage returns
  • tr -s ' ' < file.txt - Squeeze spaces
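
tr is the classic first stage of a word-frequency pipeline: turn every run of non-letters into a newline, lowercase, then count. A sketch (book.txt is hypothetical):

# Top 10 most frequent words
tr -cs 'A-Za-z' '\n' < book.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | head -10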

📏 wc

Print line, word, and byte counts.

wc [options] [file...]

Options:

  • -l - Lines only
  • -w - Words only
  • -c - Bytes only
  • -m - Characters only

Examples:

  • wc file.txt - All counts
  • wc -l *.log - Line counts for logs
  • find . -name "*.py" | wc -l - Count Python files
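
wc usually sits at the end of a filter chain. A sketch counting effective configuration lines (app.conf is hypothetical; uses GNU grep):

# Count lines that are neither blank nor comments
grep -vE '^[[:space:]]*(#|$)' app.conf | wc -l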

🔧 Combination Tools

Other useful text processing utilities.

head | tail | paste | join

Additional Tools:

  • head - Output first part of files
  • tail - Output last part of files
  • paste - Merge lines of files
  • join - Join lines on common field
  • column - Columnate lists

Example: paste file1.txt file2.txt
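
paste and column combine nicely for quick side-by-side views. A sketch with two hypothetical single-column files:

# Merge names and scores line by line, then align into columns
paste names.txt scores.txt | column -t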

Practical Workflows

Text Processing Pipeline Example

grep → sort → uniq -c → sort -nr → head -10 → Result

Real-World Text Processing Examples

# 1. Log Analysis
# Find top 10 IP addresses in web logs
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10

# Count HTTP status codes
grep -o 'HTTP/1.[01]" [0-9][0-9][0-9]' access.log | cut -d' ' -f2 | sort | uniq -c

# Strip prefixes, keeping just the ERROR/FATAL message text
grep -E "(ERROR|FATAL)" app.log | sed 's/.*\(ERROR\|FATAL\): //'

# 2. System Administration
# Find largest files (human readable)
find /var/log -type f -exec du -h {} + | sort -hr | head -10

# Monitor process memory usage
ps aux --sort=-%mem | awk 'NR<=11 {print $0}'

# Check disk usage by directory
du -sh /* 2>/dev/null | sort -hr

# 3. Data Processing
# Process CSV files
awk -F, '{print $1 "," $3 "," $5}' data.csv > extracted.csv

# Convert TSV to CSV
tr '\t' ',' < data.tsv > data.csv

# Count unique values in column 2
cut -f2 data.txt | sort | uniq -c | sort -nr

# 4. Text Manipulation
# Remove empty lines and trailing whitespace
sed '/^$/d' file.txt | sed 's/[[:space:]]*$//' > clean.txt

# Convert Windows line endings to Unix
tr -d '\r' < windows.txt > unix.txt

# Extract emails from text
grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" file.txt

# 5. File Management
# Find and replace in multiple files
find . -name "*.txt" -exec sed -i 's/old/new/g' {} +

# Count code lines in project
find . -name "*.py" -exec cat {} + | wc -l

# List all unique file extensions
find . -type f -name '*.*' | sed 's/.*\.//' | sort -u

Common Use Cases

System Monitoring & Logs

  • Error Tracking: grep -c "ERROR" *.log
  • Performance Monitoring: ps aux --sort=-%cpu | head -10
  • Security Auditing: grep "Failed password" /var/log/auth.log | wc -l
  • Resource Analysis: df -h | grep -v tmpfs | sort -hr -k5

Data Analysis & Reporting

  • CSV Processing: awk -F, '{sum+=$3} END{print sum}' sales.csv
  • Data Cleaning: sed 's/ *$//' data.txt | sed '/^$/d' > clean.txt
  • Statistical Summary: awk '{if(min==""){min=max=$1}; if($1>max) max=$1; if($1<min) min=$1; sum+=$1} END{print "Min:", min, "Max:", max, "Avg:", sum/NR}' data.txt

File Management

  • Batch Renaming: ls *.jpg | sed 's/\(.*\)\.jpg/mv "&" "\1_backup.jpg"/' | sh - Preview without the final | sh first
  • Content Search: grep -r "function_name" src/
  • Duplicate Detection: find . -type f -exec md5sum {} + | sort | uniq -w32 -d

Pro Tips:
• Always test complex pipelines step by step
• Use grep -n to see line numbers for debugging
• Combine tools: awk for complex logic, sed for simple substitutions
• Use tr for character-level operations - it's faster than sed
• Remember sort | uniq for proper deduplication
• Use head and tail to test pipelines with sample data

Common Pitfalls:
• Forgetting that uniq only removes adjacent duplicates (sort first)
• Not using -i flag for case-insensitive searches when needed
• Overlooking that awk uses space as default field separator
• Using complex regex when simple string search would suffice
• Not testing sed -i commands on backups first
• Forgetting that some tools buffer their output, which affects real-time processing

Key Takeaways

Linux text processing utilities form a powerful toolkit for data manipulation and analysis. grep excels at pattern matching, sed at stream editing, and awk at field-based processing and calculations. Combined with simpler tools like cut, sort, uniq, tr, and wc, you can build sophisticated data processing pipelines directly from the command line. Mastering these tools enables efficient log analysis, system monitoring, data cleaning, and automated reporting without needing specialized software.

Next Step: Explore shell scripting to combine these text processing tools into reusable automated workflows and system administration scripts.