Linux provides a powerful suite of text processing utilities that can transform, filter, and analyze data with incredible efficiency. Mastering tools like grep, sed, awk, and others enables you to handle complex data manipulation tasks directly from the command line, making system administration and data processing significantly more effective.
Essential Text Processing Tools
| Tool | Primary Purpose | Strengths | Best For | Complexity |
|---|---|---|---|---|
| grep | Pattern searching | Fast text search, regex support | Finding lines matching patterns | Beginner |
| sed | Stream editing | Inline editing, substitutions | Find/replace, text transformations | Intermediate |
| awk | Text processing | Field processing, calculations | Column-based data, reports | Advanced |
| cut | Column extraction | Simple field extraction | CSV/TSV processing | Beginner |
| sort | Sorting lines | Multiple sort keys, unique | Organizing data | Beginner |
| uniq | Duplicate handling | Duplicate removal, counting | Data deduplication | Beginner |
| tr | Character translation | Fast character replacement | Character-level changes | Beginner |
| wc | Word counting | Line/word/character counts | Text statistics | Beginner |
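To get a feel for how these tools chain together before diving into each one, here is a minimal sketch that ranks the most frequent words in a hypothetical words.txt (one word per line); the file name is just an example:
# Normalize case, sort so duplicates are adjacent, count them, then rank the top 5
tr 'A-Z' 'a-z' < words.txt | sort | uniq -c | sort -nr | head -5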
grep - Global Regular Expression Print
Search for patterns in text using regular expressions.
Common Options:
- -i - Case insensitive
- -v - Invert match
- -r - Recursive search
- -n - Show line numbers
- -c - Count matches
- -l - Show filenames only
Examples:
grep "error" logfile.txtgrep -r "TODO" src/ps aux | grep "nginx"
grep Examples
Practical grep usage patterns.
grep "pattern" file.txt
Advanced Patterns:
grep -E "^[A-Z]" file.txt- Lines starting with uppercasegrep -v "^#" config.txt- Exclude comment linesgrep -c "success" *.log- Count successes in log filesgrep -n "error" app.log- Show errors with line numbersgrep -A 2 -B 2 "exception" trace.log- Context around matches
grep Variants
Specialized grep commands for different needs.
Different Greps:
- egrep - Extended regex (same as grep -E)
- fgrep - Fixed strings (same as grep -F)
- pgrep - Process grep (search process names)
- rgrep - Recursive grep
- zgrep - Grep in compressed files
Example: pgrep nginx - Find nginx process IDs
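A quick sketch of where these variants come in handy; the log paths and process name below are examples, not fixed locations:
# zgrep searches rotated, gzip-compressed logs alongside the current one
zgrep -h "Out of memory" /var/log/syslog /var/log/syslog.*.gz
# pgrep -a lists matching PIDs together with their full command lines
pgrep -a nginx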
sed - Stream Editor
Filter and transform text streams using editing commands.
Common Commands:
- s/pattern/replacement/ - Substitute
- p - Print
- d - Delete
- i\text - Insert before
- a\text - Append after
Examples:
sed 's/foo/bar/g' file.txt
sed '/pattern/d' file.txt
sed -n '10,20p' file.txt
sed Substitution
Powerful find and replace capabilities.
Flags:
- g - Global replace
- i - Case insensitive
- p - Print changed lines
- w file - Write to file
Examples:
- sed 's/old/new/g' file.txt - Replace all
- sed 's/^#//' config.txt - Remove # from start
- sed 's/.*/\U&/' names.txt - Convert to uppercase (GNU sed)
- sed -i.bak 's/foo/bar/g' file.txt - In-place edit with backup
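Substitutions become much more powerful with capture groups; a small sketch assuming a names.txt in "Last, First" format and an app.log as in earlier examples:
# Swap "Last, First" into "First Last" using backreferences \1 and \2
sed 's/\([^,]*\), \(.*\)/\2 \1/' names.txt
# Combine -n with the p flag to print only the lines that actually changed
sed -n 's/error/ERROR/p' app.log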
sed Addressing
Target specific lines or patterns for editing.
Addressing Methods:
- 10 - Line number 10
- 10,20 - Lines 10 to 20
- /pattern/ - Lines matching pattern
- $ - Last line
- 1~2 - Every 2nd line starting at line 1 (GNU sed)
Examples:
- sed '5d' file.txt - Delete line 5
- sed '/^$/d' file.txt - Delete empty lines
- sed '1,10s/foo/bar/g' file.txt - Replace in lines 1-10
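Addresses can also be pattern ranges, which is handy for block-structured files; a sketch assuming hypothetical settings.txt and settings.ini files containing the markers shown:
# Delete everything from a start marker through an end marker (inclusive)
sed '/BEGIN CONFIG/,/END CONFIG/d' settings.txt
# Apply a substitution only from the first matching line to the end of the file
sed '/^\[database\]/,$s/host=.*/host=localhost/' settings.ini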
awk - Pattern Scanning and Processing
Programming language for text processing and data extraction.
Built-in Variables:
- NR - Record number
- NF - Number of fields
- $0 - Entire line
- $1, $2... - Field 1, 2, etc.
- FS - Field separator
Examples:
awk '{print $1}' file.txt
awk '/error/' logfile.txt
awk -F: '{print $1}' /etc/passwd
awk Programming
Advanced data processing with variables and functions.
Special Patterns:
- BEGIN - Execute before processing
- END - Execute after processing
- pattern - Execute for matching lines
Examples:
- awk '{sum+=$3} END{print sum}' data.txt - Sum column 3
- awk 'NR%2==0' file.txt - Print even lines
- awk 'length($0) > 80' file.txt - Lines longer than 80 chars
- awk -F, '$3 > 1000 {print $1, $3}' sales.csv - Filter and print
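BEGIN and END pair naturally with awk's associative arrays for small reports; a sketch assuming a hypothetical usage.txt with two whitespace-separated columns, user and bytes:
# Print a header, total bytes per user in MB, then dump the totals at the end
awk 'BEGIN{printf "%-12s %10s\n", "USER", "MB"} {mb[$1]+=$2/1048576} END{for (u in mb) printf "%-12s %10.1f\n", u, mb[u]}' usage.txt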
awk One-liners
Useful awk patterns for common tasks.
Common Tasks:
- awk '!seen[$0]++' file.txt - Remove duplicates
- awk '{print NF}' file.txt - Count fields per line
- awk '{count++} END{print count}' file.txt - Count lines
- awk 'NF>max{max=NF} END{print max}' file.txt - Find max fields per line
- awk '{for(i=NF;i>0;i--) printf "%s ",$i; print ""}' file.txt - Reverse fields
Simple Text Utilities
cut - Column Extraction
Remove sections from each line of files.
Options:
- -f - Fields
- -d - Delimiter
- -c - Characters
Examples:
- cut -d: -f1 /etc/passwd - Get usernames
- cut -c1-10 file.txt - First 10 characters
- cut -d, -f2,4 data.csv - Columns 2 and 4
sort - Sorting Lines
Sort lines of text files.
Options:
- -r - Reverse sort
- -n - Numeric sort
- -u - Unique lines
- -k - Sort by field
Examples:
- sort file.txt - Alphabetical sort
- sort -nr file.txt - Reverse numeric
- sort -k2,2n data.txt - Sort by second field (numeric)
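Sort keys can be stacked and combined with a field delimiter; a sketch assuming a comma-separated data.csv where column 2 is a name and column 3 is a numeric amount:
# Sort by name (column 2), breaking ties by amount (column 3), numeric and descending
sort -t, -k2,2 -k3,3nr data.csv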
uniq - Duplicate Handling
Report or omit repeated lines.
Options:
- -c - Count occurrences
- -d - Only duplicates
- -u - Only uniques
Examples:
- uniq file.txt - Remove adjacent duplicates
- uniq -c file.txt - Count duplicates
- sort file.txt | uniq - Proper deduplication
tr - Character Translation
Translate or delete characters.
Options:
- -d - Delete characters
- -s - Squeeze repeats
- -c - Complement set
Examples:
- tr 'a-z' 'A-Z' < file.txt - To uppercase
- tr -d '\r' < file.txt - Remove carriage returns
- tr -s ' ' < file.txt - Squeeze spaces
wc - Word Count
Print line, word, and byte counts.
Options:
- -l - Lines only
- -w - Words only
- -c - Bytes only
- -m - Characters only
Examples:
- wc file.txt - All counts
- wc -l *.log - Line counts for logs
- find . -name "*.py" | wc -l - Count Python files
Combination Tools
Other useful text processing utilities.
Additional Tools:
- head - Output first part of files
- tail - Output last part of files
- paste - Merge lines of files
- join - Join lines on common field
- column - Columnate lists
Example: paste file1.txt file2.txt
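A short sketch of join and column in action; names.txt, emails.txt, and big.csv are hypothetical files, and join requires both inputs to be sorted on the shared first field:
# Sort both inputs on the join field, then merge rows that share the same key
sort -k1,1 names.txt > names.sorted
sort -k1,1 emails.txt > emails.sorted
join names.sorted emails.sorted
# column -t lines up delimited output into readable columns
head -5 big.csv | column -t -s,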
Practical Workflows
Real-World Text Processing Examples
# 1. Log Analysis
# Find top 10 IP addresses in web logs
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10
# Count HTTP status codes
grep -o 'HTTP/1.[01]" [0-9][0-9][0-9]' access.log | cut -d' ' -f2 | sort | uniq -c
# Extract error messages with timestamps
grep -E "(ERROR|FATAL)" app.log | sed 's/.*\(ERROR\|FATAL\): //'
# 2. System Administration
# Find largest files (human readable)
find /var/log -type f -exec du -h {} + | sort -hr | head -10
# Monitor process memory usage
ps aux --sort=-%mem | awk 'NR<=11 {print $0}'
# Check disk usage by directory
du -sh /* 2>/dev/null | sort -hr
# 3. Data Processing
# Process CSV files
awk -F, '{print $1 "," $3 "," $5}' data.csv > extracted.csv
# Convert TSV to CSV
tr '\t' ',' < data.tsv > data.csv
# Count unique values in column 2
cut -f2 data.txt | sort | uniq -c | sort -nr
# 4. Text Manipulation
# Remove empty lines and trailing whitespace
sed '/^$/d' file.txt | sed 's/[[:space:]]*$//' > clean.txt
# Convert Windows line endings to Unix
tr -d '\r' < windows.txt > unix.txt
# Extract emails from text
grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" file.txt
# 5. File Management
# Find and replace in multiple files
find . -name "*.txt" -exec sed -i 's/old/new/g' {} +
# Count code lines in project
find . -name "*.py" -exec cat {} + | wc -l
# List all unique file extensions
find . -type f | sed 's/.*\.//' | sort -u
Common Use Cases
System Monitoring & Logs
- Error Tracking: grep -c "ERROR" *.log
- Performance Monitoring: ps aux --sort=-%cpu | head -10
- Security Auditing: grep "Failed password" /var/log/auth.log | wc -l
- Resource Analysis: df -h | grep -v tmpfs | sort -hr -k5
Data Analysis & Reporting
- CSV Processing: awk -F, '{sum+=$3} END{print sum}' sales.csv
- Data Cleaning: sed 's/ *$//' data.txt | sed '/^$/d' > clean.txt
- Statistical Summary: awk '{if(min==""){min=max=$1} if($1>max) max=$1; if($1<min) min=$1; sum+=$1} END{print "min:", min, "max:", max, "avg:", sum/NR}' data.txt
File Management
- Batch Renaming: ls *.jpg | sed 's/\(.*\)\.jpg/mv "&" "\1_backup.jpg"/' | sh
- Content Search: grep -r "function_name" src/
- Duplicate Detection: find . -type f -exec md5sum {} + | sort | uniq -w32 -d
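Note that the ls | sed | sh rename above breaks on filenames containing spaces; a safer sketch of the same _backup rename is a plain shell loop, assumed to run in the directory holding the .jpg files:
# Quoted loop handles spaces and unusual characters in filenames
for f in *.jpg; do mv -- "$f" "${f%.jpg}_backup.jpg"; done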
Tips:
• Always test complex pipelines step by step
• Use grep -n to see line numbers for debugging
• Combine tools: awk for complex logic, sed for simple substitutions
• Use tr for character-level operations - it's faster than sed
• Remember sort | uniq for proper deduplication
• Use head and tail to test pipelines with sample data
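One way to put the "test step by step" and "sample with head" tips into practice, reusing the access.log pipeline from the workflow section above:
# Stage 1: sample the raw input
head -n 100 access.log
# Stage 2: confirm the field extraction looks right
head -n 100 access.log | awk '{print $1}'
# Stage 3: add counting and ranking only once the earlier stages behave
head -n 100 access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10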
Common Mistakes:
• Forgetting that uniq requires sorted input
• Not using the -i flag for case-insensitive searches when needed
• Overlooking that awk uses whitespace as the default field separator
• Using complex regex when a simple string search would suffice
• Not testing sed -i commands on backups first
• Forgetting that some tools buffer output, affecting real-time processing
Key Takeaways
Linux text processing utilities form a powerful toolkit for data manipulation and analysis. grep excels at pattern matching, sed at stream editing, and awk at field-based processing and calculations. Combined with simpler tools like cut, sort, uniq, tr, and wc, you can build sophisticated data processing pipelines directly from the command line. Mastering these tools enables efficient log analysis, system monitoring, data cleaning, and automated reporting without needing specialized software.
Next Step: Explore shell scripting to combine these text processing tools into reusable automated workflows and system administration scripts.