cheat sheet
sort, uniq & wc
Sort lines (numerically, by field, human-readable sizes), deduplicate with uniq, count lines/words/bytes with wc, and number lines with nl. With real-world pipeline recipes.
sort, uniq & wc — Counting & Ordering
What it is
sort, uniq, and wc are POSIX-standard text utilities present on every Unix and Linux system for ordering, deduplicating, and counting text data. sort reorders lines by string, numeric, or human-readable size values; uniq collapses adjacent duplicate lines (and can count occurrences); wc counts lines, words, characters, or bytes in a file or stream. These three tools are frequently composed in pipelines — sort | uniq -c | sort -rn being the classic frequency-count idiom for log analysis and data exploration.
sort
Common flags
| Flag | Meaning |
|---|---|
| -n | Numeric sort |
| -r | Reverse order |
| -k N | Sort on the key that starts at field N and runs to the end of the line |
| -k N,M | Sort on fields N through M (use -k N,N for a single field) |
| -t SEP | Field delimiter (default: whitespace) |
| -u | Unique: remove duplicate lines |
| -f | Case-insensitive (fold) |
| -h | Human-readable sizes (2K, 3M, 1G) |
| -V | Version sort (1.2 < 1.10) |
| -R | Random order (lines with identical keys stay together) |
| -s | Stable sort (preserve input order of lines with equal keys) |
| -c | Check if already sorted; exit 1 if not |
| -m | Merge pre-sorted files (no sort step) |
| -o FILE | Write output to FILE (can be same as input) |
| -z | NUL-terminated lines |
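To make the -s flag concrete, here is a minimal sketch using made-up two-column data (the names and scores are illustrative only). Without -s, lines that tie on the key are ordered by a whole-line comparison; with -s, they keep their input order.

```shell
# Hypothetical sample data: name in field 1, score in field 2.
data=$(printf 'bob 1\nalice 1\ncarol 0\n')

# Default: ties on the key fall back to a whole-line comparison.
default_order=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2n)

# Stable: ties on the key keep their original input order.
stable_order=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k2,2n)

printf 'default:\n%s\n\nstable:\n%s\n' "$default_order" "$stable_order"
# default: carol 0, alice 1, bob 1
# stable:  carol 0, bob 1, alice 1
```

LC_ALL=C is set only to keep the tie-break comparison byte-based and predictable across machines.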
Basic sort
sort file.txt # lexicographic ascending
Output:
apple
banana
cherry
date
elderberry
sort -r file.txt # reverse
Output:
elderberry
date
cherry
banana
apple
sort -n numbers.txt # numeric
Output:
1
4
9
12
27
100
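A short sketch of why -n matters, using three illustrative integers: a plain sort compares character by character, so "100" lands before "9", while -n compares by value.

```shell
# Illustrative input: three integers, one per line.
nums=$(printf '10\n9\n100\n')

# Lexicographic: "1" sorts before "9", so 10 and 100 come first.
lexical=$(printf '%s\n' "$nums" | LC_ALL=C sort)

# Numeric: compares by value.
numeric=$(printf '%s\n' "$nums" | sort -n)

printf 'lexical:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
# lexical: 10, 100, 9
# numeric: 9, 10, 100
```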
sort -u file.txt # unique lines only
Output:
apple
banana
cherry
sort -h sizes.txt # human sizes: 1K < 2M < 3G
Output:
512
4.0K
128K
1.5M
3.2G
Multi-key sort
# Sort by field 2 numerically, then field 1 lexicographically
sort -t, -k2,2n -k1,1 data.csv
# Sort by field 3 descending, field 1 ascending
sort -k3,3rn -k1,1 data.txt
# Sort CSV by 4th column (numeric) descending
sort -t, -k4,4rn report.csv
Output (sort -t, -k4,4rn report.csv):
Eve,32,Boston,95000
Carol,35,New York,88000
Alice,30,New York,75000
Dave,28,Chicago,71000
Frank,25,Chicago,62000
# Sort colon-delimited /etc/passwd by username (field 1)
sort -t: -k1,1 /etc/passwd
Output:
alice:x:1001:1001::/home/alice:/bin/bash
carol:x:1002:1002::/home/carol:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
root:x:0:0:root:/root:/bin/bash
# Sort IP addresses correctly (4-field numeric)
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n ips.txt
Output:
10.0.0.1
10.0.0.14
10.0.1.2
192.168.1.1
192.168.1.100
Sort by partial field
# -k START.CHAR,END.CHAR
sort -k1.3,1.5 file # characters 3–5 of field 1
Output: the lines of file, ordered by characters 3–5 of their first field.
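As a worked example of the START.CHAR,END.CHAR key syntax, the sketch below sorts some hypothetical IDs (invented for illustration) by the three digits in positions 3 to 5, ignoring the two-letter prefix.

```shell
# Hypothetical IDs: two-letter prefix, three-digit sequence, one-letter suffix.
ids=$(printf 'ab003x\nab001y\nzz002q\n')

# Key = characters 3 through 5 of field 1 (the digits), so the prefix is ignored.
by_digits=$(printf '%s\n' "$ids" | LC_ALL=C sort -k1.3,1.5)

printf '%s\n' "$by_digits"
# ab001y
# zz002q
# ab003x
```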
In-place sort
sort -o file.txt file.txt # overwrite in-place
sort file.txt | sponge file.txt # with moreutils
Output: (none — exits 0 on success)
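The reason to use -o (or sponge) rather than a plain redirect is that the shell truncates the output file before sort gets to read it. A minimal sketch in a throwaway directory, with made-up file contents:

```shell
# Work in a temporary directory so no real files are touched.
tmp=$(mktemp -d)
printf 'cherry\napple\nbanana\n' > "$tmp/a.txt"
cp "$tmp/a.txt" "$tmp/b.txt"

# Redirection: the shell empties a.txt before sort starts reading it.
sort "$tmp/a.txt" > "$tmp/a.txt"
redirect_lines=$(wc -l < "$tmp/a.txt" | tr -d ' ')

# -o: sort reads all of its input first, then writes the result.
sort -o "$tmp/b.txt" "$tmp/b.txt"
inplace_result=$(cat "$tmp/b.txt")

echo "after redirect: $redirect_lines lines"   # after redirect: 0 lines
printf 'after -o:\n%s\n' "$inplace_result"     # apple, banana, cherry
rm -rf "$tmp"
```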
uniq
uniq collapses consecutive duplicate lines. It compares only adjacent lines, so sort the input first if duplicates anywhere in the file should be merged.
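A quick sketch of the adjacency rule, using a three-line illustrative input in which the two "a" lines are separated by a "b":

```shell
# Illustrative input where the duplicate "a" lines are not adjacent.
lines=$(printf 'a\nb\na\n')

unsorted_count=$(printf '%s\n' "$lines" | uniq | wc -l | tr -d ' ')
sorted_count=$(printf '%s\n' "$lines" | sort | uniq | wc -l | tr -d ' ')

echo "uniq alone:  $unsorted_count lines"   # 3: no duplicates were adjacent
echo "sort | uniq: $sorted_count lines"     # 2: duplicates collapsed
```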
Common flags
| Flag | Meaning |
|---|---|
| -c | Prefix each line with occurrence count |
| -d | Print only duplicate lines (once each) |
| -D | Print all copies of duplicate lines |
| -u | Print only unique (non-repeated) lines |
| -i | Case-insensitive comparison |
| -f N | Skip first N fields |
| -s N | Skip first N characters |
| -w N | Compare only first N characters |
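The field-skipping flag is easiest to see on log-like input. The sketch below uses hypothetical log lines (timestamp in field 1, message after it) and collapses consecutive lines whose message is the same even though the timestamps differ.

```shell
# Hypothetical log lines: field 1 is a timestamp, the rest is the message.
log=$(printf '10:01 disk full\n10:02 disk full\n10:03 ok\n')

# -f 1 skips the timestamp, so consecutive lines with the same message collapse.
# uniq keeps the first line of each run.
collapsed=$(printf '%s\n' "$log" | uniq -f 1)

printf '%s\n' "$collapsed"
# 10:01 disk full
# 10:03 ok
```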
sort file.txt | uniq # deduplicate
Output:
apple
banana
cherry
date
sort file.txt | uniq -c # count occurrences
Output:
3 apple
1 banana
2 cherry
4 date
sort file.txt | uniq -cd # count + only duplicates
Output:
3 apple
2 cherry
4 date
sort file.txt | uniq -u # lines appearing exactly once
Output:
banana
Frequency table pattern
# Most common words in a file
tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -rn | head -20
Output:
42 the
31 a
28 to
19 and
17 of
12 in
9 for
8 is
7 with
6 that
# Most frequent IPs in access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
Output:
412 203.0.113.42
287 10.0.1.5
194 198.51.100.7
88 192.168.1.101
43 203.0.113.99
# Most common HTTP status codes
awk '{print $9}' access.log | sort | uniq -c | sort -rn
Output:
8431 200
712 304
238 404
91 301
47 500
9 403
wc — Word Count
| Flag | Counts |
|---|---|
| -l | Lines |
| -w | Words |
| -c | Bytes |
| -m | Characters (multibyte-aware) |
| -L | Length of longest line |
wc -l file.txt # line count
Output:
142 file.txt
wc -w file.txt # word count
Output:
1024 file.txt
wc -c file.txt # byte count
Output:
6891 file.txt
wc file.txt # lines + words + bytes
Output:
142 1024 6891 file.txt
wc -l *.log # count per file + total
Output:
312 access.log
1047 app.log
89 error.log
1448 total
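In scripts the filename that wc prints after the count is often unwanted. One common approach, sketched here with a temporary file and made-up contents, is to feed the file on stdin so that wc has no name to print.

```shell
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# Reading from stdin: wc has no filename to print, so only the number remains.
# tr strips the leading padding that some implementations add.
count=$(wc -l < "$tmp" | tr -d ' ')

echo "lines: $count"   # lines: 3
rm -f "$tmp"
```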
# Count matching lines
grep -c "ERROR" app.log
Output:
47
# Count files in a directory
ls | wc -l
Output:
23
# Length of longest line (useful for column-width decisions)
wc -L report.txt
Output:
120 report.txt
nl — Number Lines
nl prefixes each line with a right-justified line number, defaulting to numbering only non-empty lines. It is POSIX-standard and more configurable than the cat -n shorthand — you can control the numbering style, starting value, field width, and separator character.
nl file.txt # number non-empty lines (default)
Output:
1 apple
2 banana
3 cherry
4 date
nl -b a file.txt # number all lines including empty
Output:
1 apple
2 banana
3
4 cherry
5 date
nl -n rz file.txt # right-justified, zero-padded (000001)
Output:
000001 apple
000002 banana
000003 cherry
nl -v 0 file.txt # start numbering at 0
nl -s '. ' file.txt # custom separator after number
nl -n ln file.txt # left-justified
nl -w 3 file.txt # width of line number field
Output: each command prints the numbered lines to stdout; only the number formatting differs.
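These nl options can be combined. A small sketch on illustrative two-line input, starting at 0 with zero-padded three-digit numbers followed by ". ":

```shell
# -v 0: start at 0; -n rz: right-justified, zero-padded;
# -w 3: three-digit number field; -s '. ': separator after the number.
numbered=$(printf 'apple\nbanana\n' | nl -v 0 -n rz -w 3 -s '. ')

printf '%s\n' "$numbered"
# 000. apple
# 001. banana
```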
Practical pipelines
# Top 10 largest entries (files or directories) in the current directory
du -sh * 2>/dev/null | sort -rh | head -10
Output:
3.2G backups
1.5M data
512K logs
128K src
48K config
# Count unique visitors in access log (by IP)
awk '{print $1}' access.log | sort -u | wc -l
Output:
847
# Distribution of response sizes
awk '{print $10}' access.log | grep -v '-' | sort -n | uniq -c
Output:
214 512
891 1024
3201 4096
742 16384
89 65536
# Find the 5 most recently modified files
ls -lt | grep '^-' | head -5
Output:
-rw-r--r-- 1 alice staff 4096 Apr 24 14:22 report.csv
-rw-r--r-- 1 alice staff 1280 Apr 24 13:01 config.yaml
-rwxr-xr-x 1 alice staff 512 Apr 24 11:45 deploy.sh
-rw-r--r-- 1 alice staff 65536 Apr 23 22:10 data.parquet
-rw-r--r-- 1 alice staff 2048 Apr 23 19:33 README.md
# Sort a CSV by 2nd column (numeric), keep header
{ head -1 data.csv; tail -n +2 data.csv | sort -t, -k2,2n; }
Output:
name,age,city,salary
Frank,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Eve,32,Boston,95000
Carol,35,New York,88000
# Check if a file is already sorted (2>/dev/null hides sort's "disorder" diagnostic)
sort -c file.txt 2>/dev/null && echo "sorted" || echo "not sorted"
Output:
not sorted
# Merge two pre-sorted files
sort -m sorted1.txt sorted2.txt
Output:
alpha
beta
delta
epsilon
gamma
zeta
# Rank word frequency across multiple files
cat *.txt | tr '[:upper:]' '[:lower:]' | tr -sc '[:alpha:]' '\n' \
| sort | uniq -c | sort -rn | head -30
Output:
312 the
198 and
174 to
143 a
121 of
…
# Show only lines that appear in both files
sort file1.txt > /tmp/s1; sort file2.txt > /tmp/s2
comm -12 /tmp/s1 /tmp/s2
Output:
banana
cherry
elderberry
# Lines only in file1 (not in file2)
comm -23 <(sort file1.txt) <(sort file2.txt)
Output:
apple
date
comm — Compare Sorted Files
comm compares two sorted files line by line, outputting three columns.
comm file1.txt file2.txt # col1: only in f1, col2: only in f2, col3: both
comm -12 f1 f2 # only lines in BOTH (suppress cols 1 and 2)
comm -23 f1 f2 # only in f1 (suppress cols 2 and 3)
comm -13 f1 f2 # only in f2
Output (comm file1.txt file2.txt):
apple
banana
cherry
date
fig
grape
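The three columns are distinguished only by leading tabs, which are easy to lose when output is copied. The sketch below uses two hypothetical pre-sorted files and replaces each tab with a visible <TAB> marker to show which column each line belongs to.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs, already sorted.
printf 'apple\nbanana\ncherry\n' > "$tmp/f1"
printf 'banana\ncherry\nfig\n'   > "$tmp/f2"

# 0 tabs = only in f1, 1 tab = only in f2, 2 tabs = in both.
# sed replaces each tab with "<TAB>" so the indentation is visible.
tab=$(printf '\t')
columns=$(LC_ALL=C comm "$tmp/f1" "$tmp/f2" | sed "s/$tab/<TAB>/g")

printf '%s\n' "$columns"
# apple
# <TAB><TAB>banana
# <TAB><TAB>cherry
# <TAB>fig
rm -rf "$tmp"
```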
The idiom sort file | uniq -c | sort -rn (sort → count → sort by count descending) is one of the most useful pipelines for log analysis and data exploration. Ending it with sort -rn | head -20 gives the top 20 most frequent items.
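A self-contained sketch of the idiom on a hypothetical six-line request log (the IPs and paths are invented for illustration), so each stage can be tried without a real access.log:

```shell
# Hypothetical request log: client IP in field 1.
log=$(printf '%s\n' \
  '10.0.0.1 GET /' \
  '10.0.0.2 GET /a' \
  '10.0.0.1 GET /b' \
  '10.0.0.3 GET /' \
  '10.0.0.1 GET /c' \
  '10.0.0.2 GET /')

# extract -> sort (group duplicates) -> count -> sort by count, descending.
# The final awk only normalises uniq's leading padding to "count value".
top=$(printf '%s\n' "$log" | awk '{print $1}' | sort | uniq -c | sort -rn \
      | awk '{print $1, $2}')

printf '%s\n' "$top"
# 3 10.0.0.1
# 2 10.0.0.2
# 1 10.0.0.3
```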