cheat sheet

sort, uniq & wc

Sort lines (numerically, by field, human-readable sizes), deduplicate with uniq, count lines/words/bytes with wc, and number lines with nl. With real-world pipeline recipes.

sort, uniq & wc — Counting & Ordering

What it is

sort, uniq, and wc are POSIX-standard text utilities, present on virtually every Unix and Linux system, for ordering, deduplicating, and counting text data. sort reorders lines by string or numeric value (and, in GNU and BSD implementations, by human-readable size or version number); uniq collapses adjacent duplicate lines and can count occurrences; wc counts lines, words, characters, or bytes in a file or stream. The three are frequently composed in pipelines: sort | uniq -c | sort -rn is the classic frequency-count idiom for log analysis and data exploration.

sort

Common flags

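Flag       Meaning
-n         Numeric sort
-r         Reverse order
-k N       Sort on a key that starts at field N and runs to the end of the line
-k N,M     Sort on fields N through M (use N,N for a single field)
-t SEP     Field delimiter (default: whitespace)
-u         Unique: output only one line per set of equal lines
-f         Case-insensitive (fold lowercase to uppercase)
-h         Human-readable sizes (2K, 3M, 1G); GNU/BSD extension
-V         Version sort (1.2 < 1.10); GNU/BSD extension
-R         Random order (identical keys stay together); GNU/BSD extension
-s         Stable sort (preserve input order of lines with equal keys)
-c         Check whether input is already sorted; exit status 1 if not
-m         Merge pre-sorted files (no sort step)
-o FILE    Write output to FILE (can be the same as the input)
-z         Lines are terminated by NUL instead of newline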

Basic sort

bash
sort file.txt              # lexicographic ascending

Output:

text
apple
banana
cherry
date
elderberry
bash
sort -r file.txt           # reverse

Output:

text
elderberry
date
cherry
banana
apple
bash
sort -n numbers.txt        # numeric

Output:

text
1
4
9
12
27
100
bash
sort -u file.txt           # unique lines only

Output:

text
apple
banana
cherry
bash
sort -h sizes.txt          # human sizes: 1K < 2M < 3G

Output:

text
512
4.0K
128K
1.5M
3.2G
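
The flags table also lists -V, -f, and -s, which the examples above do not exercise. A minimal sketch with made-up inline data (the version strings and names are illustrative only); -V is an extension, so this assumes GNU coreutils or a compatible sort:

```shell
# Force the C locale so byte-wise ordering is predictable across systems.
export LC_ALL=C

# -V: runs of digits compare as numbers, so 1.2 sorts before 1.10.
versions=$(printf '%s\n' 1.10 1.2 1.9 | sort -V)
printf '%s\n' "$versions"

# -f: fold case for comparison; without it, "Banana" would sort first in the C locale.
folded=$(printf '%s\n' cherry Banana apple | sort -f)
printf '%s\n' "$folded"

# -s: stable sort; "b 1" stays ahead of "a 1" because their keys are equal.
stable=$(printf '%s\n' 'b 1' 'a 1' 'c 0' | sort -s -k2,2n)
printf '%s\n' "$stable"
```

Without -s, sort falls back to comparing whole lines when keys tie, which would put "a 1" before "b 1".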

Multi-key sort

bash
# Sort by field 2 numerically, then field 1 lexicographically
sort -t, -k2,2n -k1,1 data.csv

# Sort by field 3 descending, field 1 ascending
sort -k3,3rn -k1,1 data.txt

# Sort CSV by 4th column (numeric) descending
sort -t, -k4,4rn report.csv

Output (sort -t, -k4,4rn report.csv):

text
Eve,32,Boston,95000
Carol,35,New York,88000
Alice,30,New York,75000
Dave,28,Chicago,71000
Frank,25,Chicago,62000
bash
# Sort /etc/passwd by username (field 1, colon-delimited)
sort -t: -k1,1 /etc/passwd

Output:

text
alice:x:1001:1001::/home/alice:/bin/bash
carol:x:1002:1002::/home/carol:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
root:x:0:0:root:/root:/bin/bash
bash
# Sort IP addresses correctly (4-field numeric)
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n ips.txt

Output:

text
10.0.0.1
10.0.0.14
10.0.1.2
192.168.1.1
192.168.1.100
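
To see how a second key breaks ties in the first, here is a small self-contained sketch. The name,age records are hypothetical and fed inline rather than read from data.csv:

```shell
# Force the C locale so string comparison is byte-wise and predictable.
export LC_ALL=C

# Primary key: field 2 (age) as a number. Secondary key: field 1 (name) as text.
by_age_then_name=$(printf '%s\n' 'bob,30' 'alice,30' 'carol,25' \
  | sort -t, -k2,2n -k1,1)
printf '%s\n' "$by_age_then_name"
```

The two 30-year-olds tie on the first key, so the -k1,1 key puts alice before bob. Writing -k2,2 rather than -k2 keeps the key from extending to the end of the line.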

Sort by partial field

bash
# -k START.CHAR,END.CHAR
sort -k1.3,1.5 file     # characters 3–5 of field 1

Output: the lines of file, ordered by characters 3–5 of field 1.
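
A worked example of the character-offset syntax, using hypothetical IDs made of a two-letter prefix followed by a three-digit code:

```shell
# Force the C locale so string comparison is byte-wise and predictable.
export LC_ALL=C

# The key is characters 3-5 of field 1, i.e. the digits; the prefix is ignored.
by_code=$(printf '%s\n' zz100 aa300 mm200 | sort -k1.3,1.5)
printf '%s\n' "$by_code"
```

The lines come out ordered by 100, 200, 300 even though the prefixes are in reverse alphabetical order.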

In-place sort

bash
sort -o file.txt file.txt    # overwrite in-place
sort file.txt | sponge file.txt  # with moreutils

Output: (none — exits 0 on success)
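
The reason for -o rather than a plain redirect can be sketched as follows. The file names are placeholders created in a temporary directory:

```shell
export LC_ALL=C
tmpdir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$tmpdir/file.txt"
printf '%s\n' cherry apple banana > "$tmpdir/lost.txt"

# sort reads all of its input before opening the -o target, so reusing the name is safe.
sort -o "$tmpdir/file.txt" "$tmpdir/file.txt"
in_place=$(cat "$tmpdir/file.txt")
printf '%s\n' "$in_place"

# With a redirect, the shell truncates the file before sort starts reading it.
sort "$tmpdir/lost.txt" > "$tmpdir/lost.txt"
lost_bytes=$(( $(wc -c < "$tmpdir/lost.txt") ))
printf 'bytes left after redirect: %s\n' "$lost_bytes"

rm -r "$tmpdir"
```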


uniq

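uniq collapses consecutive duplicate lines. It compares each line only with the one before it, so sort the input first whenever duplicates may be scattered through the file.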
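
A small sketch of why adjacency matters, using inline sample data:

```shell
export LC_ALL=C

# The two "apple" lines are not adjacent, so uniq keeps both.
unsorted=$(printf '%s\n' apple banana apple | uniq)
printf '%s\n' "$unsorted"

# Sorting first brings the duplicates together, so uniq can collapse them.
sorted=$(printf '%s\n' apple banana apple | sort | uniq)
printf '%s\n' "$sorted"
```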

Common flags

Flag     Meaning
-c       Prefix each line with occurrence count
-d       Print only duplicate lines (once each)
-D       Print all copies of duplicate lines
-u       Print only unique (non-repeated) lines
-i       Case-insensitive comparison
-f N     Skip first N fields
-s N     Skip first N characters
-w N     Compare only first N characters
bash
sort file.txt | uniq           # deduplicate

Output:

text
apple
banana
cherry
date
bash
sort file.txt | uniq -c        # count occurrences

Output:

text
      3 apple
      1 banana
      2 cherry
      4 date
bash
sort file.txt | uniq -cd       # count + only duplicates

Output:

text
      3 apple
      2 cherry
      4 date
bash
sort file.txt | uniq -u        # lines appearing exactly once

Output:

text
banana
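
The comparison-control flags (-f, -i, -w) are easier to follow with data. A minimal sketch on made-up log-style lines; -w is not in POSIX, so this assumes GNU uniq or a compatible implementation:

```shell
export LC_ALL=C

# -f 1: ignore the first field (a timestamp here) when comparing lines.
skip_field=$(printf '%s\n' '10:01 disk full' '10:02 disk full' '10:03 ok' | uniq -f 1)
printf '%s\n' "$skip_field"

# -i: compare case-insensitively; the first line of each group is kept.
no_case=$(printf '%s\n' Apple apple APPLE banana | uniq -i)
printf '%s\n' "$no_case"

# -w 3: compare only the first 3 characters of each line.
prefix=$(printf '%s\n' err-disk err-net warn-cpu | uniq -w 3)
printf '%s\n' "$prefix"
```

In each case uniq prints the first line of a run and drops the rest, so the surviving timestamp is 10:01 and the surviving spelling is "Apple".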

Frequency table pattern

bash
# Most common words in a file
tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -rn | head -20

Output:

text
     42 the
     31 a
     28 to
     19 and
     17 of
     12 in
      9 for
      8 is
      7 with
      6 that
bash
# Most frequent IPs in access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10

Output:

text
    412 203.0.113.42
    287 10.0.1.5
    194 198.51.100.7
     88 192.168.1.101
     43 203.0.113.99
bash
# Most common HTTP status codes
awk '{print $9}' access.log | sort | uniq -c | sort -rn

Output:

text
   8431 200
    712 304
    238 404
     91 301
     47 500
      9 403
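
When several items share the same count, sort -rn orders them by its fallback whole-line comparison, which may not be the order you want. One possible refinement, sketched on inline sample data, sorts by count descending and then by item ascending. The trailing awk only normalizes the padding that uniq -c adds, which differs between implementations:

```shell
export LC_ALL=C

# Counts: a=2, b=2, c=1. Key 1 is the count (numeric, reversed); key 2 is the item.
ranked=$(printf '%s\n' b a b a c \
  | sort | uniq -c \
  | sort -k1,1nr -k2,2 \
  | awk '{print $1, $2}')
printf '%s\n' "$ranked"
```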

wc — Word Count

Flag     Counts
-l       Lines
-w       Words
-c       Bytes
-m       Characters (multibyte-aware)
-L       Length of longest line
bash
wc -l file.txt            # line count

Output:

text
      142 file.txt
bash
wc -w file.txt            # word count

Output:

text
     1024 file.txt
bash
wc -c file.txt            # byte count

Output:

text
     6891 file.txt
bash
wc file.txt               # lines + words + bytes

Output:

text
  142  1024  6891 file.txt
bash
wc -l *.log               # count per file + total

Output:

text
    312 access.log
   1047 app.log
     89 error.log
   1448 total
bash
# Count matching lines
grep -c "ERROR" app.log

Output:

text
47
bash
# Count files in a directory
ls | wc -l

Output:

text
      23
bash
# Length of longest line (useful for column-width decisions)
wc -L report.txt

Output:

text
     120 report.txt
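
In scripts you usually want the bare number without the file name or padding. A minimal sketch, assuming a throwaway file in a temporary directory:

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/file.txt"

# Reading from stdin hides the file name; $(( ... )) strips any leading padding.
lines=$(( $(wc -l < "$tmpdir/file.txt") ))
printf 'lines: %s\n' "$lines"

# wc -l counts newline characters, so a last line without one is not counted.
no_newline=$(( $(printf 'one\ntwo' | wc -l) ))
printf 'lines without trailing newline: %s\n' "$no_newline"

rm -r "$tmpdir"
```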

nl — Number Lines

nl prefixes each line with a right-justified line number, defaulting to numbering only non-empty lines. It is POSIX-standard and more configurable than the cat -n shorthand — you can control the numbering style, starting value, field width, and separator character.

bash
nl file.txt               # number non-empty lines (default)

Output:

text
     1	apple
     2	banana

     3	cherry
     4	date
bash
nl -b a file.txt          # number all lines including empty

Output:

text
     1	apple
     2	banana
     3	
     4	cherry
     5	date
bash
nl -n rz file.txt         # right-justified, zero-padded (000001)

Output:

text
000001	apple
000002	banana
000003	cherry
bash
nl -v 0 file.txt          # start numbering at 0
nl -s '. ' file.txt       # custom separator after number
nl -n ln file.txt         # left-justified
nl -w 3 file.txt          # width of line number field

Output: each command prints the numbered lines to standard output in the requested format.
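
The nl options can be combined. A short sketch on inline sample lines shows what two combinations produce:

```shell
# Start at 0, use a 3-character number field, and put ". " after the number.
numbered=$(printf '%s\n' apple banana | nl -v 0 -w 3 -s '. ')
printf '%s\n' "$numbered"

# Zero-padded 3-digit numbers on every line, including the empty one (-b a).
# The separator is left at its default, a tab.
padded=$(printf 'apple\n\nbanana\n' | nl -b a -n rz -w 3)
printf '%s\n' "$padded"
```

The first command prints "  0. apple" and "  1. banana"; the second prints 001, 002, and 003, each followed by a tab and the line's text.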


Practical pipelines

bash
# Top 10 largest entries (files or directories) in the current directory
du -sh * 2>/dev/null | sort -rh | head -10

Output:

text
3.2G	backups
1.5M	data
512K	logs
128K	src
 48K	config
bash
# Count unique visitors in access log (by IP)
awk '{print $1}' access.log | sort -u | wc -l

Output:

text
      847
bash
# Distribution of response sizes
awk '{print $10}' access.log | grep -v '-' | sort -n | uniq -c

Output:

text
    214 512
    891 1024
   3201 4096
    742 16384
     89 65536
bash
# Find the 5 most recently modified files
ls -lt | grep '^-' | head -5

Output:

text
-rw-r--r-- 1 alice staff  4096 Apr 24 14:22 report.csv
-rw-r--r-- 1 alice staff  1280 Apr 24 13:01 config.yaml
-rwxr-xr-x 1 alice staff   512 Apr 24 11:45 deploy.sh
-rw-r--r-- 1 alice staff 65536 Apr 23 22:10 data.parquet
-rw-r--r-- 1 alice staff  2048 Apr 23 19:33 README.md
bash
# Sort a CSV by 2nd column (age, numeric), keep header
{ head -n 1 data.csv; tail -n +2 data.csv | sort -t, -k2,2n; }

Output:

text
name,age,city,salary
Frank,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Eve,32,Boston,95000
Carol,35,New York,88000
bash
# Check if a file is already sorted
sort -c file.txt && echo "sorted" || echo "not sorted"

Output:

text
not sorted
bash
# Merge two pre-sorted files
sort -m sorted1.txt sorted2.txt

Output:

text
alpha
beta
delta
epsilon
gamma
zeta
bash
# Rank word frequency across multiple files
cat *.txt | tr '[:upper:]' '[:lower:]' | tr -sc '[:alpha:]' '\n' \
  | sort | uniq -c | sort -rn | head -30

Output:

text
    312 the
    198 and
    174 to
    143 a
    121 of
    …
bash
# Show only lines that appear in both files
sort file1.txt > /tmp/s1; sort file2.txt > /tmp/s2
comm -12 /tmp/s1 /tmp/s2

Output:

text
banana
cherry
elderberry
bash
# Lines only in file1 (not in file2)
comm -23 <(sort file1.txt) <(sort file2.txt)

Output:

text
apple
date

comm — Compare Sorted Files

comm compares two sorted files line by line and prints three tab-indented columns: lines only in the first file, lines only in the second, and lines common to both.

bash
comm file1.txt file2.txt     # col1: only in f1, col2: only in f2, col3: both
comm -12 f1 f2               # only lines in BOTH (suppress cols 1 and 2)
comm -23 f1 f2               # only in f1 (suppress cols 2 and 3)
comm -13 f1 f2               # only in f2

Output (comm file1.txt file2.txt):

text
apple
		banana
		cherry
date
	fig
		grape
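
The column layout is easier to read with known inputs. A self-contained sketch using two small, already sorted files (f1 and f2 are placeholder names created in a temporary directory):

```shell
export LC_ALL=C
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry date > "$tmpdir/f1"
printf '%s\n' banana cherry fig > "$tmpdir/f2"

# No indent: only in f1. One tab: only in f2. Two tabs: in both.
all_cols=$(comm "$tmpdir/f1" "$tmpdir/f2")
printf '%s\n' "$all_cols"

both=$(comm -12 "$tmpdir/f1" "$tmpdir/f2")      # banana, cherry
only_f1=$(comm -23 "$tmpdir/f1" "$tmpdir/f2")   # apple, date
only_f2=$(comm -13 "$tmpdir/f1" "$tmpdir/f2")   # fig
printf 'both: %s\n' $both

rm -r "$tmpdir"
```

Both inputs must be sorted in the same locale that comm runs in, which is why the sketch pins LC_ALL=C.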

The idiom sort file | uniq -c | sort -rn (sort, count, then sort by count descending) is one of the most useful pipelines for log analysis and data exploration. Appending | head -20 keeps only the 20 most frequent items.