cheat sheet

sort, uniq & wc

Sort lines (numerically, by field, human-readable sizes), deduplicate with uniq, count lines/words/bytes with wc, and number lines with nl. With real-world pipeline recipes.

updated 04-24-2026

sort, uniq & wc — Counting & Ordering

What it is

sort, uniq, and wc are POSIX-standard text utilities present on every Unix and Linux system for ordering, deduplicating, and counting text data. sort reorders lines by string, numeric, or human-readable size values; uniq collapses adjacent duplicate lines (and can count occurrences); wc counts lines, words, characters, or bytes in a file or stream. These three tools are frequently composed in pipelines — sort | uniq -c | sort -rn being the classic frequency-count idiom for log analysis and data exploration.

sort

Common flags

Flag	Meaning
`-n`	Numeric sort
`-r`	Reverse order
`-k N`	Sort on a key that starts at field N and runs to the end of the line
`-k N,M`	Sort on fields N through M (use `-k N,N` for field N alone)
`-t SEP`	Field delimiter (default: whitespace)
`-u`	Unique — remove duplicate lines
`-f`	Case-insensitive (fold)
`-h`	Human-readable sizes (2K, 3M, 1G)
`-V`	Version sort (1.2 < 1.10)
`-R`	Random order; identical keys stay grouped (use `shuf` for a true shuffle)
`-s`	Stable sort (preserve order of equal lines)
`-c`	Check if already sorted; exit 1 if not
`-m`	Merge pre-sorted files (no sort step)
`-o FILE`	Write output to FILE (can be same as input)
`-z`	NUL-terminated lines

Basic sort

bash

sort file.txt              # lexicographic ascending

Output:

text

apple
banana
cherry
date
elderberry

bash

sort -r file.txt           # reverse

Output:

text

elderberry
date
cherry
banana
apple

bash

sort -n numbers.txt        # numeric

bash

sort -u file.txt           # unique lines only

Output:

text

apple
banana
cherry

bash

sort -h sizes.txt          # human sizes: 1K < 2M < 3G

Output:

text

512
4.0K
128K
1.5M
3.2G
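
The comparison modes above can give different orders for the same input. The following is a minimal sketch with made-up sample values, assuming a sort that supports -h (a GNU extension). LC_ALL=C pins the byte-order comparison so the lexicographic result does not depend on the locale.

```shell
# Hypothetical sample values, not taken from any real file.
nums=$(printf '10\n9\n100\n2\n')
sizes=$(printf '1.5M\n512\n3.2G\n4.0K\n128K\n')

# Default comparison is character by character, so "100" sorts before "2".
lexical=$(printf '%s\n' "$nums" | LC_ALL=C sort)

# -n compares the leading number of each line.
numeric=$(printf '%s\n' "$nums" | sort -n)

# -h understands the K/M/G suffixes printed by du -h and ls -lh.
human=$(printf '%s\n' "$sizes" | sort -h)

# Variables are left unquoted on purpose so each result prints on one line.
echo "lexical:" $lexical
echo "numeric:" $numeric
echo "human:  " $human
```

A quick rule of thumb: if the column holds numbers, reach for -n; if it holds sizes with unit suffixes, -h is usually what you want.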

Multi-key sort

bash

# Sort by field 2 numerically, then field 1 lexicographically
sort -t, -k2,2n -k1,1 data.csv

# Sort by field 3 descending, field 1 ascending
sort -k3,3rn -k1,1 data.txt

# Sort CSV by 4th column (numeric) descending
sort -t, -k4,4rn report.csv

Output (sort -t, -k4,4rn report.csv):

text

Eve,32,Boston,95000
Carol,35,New York,88000
Alice,30,New York,75000
Dave,28,Chicago,71000
Frank,25,Chicago,62000
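
Modifiers such as n and r attached to a key apply to that key only, so a descending numeric key can be combined with an ascending text tie-breaker. A small sketch with invented name,score rows:

```shell
# Invented rows: name,score
rows=$(printf 'carol,10\nalice,9\nbob,10\ndave,9\n')

# Primary key: field 2, numeric, descending.
# Secondary key: field 1, plain ascending, used only to break ties.
ranked=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2,2nr -k1,1)

printf '%s\n' "$ranked"
```

Writing the key as -k2,2 rather than -k2 matters here: without the end position the key would extend to the end of the line.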

bash

# Sort /etc/passwd by username (field 1, colon-delimited)
sort -t: -k1,1 /etc/passwd      # by username

Output:

text

alice:x:1001:1001::/home/alice:/bin/bash
carol:x:1002:1002::/home/carol:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
root:x:0:0:root:/root:/bin/bash

bash

# Sort IP addresses correctly (4-field numeric)
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n ips.txt

Output:

text

10.0.0.1
10.0.0.14
10.0.1.2
192.168.1.1
192.168.1.100
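
The four numeric keys are needed because a plain sort compares character by character and places 10.0.0.14 before 10.0.0.2. A sketch with invented addresses that shows both orders side by side:

```shell
ips=$(printf '192.168.1.100\n10.0.0.14\n192.168.1.1\n10.0.0.2\n10.0.0.1\n')

# Byte-order sort: "10.0.0.14" lands before "10.0.0.2".
plain=$(printf '%s\n' "$ips" | LC_ALL=C sort)

# One numeric key per octet gives the order a human expects.
by_octet=$(printf '%s\n' "$ips" | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n)

printf 'plain:\n%s\n\nby octet:\n%s\n' "$plain" "$by_octet"
```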

Sort by partial field

bash

# -k F.C,F.C: start and end positions written as FIELD.CHAR (both 1-based)
sort -k1.3,1.5 file     # characters 3–5 of field 1

Output: (the sorted lines on stdout; file itself is not modified)
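
Character offsets are handy for fixed-width identifiers. A sketch with invented IDs in which characters 3 to 5 of the first field hold a numeric code:

```shell
ids=$(printf 'ab300 x\ncd100 y\nef200 z\n')

# Key = characters 3 through 5 of field 1 ("300", "100", "200").
by_code=$(printf '%s\n' "$ids" | LC_ALL=C sort -k1.3,1.5)

printf '%s\n' "$by_code"
```

For fields after the first, the default separator leaves the leading blanks inside the field, so offsets are usually combined with -b to skip them.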

In-place sort

bash

sort -o file.txt file.txt    # overwrite in-place
sort file.txt | sponge file.txt  # with moreutils

Output: (none — exits 0 on success)
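
The -o form is safe because sort reads all of its input before it opens the output file. A plain redirection such as sort file.txt > file.txt is not, because the shell truncates the file before sort reads it. A sketch on a throwaway file (the path comes from mktemp and is not part of the original examples):

```shell
tmp=$(mktemp)
printf 'pear\napple\nfig\n' > "$tmp"

# Safe: sort consumes the whole input, then writes the result to the same path.
sort -o "$tmp" "$tmp"
in_place=$(cat "$tmp")
printf '%s\n' "$in_place"

# Unsafe (do not run on real data): sort "$tmp" > "$tmp" empties the file first.
rm -f "$tmp"
```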

uniq

uniq collapses adjacent duplicate lines only. To deduplicate across a whole file, sort the input first so that identical lines end up next to each other.
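
The adjacency rule is easy to see on unsorted input. A sketch with invented lines:

```shell
lines=$(printf 'apple\nbanana\napple\napple\n')

# Without sorting, only the back-to-back "apple" pair collapses.
unsorted=$(printf '%s\n' "$lines" | uniq)

# Sorting first brings every "apple" together, so a single copy remains.
sorted=$(printf '%s\n' "$lines" | sort | uniq)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n' "$unsorted" "$sorted"
```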

Common flags

Flag	Meaning
`-c`	Prefix each line with occurrence count
`-d`	Print only duplicate lines (once each)
`-D`	Print all copies of duplicate lines
`-u`	Print only unique (non-repeated) lines
`-i`	Case-insensitive comparison
`-f N`	Skip first N fields
`-s N`	Skip first N characters
`-w N`	Compare only first N characters

bash

sort file.txt | uniq           # deduplicate

Output:

text

apple
banana
cherry
date

bash

sort file.txt | uniq -c        # count occurrences

Output:

text

      3 apple
      1 banana
      2 cherry
      4 date

bash

sort file.txt | uniq -cd       # count + only duplicates

Output:

text

      3 apple
      2 cherry
      4 date

bash

sort file.txt | uniq -u        # lines appearing exactly once

Output:

text

banana
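
The skip flags let uniq ignore a leading column such as a timestamp. A sketch with invented log lines, assuming the timestamp is the first blank-separated field:

```shell
log=$(printf '10:01 disk full\n10:02 disk full\n10:03 cpu hot\n10:04 disk full\n')

# -f 1 ignores the first field, so adjacent lines with the same message collapse.
# The first line of each run is the one that is kept.
collapsed=$(printf '%s\n' "$log" | uniq -f 1)

printf '%s\n' "$collapsed"
```

The last "disk full" line survives because it is not adjacent to the earlier run, which is the same adjacency rule that applies to plain uniq.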

Frequency table pattern

bash

# Most common words in a file
tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -rn | head -20

Output:

text

     42 the
     31 a
     28 to
     19 and
     17 of
     12 in
      9 for
      8 is
      7 with
      6 that

bash

# Most frequent IPs in access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10

Output:

text

    412 203.0.113.42
    287 10.0.1.5
    194 198.51.100.7
     88 192.168.1.101
     43 203.0.113.99
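
For a quick self-check of the pattern, the same pipeline can be run on a few invented addresses. The final awk step is an addition for this sketch and only strips the padding that uniq -c puts in front of each count:

```shell
hits=$(printf '203.0.113.42\n10.0.1.5\n203.0.113.42\n198.51.100.7\n203.0.113.42\n10.0.1.5\n')

# sort groups equal lines, uniq -c counts each group, sort -rn ranks by count.
top=$(printf '%s\n' "$hits" \
  | sort \
  | uniq -c \
  | sort -rn \
  | awk '{print $1, $2}')

printf '%s\n' "$top"
```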

bash

# Most common HTTP status codes
awk '{print $9}' access.log | sort | uniq -c | sort -rn

wc — Word Count

Flag	Counts
`-l`	Lines
`-w`	Words
`-c`	Bytes
`-m`	Characters (multibyte-aware)
`-L`	Length of longest line

bash

wc -l file.txt            # line count

Output:

text

      142 file.txt

bash

wc -w file.txt            # word count

Output:

text

     1024 file.txt

bash

wc -c file.txt            # byte count

Output:

text

     6891 file.txt

bash

wc file.txt               # lines + words + bytes

Output:

text

  142  1024  6891 file.txt

bash

wc -l *.log               # count per file + total

Output:

text

    312 access.log
   1047 app.log
     89 error.log
   1448 total

bash

# Count matching lines
grep -c "ERROR" app.log

bash

# Count files in a directory
ls | wc -l

bash

# Length of longest line (useful for column-width decisions)
wc -L report.txt

Output:

text

     120 report.txt
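
Two details of wc are worth checking by hand. Reading from standard input drops the file name from the output, and -l counts newline characters, so a final line without a trailing newline is not counted. A sketch with invented input:

```shell
# Arithmetic expansion strips the padding some wc implementations add.
with_newline=$(( $(printf 'a\nb\nc\n' | wc -l) ))
without_newline=$(( $(printf 'a\nb\nc' | wc -l) ))

# Words are runs of non-blank characters; bytes include the newline.
words=$(( $(printf 'one two  three\n' | wc -w) ))
bytes=$(( $(printf 'abc\n' | wc -c) ))

echo "lines: $with_newline (with newline), $without_newline (without)"
echo "words: $words, bytes: $bytes"
```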

nl — Number Lines

nl prefixes each line with a right-justified line number, defaulting to numbering only non-empty lines. It is POSIX-standard and more configurable than the cat -n shorthand — you can control the numbering style, starting value, field width, and separator character.

bash

nl file.txt               # number non-empty lines (default)

Output:

text

     1	apple
     2	banana

     3	cherry
     4	date

bash

nl -b a file.txt          # number all lines including empty

Output:

text

     1	apple
     2	banana
     3	
     4	cherry
     5	date

bash

nl -n rz file.txt         # right-justified, zero-padded (000001)

Output:

text

000001	apple
000002	banana
000003	cherry

bash

nl -v 0 file.txt          # start numbering at 0
nl -s '. ' file.txt       # custom separator after number
nl -n ln file.txt         # left-justified
nl -w 3 file.txt          # width of line number field

Output: (each command prints the numbered lines to stdout in the requested format)
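
These options combine. A sketch with invented input that produces a "1. item" style list and a zero-padded variant that starts at 0:

```shell
items=$(printf 'apple\nbanana\n')

# Width 3, then ". " instead of the default tab separator.
dotted=$(printf '%s\n' "$items" | nl -w 3 -s '. ')

# Zero-padded numbers starting at 0; the separator stays a tab.
padded=$(printf '%s\n' "$items" | nl -n rz -w 3 -v 0)

printf '%s\n\n%s\n' "$dotted" "$padded"
```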

Practical pipelines

bash

# Top 10 largest entries (files and directories) in the current directory
du -sh * 2>/dev/null | sort -rh | head -10

Output:

text

3.2G	backups
1.5M	data
512K	logs
128K	src
 48K	config

bash

# Count unique visitors in access log (by IP)
awk '{print $1}' access.log | sort -u | wc -l

bash

# Distribution of response sizes
awk '{print $10}' access.log | grep -v '^-$' | sort -n | uniq -c

bash

# Find the 5 most recently modified files
ls -lt | grep '^-' | head -5

Output:

text

-rw-r--r-- 1 alice staff  4096 Apr 24 14:22 report.csv
-rw-r--r-- 1 alice staff  1280 Apr 24 13:01 config.yaml
-rwxr-xr-x 1 alice staff   512 Apr 24 11:45 deploy.sh
-rw-r--r-- 1 alice staff 65536 Apr 23 22:10 data.parquet
-rw-r--r-- 1 alice staff  2048 Apr 23 19:33 README.md

bash

# Sort a CSV by 2nd column (numeric), keep header
{ head -n 1 data.csv; tail -n +2 data.csv | sort -t, -k2,2n; }

Output:

text

name,age,city,salary
Frank,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Eve,32,Boston,95000
Carol,35,New York,88000
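
The header-preserving pattern can be tried on a throwaway file. The sketch below writes a few sample rows to a temporary path from mktemp (an assumption for illustration, not a file from the examples above) and sorts on the numeric age column:

```shell
csv=$(mktemp)
cat > "$csv" <<'EOF'
name,age,city,salary
Alice,30,New York,75000
Frank,25,Chicago,62000
Carol,35,New York,88000
EOF

# head emits the header untouched; tail skips it and feeds only data rows to sort.
sorted_csv=$({ head -n 1 "$csv"; tail -n +2 "$csv" | sort -t, -k2,2n; })

printf '%s\n' "$sorted_csv"
rm -f "$csv"
```

Note that splitting on commas with -t, only works for simple CSV without quoted fields that contain commas.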

bash

# Check if a file is already sorted
sort -c file.txt && echo "sorted" || echo "not sorted"

Output:

text

not sorted
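
sort -c also reports the first out-of-order line on standard error. When only the exit status matters, POSIX defines -C, which performs the same check silently. A sketch with invented input:

```shell
# Exit status 0 means the input is already in order.
if printf 'apple\nbanana\ncherry\n' | LC_ALL=C sort -C; then
  first=sorted
else
  first=unsorted
fi

# Non-zero status, and no diagnostic is printed with -C.
if printf 'banana\napple\n' | LC_ALL=C sort -C; then
  second=sorted
else
  second=unsorted
fi

echo "$first $second"
```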

bash

# Merge two pre-sorted files
sort -m sorted1.txt sorted2.txt

Output:

text

alpha
beta
delta
epsilon
gamma
zeta
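
-m interleaves inputs that are already sorted instead of sorting them again, which saves work on large files. A sketch with two temporary files whose contents are invented:

```shell
dir=$(mktemp -d)
printf 'alpha\ndelta\ngamma\n' > "$dir/sorted1.txt"
printf 'beta\nepsilon\nzeta\n' > "$dir/sorted2.txt"

# Both inputs must already be in the order sort itself would produce.
merged=$(LC_ALL=C sort -m "$dir/sorted1.txt" "$dir/sorted2.txt")

printf '%s\n' "$merged"
rm -rf "$dir"
```

If either input is not actually sorted, -m does not complain; it simply produces output that is not fully ordered, so sort -c is a useful check beforehand.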

bash

# Rank word frequency across multiple files
cat *.txt | tr '[:upper:]' '[:lower:]' | tr -sc '[:alpha:]' '\n' \
  | sort | uniq -c | sort -rn | head -30

bash

# Show only lines that appear in both files
sort file1.txt > /tmp/s1; sort file2.txt > /tmp/s2
comm -12 /tmp/s1 /tmp/s2

Output:

text

banana
cherry
elderberry

bash

# Lines only in file1 (not in file2)
comm -23 <(sort file1.txt) <(sort file2.txt)

Output:

text

apple
date

comm — Compare Sorted Files

comm compares two sorted files line by line and prints three tab-indented columns: lines only in the first file, lines only in the second, and lines common to both.

bash

comm file1.txt file2.txt     # col1: only in f1, col2: only in f2, col3: both
comm -12 f1 f2               # only lines in BOTH (suppress cols 1 and 2)
comm -23 f1 f2               # only in f1 (suppress cols 2 and 3)
comm -13 f1 f2               # only in f2

Output (comm file1.txt file2.txt):

text

apple
		banana
		cherry
date
	fig
		grape
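
The three selections can be checked on two small invented lists. Both inputs are sorted in the same locale that comm runs in, which is why LC_ALL=C appears on every command in this sketch:

```shell
dir=$(mktemp -d)
printf 'date\napple\ncherry\nbanana\n' | LC_ALL=C sort > "$dir/f1"
printf 'fig\ncherry\nbanana\n' | LC_ALL=C sort > "$dir/f2"

both=$(LC_ALL=C comm -12 "$dir/f1" "$dir/f2")      # intersection
only_f1=$(LC_ALL=C comm -23 "$dir/f1" "$dir/f2")   # f1 minus f2
only_f2=$(LC_ALL=C comm -13 "$dir/f1" "$dir/f2")   # f2 minus f1
rm -rf "$dir"

printf 'both:\n%s\n\nonly f1:\n%s\n\nonly f2:\n%s\n' "$both" "$only_f1" "$only_f2"
```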

The idiom sort file | uniq -c | sort -rn is one of the most useful pipelines for log analysis and data exploration: the first sort groups identical lines so that uniq -c can count each group, and sort -rn then orders the groups by count, highest first. Appending | head -20 keeps the 20 most frequent items.