
cut, paste & join

Extract columns by delimiter or byte position (cut), merge files column-wise (paste), and join on a common key field (join). Essential for tab/CSV/field-delimited data.


What it is

cut, paste, and join are POSIX standard text utilities included in every Unix and Linux system for working with column-delimited data. cut extracts specific fields or byte ranges from each line; paste merges multiple files side by side column-wise; join performs a relational inner or outer join on two sorted files that share a common key field. Reach for these tools when you need fast, dependency-free column extraction or merging in a shell pipeline; for more complex transformations, awk gives you full field-level programming.

cut

Extract specific fields or byte/character ranges from each line.

Syntax

bash
cut OPTION... [FILE...]


By delimiter field (-d / -f)

-d sets the field delimiter (a single character) and -f selects which fields to output. Fields are numbered from 1; use N-M for a range, N- for field N to end, and a comma-separated list for non-contiguous fields.

bash
cut -d: -f1           /etc/passwd    # first field (username)

Output:

code
root
daemon
bin
sys
man
nobody
bash
cut -d: -f1,7         /etc/passwd    # fields 1 and 7

Output:

bash
root:/bin/bash
daemon:/usr/sbin/nologin
bin:/usr/sbin/nologin
sys:/usr/sbin/nologin
man:/usr/sbin/nologin
nobody:/usr/sbin/nologin
bash
cut -d: -f1-3         /etc/passwd    # fields 1 through 3

Output:

code
root:x:0
daemon:x:1
bin:x:2
sys:x:3
man:x:6
nobody:x:65534
bash
cut -d: -f3-          /etc/passwd    # field 3 to end
cut -d, -f2           data.csv       # CSV second column

Output:

code
Alice
Frank
Carol
Dave
Eve
bash
cut -d$'\t' -f1,3     data.tsv       # tab-delimited fields 1 and 3
cut -d' ' -f2-        sentence.txt   # all words except first

# Change output delimiter: portable idiom (POSIX cut has no output-delimiter option, so use tr or awk)
cut -d: -f1,3 /etc/passwd | tr ':' '\t'

# GNU coreutils ≥ 9.11: native -O for output delimiter
cut -d: -f1,3 -O $'\t' /etc/passwd

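The list forms accepted by -f can be checked against a throwaway record. This is a minimal sketch: the sample record and the variable names are made up for illustration, and only POSIX cut options are used.

```shell
# Hypothetical five-field record; any colon-delimited line behaves the same way.
rec='a:b:c:d:e'

single=$(printf '%s\n' "$rec" | cut -d: -f2)        # one field
range=$(printf '%s\n' "$rec" | cut -d: -f2-4)       # closed range N-M
to_end=$(printf '%s\n' "$rec" | cut -d: -f4-)       # field N to the last field
from_start=$(printf '%s\n' "$rec" | cut -d: -f-2)   # first field through field M
picked=$(printf '%s\n' "$rec" | cut -d: -f1,3,5)    # non-contiguous list

printf '%s\n' "$single" "$range" "$to_end" "$from_start" "$picked"
```

The five lines printed are b, b:c:d, d:e, a:b, and a:c:e, in that order.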

By character position (-c)

-c selects output by character position rather than field, making it ideal for fixed-width data where columns align at known offsets. For purely ASCII input -c and -b are equivalent; they diverge only with multibyte encodings.

bash
cut -c1       file.txt    # first character of each line
cut -c1-10    file.txt    # characters 1–10

Output:

code
The quick
Lorem ipsu
Filesystem
2026-04-24
bash
cut -c5-      file.txt    # character 5 to end
cut -c1,5,10  file.txt    # characters 1, 5, and 10
cut -c-80     file.txt    # max 80 chars (truncate long lines)


By byte position (-b)

-b selects by raw byte offset, which matters when the input contains multibyte UTF-8 characters — a single character may occupy 2–4 bytes, so -b and -c will give different results. Use -b when you need exact binary slicing of a stream.

bash
cut -b1-4   binary.dat   # bytes 1–4 (differs from -c for multibyte chars)


Suppress undelimited lines

bash
cut -d: -f1 -s /etc/passwd   # -s: skip lines without the delimiter


paste

Merge files horizontally (column-by-column).

Syntax

bash
paste [OPTIONS] [FILE...]


bash
paste file1.txt file2.txt            # merge side by side (tab-delimited)
paste -d, file1.txt file2.txt        # use comma as delimiter
paste -d'\t' names.txt emails.txt    # explicit tab

# Serial mode (-s): transpose — each file becomes one tab-joined line
paste -s file.txt
paste -s -d, file.txt                # comma-separated

# Fold consecutive lines of one file into N columns
paste - - < file.txt             # 2 lines → 1 row (2 columns)
paste - - - < file.txt           # 3 lines → 1 row (3 columns)

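To see what the three paste modes above actually emit, the following sketch runs them on tiny generated inputs. The file names and contents are placeholders invented for the demo.

```shell
# Scratch directory with two hypothetical single-column files.
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 > "$tmp/ids.txt"
printf '%s\n' Alice Frank Carol > "$tmp/names.txt"

# Side by side: line N of each file becomes one comma-joined row.
side_by_side=$(paste -d, "$tmp/ids.txt" "$tmp/names.txt")

# Serial mode: the whole file collapses into a single row.
serial=$(paste -s -d, "$tmp/names.txt")

# Two "-" operands read stdin alternately, folding every 2 lines into 1 row.
folded=$(printf '%s\n' a b c d | paste -d' ' - -)

printf '%s\n' "$side_by_side" "$serial" "$folded"
rm -rf "$tmp"
```

The side-by-side call prints 1,Alice / 2,Frank / 3,Carol on separate lines, serial mode prints Alice,Frank,Carol, and the folded call prints "a b" then "c d".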

Practical paste examples

bash
# Create a CSV from two column files
paste -d, ids.txt names.txt > combined.csv

# Add line numbers to a file
seq 1 $(wc -l < file.txt) | paste -d'\t' - file.txt

# Interleave lines from two files
paste -d'\n' file1.txt file2.txt

# Collapse a column of values into one comma-separated line
paste -s -d, values.txt


join

Join lines from two files on a common key field (like SQL inner join).

Syntax

bash
join [OPTIONS] FILE1 FILE2


Both files must be sorted on the join key first.

bash
# Join on field 1 (default)
join sorted1.txt sorted2.txt

# Join on specific fields
join -1 2 -2 1 file1.txt file2.txt   # field 2 of f1, field 1 of f2

# Set the field separator (used for both input and output)
join -t, file1.csv file2.csv

# Include unmatched lines (outer join)
join -a 1 file1.txt file2.txt        # + unmatched from file1
join -a 2 file1.txt file2.txt        # + unmatched from file2
join -a 1 -a 2 file1.txt file2.txt   # full outer join

# Fill missing fields
join -a 1 -e 'N/A' -o 0,1.2,2.2 file1.txt file2.txt

# Suppress matched lines (anti-join)
join -v 1 file1.txt file2.txt    # lines in f1 not in f2
join -v 2 file1.txt file2.txt    # lines in f2 not in f1

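Because join expects both inputs sorted on the key, a common pattern is to sort each file first and run sort and join under the same locale. A minimal sketch with made-up, deliberately unsorted data (file names are placeholders):

```shell
tmp=$(mktemp -d)
# Hypothetical inputs, not in key order.
printf '%s\n' '103 Carol' '101 Alice' '102 Frank' > "$tmp/emp.txt"
printf '%s\n' '102 82000' '101 75000' > "$tmp/sal.txt"

# Sort both files on field 1 with the same collation that join will use.
LC_ALL=C sort -k1,1 "$tmp/emp.txt" > "$tmp/emp.sorted"
LC_ALL=C sort -k1,1 "$tmp/sal.txt" > "$tmp/sal.sorted"

# Inner join on field 1: only keys present in both files survive.
joined=$(LC_ALL=C join "$tmp/emp.sorted" "$tmp/sal.sorted")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

Pinning LC_ALL=C on both commands is one way to keep the two orderings consistent; any single locale works as long as sort and join agree.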

join example

bash
# employees.txt (sorted by ID):
# 101 Alice
# 102 Frank
# 103 Carol

# salaries.txt (sorted by ID):
# 101 75000
# 102 82000
# 104 91000

join employees.txt salaries.txt
# 101 Alice 75000
# 102 Frank 82000

join -a 1 employees.txt salaries.txt
# 101 Alice 75000
# 102 Frank 82000
# 103 Carol           ← Carol has no salary record

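The same two sample files can also illustrate the outer-join and anti-join options. This sketch recreates them in a scratch directory (the paths are placeholders) and combines -a, -e, -o, and -v as described in the option list.

```shell
tmp=$(mktemp -d)
printf '%s\n' '101 Alice' '102 Frank' '103 Carol' > "$tmp/employees.txt"
printf '%s\n' '101 75000' '102 82000' '104 91000' > "$tmp/salaries.txt"

# Full outer join: keep unmatched lines from both sides, fill gaps with N/A.
# -o 0,1.2,2.2 prints the join key, then field 2 of each file.
outer=$(join -a 1 -a 2 -e 'N/A' -o 0,1.2,2.2 \
  "$tmp/employees.txt" "$tmp/salaries.txt")

# Anti-joins: lines whose key appears in only one of the files.
only_employees=$(join -v 1 "$tmp/employees.txt" "$tmp/salaries.txt")
only_salaries=$(join -v 2 "$tmp/employees.txt" "$tmp/salaries.txt")

printf '%s\n' "$outer" "$only_employees" "$only_salaries"
rm -rf "$tmp"
```

The outer join yields four rows (101 and 102 complete, "103 Carol N/A", "104 N/A 91000"); the anti-joins return "103 Carol" and "104 91000".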

Practical pipelines

bash
# Extract second column from CSV, remove header, sort, count unique
tail -n +2 data.csv | cut -d, -f2 | sort | uniq -c | sort -rn

# Get all usernames from /etc/passwd
cut -d: -f1 /etc/passwd | sort

Output:

code
bin
daemon
mail
man
nobody
root
sys
www-data
bash

# Get the home directories of users with /bin/bash shell
grep '/bin/bash$' /etc/passwd | cut -d: -f6

# Compare two lists (IDs in file1 not in file2)
join -v 1 <(sort ids1.txt) <(sort ids2.txt)

# Build a quick lookup from key=value file
cut -d= -f1,2 config.env | tr '=' '\t'

# Transpose a single-space-delimited matrix: emit each column as one row
# (for small matrices; use awk for larger ones)
for i in $(seq 1 "$(awk '{print NF; exit}' matrix.txt)"); do
  cut -d' ' -f"$i" matrix.txt | paste -s -d' ' -
done

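For larger matrices, the transpose comment above points to awk. One possible single-pass version is sketched below; the function name transpose is an illustrative choice, and the whole matrix is held in memory, so it assumes the input fits in RAM.

```shell
# transpose: read a whitespace-delimited matrix and print its transpose.
# Hypothetical helper; accepts file arguments or reads stdin.
transpose() {
  awk '{
    # Store every cell, keyed by (row, column).
    for (i = 1; i <= NF; i++) cell[NR, i] = $i
    if (NF > cols) cols = NF
  }
  END {
    # Emit one output row per input column.
    for (i = 1; i <= cols; i++) {
      row = cell[1, i]
      for (j = 2; j <= NR; j++) row = row " " cell[j, i]
      print row
    }
  }' "$@"
}

result=$(printf '1 2 3\n4 5 6\n' | transpose)
printf '%s\n' "$result"
```

For the 2x3 sample this prints "1 4", "2 5", "3 6". Ragged rows produce empty cells rather than an error.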

What's new in GNU coreutils 9.11 (April 2026)

Coreutils 9.11 is the first release to ship a fully multi-byte aware cut: -c now slices on logical characters in any UTF-8 locale without surprises, and three new options close long-standing gaps with BSD/macOS and BusyBox/Toybox cut. Check your version with cut --version; -w, -O, and -F fail with an unrecognized-option error on coreutils ≤ 9.10, and on BSD systems unless their own cut already implements them.

| Option | Meaning | Pre-9.11 workaround |
| --- | --- | --- |
| -w | Treat any run of whitespace (tabs + spaces) as the field separator | tr -s ' \t' '\t' \| cut -f… |
| -O STRING | Set the output delimiter (any string) | --output-delimiter=STRING (still works) or tr |
| -F LIST | BSD-style alias: combines -w + -O in a single flag | awk '{print $N}' |
bash
# -w: split on any whitespace run, no need to tr -s first
echo 'alpha   beta   gamma' | cut -w -f2          # → beta

# -O: short alias for --output-delimiter
cut -d: -f1,5,7 -O '|' /etc/passwd | head -2

Output:

bash
root|root|/bin/bash
daemon|daemon|/usr/sbin/nologin
bash
# -F: BSD/macOS shorthand for "split on whitespace, emit with this delimiter"
ps aux | cut -F ',' -f1,2,11 | head -3


Multi-byte awareness. Before 9.11, upstream GNU cut treated -c as a synonym for -b, so cut -c on a UTF-8 stream could split a character mid-sequence (some distributions shipped patched, multibyte-aware builds). From 9.11 onward, -c counts whole characters in a UTF-8 locale; only -b still slices on raw bytes. The café/-b warnings in the section below still apply: -b is for binary or strict-ASCII data only.

The three selection modes

cut exposes exactly three mutually-exclusive selection modes, and forgetting which one you used is the most common source of confusion. Pick -f for field-delimited data (the common case), -c for fixed-width text where you count characters, and -b only when you genuinely need raw byte offsets — for example, slicing a binary blob.

| Flag | Selects by | Best for | Pitfalls |
| --- | --- | --- | --- |
| -f LIST | Field number (1-based) | CSV, TSV, /etc/passwd-style data | Default delimiter is TAB; pass -d for anything else |
| -c LIST | Character position | Fixed-width reports, terminal output | Splits multibyte chars by codepoint |
| -b LIST | Byte offset | Binary or strictly ASCII fixed-width | Splits multibyte UTF-8 into broken bytes |
bash
# These three commands look similar but behave differently
cut -f1   data.tsv      # field 1 (tab is default delimiter)
cut -c1-5 data.tsv      # first 5 characters of each line
cut -b1-5 data.tsv      # first 5 bytes of each line


-f always needs -d (or implicitly TAB)

-f without -d assumes the field delimiter is a literal tab. Spaces are not delimiters, which trips up people who try cut -f2 file.txt on space-separated input. Use tr -s ' ' '\t' first if the data is space-padded, or switch to awk, whose default field splitting collapses runs of whitespace.

bash
# Wrong: the line contains no tab, so cut passes it through unchanged
echo "alpha beta gamma" | cut -f2     # prints whole line

# Right — set a space delimiter
echo "alpha beta gamma" | cut -d' ' -f2    # → beta

# Or normalise whitespace to tabs first
echo "alpha   beta   gamma" | tr -s ' ' '\t' | cut -f2    # → beta


-d only accepts a single character

cut's -d flag takes exactly one character: no multi-character delimiters, no regex, and no escape sequences beyond what the shell evaluates. This is the single biggest reason people migrate from cut to awk -F: a delimiter like :: or ", " (comma-space) can't be expressed natively.

bash
# Wrong: GNU cut rejects a multi-character delimiter and exits non-zero
cut -d'::' -f1 file        # cut: the delimiter must be a single character

# Right alternatives
awk -F'::' '{print $1}' file              # awk handles multi-char
awk -F', *' '{print $1}' file              # regex delimiter
sed -E 's/, +/\t/g' file | cut -f1         # normalise then cut

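The single-character restriction can be probed from a script before deciding between cut and awk. The sketch below assumes GNU cut, which rejects a two-character -d; the variable names are illustrative.

```shell
# Probe: does this cut accept a two-character delimiter?
# On GNU coreutils the command fails, so multi_ok ends up "no".
if printf 'a::b\n' | cut -d'::' -f1 >/dev/null 2>&1; then
  multi_ok=yes
else
  multi_ok=no
fi

# awk takes the multi-character separator directly.
first=$(printf 'a::b\n' | awk -F'::' '{print $1}')

# With a single ':' delimiter, "::" yields an empty field 2, so 'b' is field 3.
third=$(printf 'a::b\n' | cut -d: -f3)

printf '%s\n' "$multi_ok" "$first" "$third"
```

Counting the empty field, as in the last command, works for a fixed layout but becomes fragile as soon as the data itself may contain the delimiter character.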

Multi-byte characters: -c vs -b

In a UTF-8 locale, -c operates on logical characters (codepoints) while -b operates on raw bytes. A character like é occupies one codepoint but two bytes, so cut -c1 returns the full é while cut -b1 returns only its leading byte — which is not a valid character on its own. This is why -b should be reserved for binary or strictly-ASCII data.

bash
echo 'café' | cut -c1-3         # → caf  (three characters)
echo 'café' | cut -c1-4         # → café (four characters)
echo 'café' | cut -b1-3         # → caf  (three bytes — still ASCII)
echo 'café' | cut -b1-5         # → café (the 'é' takes bytes 4–5)
echo 'café' | cut -b1-4         # → caf? (cuts 'é' in half, leaving one invalid byte)


If you see mojibake (é instead of é) in cut -b output, you have truncated a multi-byte character. Switch to -c or set LC_ALL=C.UTF-8 and use -c.
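Counting bytes makes the truncation visible without depending on how the terminal renders a broken sequence. In this sketch the word is built from explicit octal escapes so the byte layout is unambiguous; the variable names are illustrative.

```shell
# "café" as UTF-8: c, a, f, then the two bytes 0xC3 0xA9 for é.
word=$(printf 'caf\303\251')

# Total size of the word in bytes (tr strips padding that some wc builds add).
total=$(printf '%s' "$word" | wc -c | tr -d ' ')

# Bytes kept by cut -b1-4, not counting the trailing newline.
kept=$(printf '%s\n' "$word" | LC_ALL=C cut -b1-4 | tr -d '\n' | wc -c | tr -d ' ')

# Bytes kept by cut -b1-5: the complete word.
whole=$(printf '%s\n' "$word" | LC_ALL=C cut -b1-5 | tr -d '\n' | wc -c | tr -d ' ')

printf 'total=%s kept=%s whole=%s\n' "$total" "$kept" "$whole"
```

The word is 5 bytes long, so keeping 4 bytes necessarily leaves only the first half of é.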

--complement

--complement inverts the field/character selection so you keep everything except the listed fields. This is faster to type than enumerating the columns you want when only one or two need to be dropped.

bash
# Drop the password placeholder field (field 2) from /etc/passwd
cut -d: -f2 --complement /etc/passwd | head -3

Output:

code
root:0:0:root:/root:/bin/bash
daemon:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:2:2:bin:/bin:/usr/sbin/nologin
bash
# Drop the first column of a CSV
cut -d, -f1 --complement data.csv > data_no_id.csv

# Keep all characters except the first 8 (e.g. strip a timestamp prefix)
cut -c1-8 --complement log.txt


--output-delimiter

Without --output-delimiter, cut reuses the input delimiter on output, which is fine for round-tripping but means you can't simultaneously split on : and emit tabs. --output-delimiter=STRING accepts any string (not limited to one character), so it doubles as a quick way to convert delimiters.

bash
# Convert :-delimited /etc/passwd fields to TSV
cut -d: -f1,5,7 --output-delimiter=$'\t' /etc/passwd | head -3

Output:

bash
root	root	/bin/bash
daemon	daemon	/usr/sbin/nologin
bin	bin	/usr/sbin/nologin
bash
# Multi-character output delimiter
cut -d, -f1,2,3 --output-delimiter=' | ' data.csv | head -3

Output:

bash
id | name | role
1 | Alice | admin
2 | Frank | user

-s — suppress lines without the delimiter

By default, when -f is used, lines that do not contain the delimiter are passed through unchanged (which is rarely what you want). -s discards those lines silently, which is essential when grepping a log file where some lines are headers and others are field-delimited entries.

bash
# A mixed file: header text + CSV rows
printf '%s\n' '# Report generated 2026-04-24' 'id,name,role' '1,Alice,admin' '2,Frank,user' > mixed.csv

cut -d, -f2 mixed.csv          # prints all lines (header is unchanged)
cut -d, -f2 -s mixed.csv       # skips lines without ',' — keeps CSV rows only

Output (cut -d, -f2 -s mixed.csv):

code
name
Alice
Frank

Collapsing repeated delimiters

cut treats every delimiter character as a separate boundary, so two consecutive delimiters produce an empty field. This is unlike awk's default whitespace-splitting, which collapses runs. To get awk-like behaviour with cut, pre-process with tr -s.

bash
# A line with runs of spaces
echo 'alpha   beta   gamma' | cut -d' ' -f2     # → '' (empty field!)

# Collapse runs of spaces to single spaces first
echo 'alpha   beta   gamma' | tr -s ' ' | cut -d' ' -f2   # → beta

# Or just use awk
echo 'alpha   beta   gamma' | awk '{print $2}'    # → beta


cut vs awk: when to use which

cut is faster, smaller, and pipeline-safe for simple delimiter splits. awk is the right tool the moment you need any of: multi-character delimiters, regex delimiters, runs-of-whitespace, field reordering, conditional row filtering, or arithmetic. As a rule of thumb: if the problem fits on a postcard with cut, use cut; otherwise reach for awk.

| Need | cut | awk |
| --- | --- | --- |
| Extract column 1 of a CSV | cut -d, -f1 | awk -F, '{print $1}' |
| Reorder columns | not possible | awk -F, '{print $3,$1}' |
| Multi-char delimiter | not possible | awk -F'::' |
| Collapse whitespace | needs tr -s first | default behaviour |
| Filter rows by value | needs grep first | awk -F, '$3>100' |
| Change output delimiter | --output-delimiter | `BEGIN{OFS=" |
| Sum a column | cut … \| paste -sd+ \| bc | awk '{s+=$2} END{print s}' |
| Speed on giant files | faster | slightly slower |
bash
# Same task three ways — pick the shortest that does the job
cut -d, -f1,3 data.csv                       # cut: simple slice
awk -F, '{print $1,$3}' data.csv             # awk: same, space-joined
awk -F, -v OFS=, '{print $1,$3}' data.csv    # awk: keep CSV format

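The "sum a column" row can be tried on inline data. The sketch below uses shell arithmetic in place of bc so that it has no extra dependency; the sample rows and variable names are invented for the demo.

```shell
# Hypothetical two-column CSV: label,value
data=$(printf '%s\n' 'a,10' 'b,20' 'c,30')

# cut + paste: build the expression "10+20+30" from column 2.
sum_expr=$(printf '%s\n' "$data" | cut -d, -f2 | paste -s -d+ -)

# Evaluate it with shell arithmetic (integers only; bc would handle decimals).
sum_shell=$(( $sum_expr ))

# awk does the extraction and the arithmetic in one step.
sum_awk=$(printf '%s\n' "$data" | awk -F, '{s += $2} END {print s}')

printf '%s = %s (awk: %s)\n' "$sum_expr" "$sum_shell" "$sum_awk"
```

Both routes give 60 here; the awk form is the safer choice once values may be non-integer or missing.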

Pairing cut with paste

cut … | paste is the canonical "extract two columns, then recombine them with a new delimiter" pattern. Process substitution (<(cmd)) lets paste read multiple cut pipelines in parallel without temp files.

bash
# Build a new TSV from columns 1 and 7 of /etc/passwd
paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f7 /etc/passwd) | head -3

Output:

bash
root	/bin/bash
daemon	/usr/sbin/nologin
bin	/usr/sbin/nologin
bash
# Swap columns 1 and 2 (cut can't reorder, but paste can)
paste <(cut -d, -f2 data.csv) <(cut -d, -f1 data.csv) | tr '\t' ','


Recipes

bash
# 1. Extract a single CSV column safely (assumes no embedded commas)
cut -d, -f2 data.csv

# 2. Get the home directory of every user in /etc/passwd
cut -d: -f1,6 /etc/passwd
#   alice:/home/alice
#   carol:/home/carol

# 3. Build a `users.txt` from /etc/passwd field 1
cut -d: -f1 /etc/passwd | sort -u > users.txt

# 4. Drop the last character of each line (assumes fixed-width lines, all as long as the first)
cut -c1-$(( $(awk '{print length; exit}' file) - 1 )) file

# 5. Strip a fixed-width prefix from every line (here the first 24 characters, e.g. log timestamps)
cut -c25- access.log

# 6. Extract the first word of every line
cut -d' ' -f1 sentences.txt

# 7. Pull host names from an SSH config
grep -E '^Host ' ~/.ssh/config | cut -d' ' -f2-

# 8. Get just the PID column from ps
ps aux | tr -s ' ' | cut -d' ' -f2

# 9. Extract the host from URLs
cut -d/ -f3 urls.txt          # https://example.com/path → example.com

# 10. Recombine after editing — split, mutate, paste back
paste -d, \
  <(cut -d, -f1 data.csv) \
  <(cut -d, -f2 data.csv | tr '[:lower:]' '[:upper:]') \
  <(cut -d, -f3- data.csv)

Output (recipe 2):

bash
root:/root
daemon:/usr/sbin
bin:/bin
alice:/home/alice
carol:/home/carol

CSV caveats

cut is not a CSV parser — it splits naively on every comma, so embedded commas inside quoted fields ("Smith, John") will be torn apart. For correctness with real-world CSV, use a dedicated tool such as qsv, csvkit's csvcut, or awk with a CSV-aware library. cut is only safe on TSV or "well-behaved" comma-separated input.

bash
# DANGER — embedded commas destroy field alignment
echo '1,"Smith, John",admin' | cut -d, -f2   # → "Smith   (broken)

# Safe with qsv
echo '1,"Smith, John",admin' | qsv select 2  # → Smith, John

# Safe with gawk's FPAT for quoted CSV (GNU awk only; the surrounding quotes stay in the field)
echo '1,"Smith, John",admin' | awk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")"} {print $2}'

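A cheap guard before trusting cut on a CSV is to check whether any line contains a double quote at all. This is only a heuristic sketch (a quote-free file can still be malformed), and the variable names are illustrative.

```shell
line='1,"Smith, John",admin'

# A naive comma split sees four fields where the record really has three.
naive_fields=$(printf '%s\n' "$line" | awk -F, '{print NF}')

# cut inherits the same naive split, so field 2 is only half of the name.
second=$(printf '%s\n' "$line" | cut -d, -f2)

# Heuristic guard: any double quote means "use a real CSV parser instead".
if printf '%s\n' "$line" | grep -q '"'; then
  cut_safe=no
else
  cut_safe=yes
fi

printf 'fields=%s second=%s cut_safe=%s\n' "$naive_fields" "$second" "$cut_safe"
```

In a pipeline the same grep test can be run against the whole file before the cut step.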

cut always outputs fields in the order they appear in the input, regardless of the order specified with -f. To reorder fields, use awk '{print $3, $1}' instead.
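The ordering rule is easy to confirm on a one-line input. A minimal sketch with a made-up record:

```shell
# Asking cut for fields 3 then 1 still prints them in input order.
with_cut=$(printf 'a,b,c\n' | cut -d, -f3,1)

# awk prints fields in exactly the order requested.
with_awk=$(printf 'a,b,c\n' | awk -F, -v OFS=, '{print $3, $1}')

printf 'cut: %s\nawk: %s\n' "$with_cut" "$with_awk"
```

cut prints a,c while awk prints c,a.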

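join requires both inputs to be sorted on the join field, in the same collation order that join itself uses. Use join <(sort f1) <(sort f2), or pre-sort on the key field (for example, sort -k2,2) when joining on a non-first field.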

cut -d': ' -f2 does not work: -d accepts a single character, and GNU cut exits with "the delimiter must be a single character". Use awk -F': ' or pre-substitute with sed 's/: /\t/g' then cut -f2.
