cheat sheet
cut, paste & join
Extract columns by delimiter or byte position (cut), merge files column-wise (paste), and join on a common key field (join). Essential for tab/CSV/field-delimited data.
cut, paste & join — Column Tools
What it is
cut, paste, and join are POSIX standard text utilities included in every Unix and Linux system for working with column-delimited data. cut extracts specific fields or byte ranges from each line; paste merges multiple files side by side column-wise; join performs a relational inner or outer join on two sorted files that share a common key field. Reach for these tools when you need fast, dependency-free column extraction or merging in a shell pipeline; for more complex transformations, awk gives you full field-level programming.
cut
Extract specific fields or byte/character ranges from each line.
Syntax
cut OPTION... [FILE...]
By delimiter field (-d / -f)
-d sets the field delimiter (a single character) and -f selects which fields to output. Fields are numbered from 1; use N-M for a range, N- for field N to end, and a comma-separated list for non-contiguous fields.
cut -d: -f1 /etc/passwd # first field (username)
Output:
root
daemon
bin
sys
man
nobody
cut -d: -f1,7 /etc/passwd # fields 1 and 7
Output:
root:/bin/bash
daemon:/usr/sbin/nologin
bin:/usr/sbin/nologin
sys:/usr/sbin/nologin
man:/usr/sbin/nologin
nobody:/usr/sbin/nologin
cut -d: -f1-3 /etc/passwd # fields 1 through 3
Output:
root:x:0
daemon:x:1
bin:x:2
sys:x:3
man:x:6
nobody:x:65534
cut -d: -f3- /etc/passwd # field 3 to end
cut -d, -f2 data.csv # CSV second column
Output:
Alice
Frank
Carol
Dave
Eve
cut -d$'\t' -f1,3 data.tsv # tab-delimited fields 1 and 3
cut -d' ' -f2- sentence.txt # all words except first
# Change output delimiter — pre-9.11 idiom (use tr or awk)
cut -d: -f1,3 /etc/passwd | tr ':' '\t'
# GNU coreutils ≥ 9.11: native -O for output delimiter
cut -d: -f1,3 -O $'\t' /etc/passwd
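The list syntax is easier to read on a line whose fields name their own positions. The sketch below uses a made-up sample record (not taken from the examples above) and only POSIX options, so it should behave the same on any cut:

```shell
# Hypothetical sample record: each field's value is its own position.
line='f1:f2:f3:f4:f5'

printf '%s\n' "$line" | cut -d: -f2        # single field      → f2
printf '%s\n' "$line" | cut -d: -f2-4      # closed range      → f2:f3:f4
printf '%s\n' "$line" | cut -d: -f4-       # field 4 to end    → f4:f5
printf '%s\n' "$line" | cut -d: -f-2       # start to field 2  → f1:f2
printf '%s\n' "$line" | cut -d: -f1,3,5    # comma list        → f1:f3:f5
```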
By character position (-c)
-c selects output by character position rather than field, making it ideal for fixed-width data where columns align at known offsets. For purely ASCII input -c and -b are equivalent; they diverge only with multibyte encodings.
cut -c1 file.txt # first character of each line
cut -c1-10 file.txt # characters 1–10
Output:
The quick
Lorem ipsu
Filesystem
2026-04-24
cut -c5- file.txt # character 5 to end
cut -c1,5,10 file.txt # characters 1, 5, and 10
cut -c-80 file.txt # max 80 chars (truncate long lines)
By byte position (-b)
-b selects by raw byte offset, which matters when the input contains multibyte UTF-8 characters — a single character may occupy 2–4 bytes, so -b and -c will give different results. Use -b when you need exact binary slicing of a stream.
cut -b1-4 binary.dat # bytes 1–4 (differs from -c for multibyte chars)
Suppress undelimited lines
cut -d: -f1 -s /etc/passwd # -s: skip lines without the delimiter
paste
Merge files horizontally (column-by-column).
Syntax
paste [OPTIONS] [FILE...]
paste file1.txt file2.txt # merge side by side (tab-delimited)
paste -d, file1.txt file2.txt # use comma as delimiter
paste -d'\t' names.txt emails.txt # explicit tab
# Serial mode (-s): transpose — each file becomes one tab-joined line
paste -s file.txt
paste -s -d, file.txt # comma-separated
# Combine N columns from same file
paste - - < file.txt # 2 lines → 1 row (2 columns)
paste - - - < file.txt # 3 lines → 1 row (3 columns)
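A self-contained run makes the difference between parallel, serial, and stdin-folding modes concrete. The file names and contents below are illustrative assumptions, created in a scratch directory and removed afterwards:

```shell
# Scratch directory with two hypothetical column files.
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 > "$tmp/ids.txt"
printf '%s\n' Alice Frank Carol > "$tmp/names.txt"

# Parallel (default): line N of each file becomes row N.
parallel=$(paste -d, "$tmp/ids.txt" "$tmp/names.txt")
# Serial (-s): all lines of one file become a single row.
serial=$(paste -s -d, "$tmp/names.txt")
# Folding: each '-' reads the next line of stdin, so two dashes give two columns.
folded=$(printf '%s\n' a b c d | paste -d, - -)
rm -rf "$tmp"

printf '%s\n' "$parallel"   # 1,Alice / 2,Frank / 3,Carol (one row per line)
printf '%s\n' "$serial"     # → Alice,Frank,Carol
printf '%s\n' "$folded"     # a,b / c,d (one row per line)
```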
Practical paste examples
# Create a CSV from two column files
paste -d, ids.txt names.txt > combined.csv
# Add line numbers to a file
seq 1 $(wc -l < file.txt) | paste -d'\t' - file.txt
# Interleave lines from two files
paste -d'\n' file1.txt file2.txt
# Recreate a CSV from a column of values
paste -s -d, values.txt
join
Join lines from two files on a common key field (like SQL inner join).
Syntax
join [OPTIONS] FILE1 FILE2
Both files must be sorted on the join key first.
# Join on field 1 (default)
join sorted1.txt sorted2.txt
# Join on specific fields
join -1 2 -2 1 file1.txt file2.txt # field 2 of f1, field 1 of f2
# Set the field separator (-t applies to both input and output)
join -t, file1.csv file2.csv
# Include unmatched lines (outer join)
join -a 1 file1.txt file2.txt # + unmatched from file1
join -a 2 file1.txt file2.txt # + unmatched from file2
join -a 1 -a 2 file1.txt file2.txt # full outer join
# Fill missing fields
join -a 1 -e 'N/A' -o 0,1.2,2.2 file1.txt file2.txt
# Suppress matched lines (anti-join)
join -v 1 file1.txt file2.txt # lines in f1 not in f2
join -v 2 file1.txt file2.txt # lines in f2 not in f1
join example
# employees.txt (sorted by ID):
# 101 Alice
# 102 Frank
# 103 Carol
# salaries.txt (sorted by ID):
# 101 75000
# 102 82000
# 104 91000
join employees.txt salaries.txt
# 101 Alice 75000
# 102 Frank 82000
join -a 1 employees.txt salaries.txt
# 101 Alice 75000
# 102 Frank 82000
# 103 Carol ← Carol has no salary record
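The same employees/salaries data can be turned into a runnable sketch that also exercises -e, -o, and -v. The scratch directory and variable names are this example's own assumptions; the file contents match the listing above.

```shell
# Recreate the two sample files (already sorted on the ID key).
tmp=$(mktemp -d)
printf '%s\n' '101 Alice' '102 Frank' '103 Carol' > "$tmp/employees.txt"
printf '%s\n' '101 75000' '102 82000' '104 91000' > "$tmp/salaries.txt"

# Left outer join: -o fixes the column layout, -e fills the missing salary.
left=$(join -a 1 -e 'N/A' -o 0,1.2,2.2 "$tmp/employees.txt" "$tmp/salaries.txt")
# Anti-join: salary rows whose ID has no matching employee.
orphans=$(join -v 2 "$tmp/employees.txt" "$tmp/salaries.txt")
rm -rf "$tmp"

printf '%s\n' "$left"      # last row reads: 103 Carol N/A
printf '%s\n' "$orphans"   # → 104 91000
```

Without -o, the unmatched row would simply be shorter than the others; -e only has an effect on fields that -o asks for.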
Practical pipelines
# Extract second column from CSV, remove header, sort, count unique
tail -n +2 data.csv | cut -d, -f2 | sort | uniq -c | sort -rn
# Get all usernames from /etc/passwd
cut -d: -f1 /etc/passwd | sort
Output:
bin
daemon
mail
man
nobody
root
sys
www-data
# Get the home directories of users with /bin/bash shell
grep '/bin/bash$' /etc/passwd | cut -d: -f6
# Compare two lists (IDs in file1 not in file2)
join -v 1 <(sort ids1.txt) <(sort ids2.txt)
# Build a quick lookup from key=value file
cut -d= -f1,2 config.env | tr '=' '\t'
# Transpose a single-space-delimited matrix: each column becomes one output row
# (for small matrices; use awk for larger ones)
for i in $(seq 1 "$(awk '{print NF; exit}' matrix.txt)"); do
  cut -d' ' -f"$i" matrix.txt | paste -s -d' ' -
done
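The transpose comment above points to awk for larger matrices. One possible awk-based version is sketched here, wrapped in a hypothetical shell function named transpose. It reads the input once, buffers every cell in memory, and assumes whitespace-separated cells.

```shell
# transpose: hypothetical helper, reads a matrix from files or stdin.
transpose() {
  awk '{
    # Remember every cell, keyed by (row, column).
    for (i = 1; i <= NF; i++) cell[NR, i] = $i
    if (NF > cols) cols = NF
  }
  END {
    # Emit one output row per input column.
    for (i = 1; i <= cols; i++) {
      line = cell[1, i]
      for (r = 2; r <= NR; r++) line = line " " cell[r, i]
      print line
    }
  }' "$@"
}

printf '1 2 3\n4 5 6\n' | transpose
# 1 4
# 2 5
# 3 6
```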
What's new in GNU coreutils 9.11 (April 2026)
Coreutils 9.11 is the first release to ship a fully multi-byte aware cut: -c now slices on logical characters in any UTF-8 locale without surprises, and three new options close long-standing gaps with BSD/macOS and BusyBox/Toybox cut. Check your version with cut --version: -w, -O, and -F fail with an "invalid option" error on coreutils ≤ 9.10, and on BSD-only systems unless those already provide their own implementation.
| Option | Meaning | Pre-9.11 workaround |
|---|---|---|
| -w | Treat any run of whitespace (tabs + spaces) as the field separator | tr -s ' \t' '\t' \| cut -f… |
| -O STRING | Set the output delimiter (any string) | --output-delimiter=STRING (still works) or tr |
| -F LIST | BSD-style alias: combines -w + -O in a single flag | awk '{print $N}' |
# -w: split on any whitespace run, no need to tr -s first
echo 'alpha beta gamma' | cut -w -f2 # → beta
# -O: short alias for --output-delimiter
cut -d: -f1,5,7 -O '|' /etc/passwd | head -2
Output:
root|root|/bin/bash
daemon|daemon|/usr/sbin/nologin
# -F: BSD/macOS shorthand for "split on whitespace, emit with this delimiter"
ps aux | cut -F ',' -f1,2,11 | head -3
Multi-byte awareness. Before 9.11, GNU cut treated -c the same as -b, so on UTF-8 input it could cut a character in half. From 9.11 onward, -c counts whole characters in a UTF-8 locale; only -b still slices on raw bytes. The café/-b warnings in the section below still apply: -b is for binary or strict-ASCII data only.
The three selection modes
cut exposes exactly three mutually-exclusive selection modes, and forgetting which one you used is the most common source of confusion. Pick -f for field-delimited data (the common case), -c for fixed-width text where you count characters, and -b only when you genuinely need raw byte offsets — for example, slicing a binary blob.
| Flag | Selects by | Best for | Pitfalls |
|---|---|---|---|
| -f LIST | Field number (1-based) | CSV, TSV, and /etc/passwd-style data | Needs -d unless the delimiter is TAB (the default) |
| -c LIST | Character position | Fixed-width reports, terminal output | Behaves like -b on GNU cut older than 9.11, so multibyte characters can be split |
| -b LIST | Byte offset | Binary or strictly ASCII fixed-width | Splits multibyte UTF-8 characters into invalid byte sequences |
# These three commands look similar but behave differently
cut -f1 data.tsv # field 1 (tab is default delimiter)
cut -c1-5 data.tsv # first 5 characters of each line
cut -b1-5 data.tsv # first 5 bytes of each line
-f always needs -d (or implicitly TAB)
-f without -d assumes the field delimiter is a literal tab. Spaces are not delimiters, which trips up people who try cut -f2 file.txt on space-separated input. Use tr -s ' ' '\t' first if the data is space-padded, or switch to awk, whose default field splitting collapses runs of whitespace.
# Wrong: the input contains no tab, so the whole line passes through unchanged
echo "alpha beta gamma" | cut -f2 # prints whole line
# Right — set a space delimiter
echo "alpha beta gamma" | cut -d' ' -f2 # → beta
# Or normalise whitespace to tabs first
echo "alpha beta gamma" | tr -s ' ' '\t' | cut -f2 # → beta
-d only accepts a single character
cut's -d flag takes exactly one character: no multi-character delimiters, no regex, no escape sequences beyond what the shell evaluates. This is the single biggest reason people migrate from cut to awk -F: a delimiter like :: or ", " (comma-space) can't be expressed natively.
# Wrong: GNU cut rejects a multi-character delimiter outright
cut -d'::' -f1 file # cut: the delimiter must be a single character
# Right alternatives
awk -F'::' '{print $1}' file # awk handles multi-char
awk -F', *' '{print $1}' file # regex delimiter
sed -E 's/, +/\t/g' file | cut -f1 # normalise then cut
Multi-byte characters: -c vs -b
In a UTF-8 locale, -c operates on logical characters (codepoints) while -b operates on raw bytes. GNU cut older than 9.11 treats -c like -b, so the -c results shown here need a multibyte-aware cut. A character like é occupies one codepoint but two bytes, so cut -c1 returns the full é while cut -b1 returns only its leading byte, which is not a valid character on its own. This is why -b should be reserved for binary or strictly-ASCII data.
echo 'café' | cut -c1-3 # → caf (three characters)
echo 'café' | cut -c1-4 # → café (four characters)
echo 'café' | cut -b1-3 # → caf (three bytes — still ASCII)
echo 'café' | cut -b1-5 # → café (the 'é' takes bytes 4–5)
echo 'café' | cut -b1-4 # → caf? (cuts 'é' in half: only its first byte survives)
If you see mojibake (
éinstead ofé) incut -boutput, you have truncated a multi-byte character. Switch to-cor setLC_ALL=C.UTF-8and use-c.
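The byte arithmetic behind these results can be checked directly. This is a small sketch that assumes the snippet itself is saved and run as UTF-8; it uses only -b, so the outcome does not depend on how a given cut implements -c.

```shell
# Assumes UTF-8 source encoding: 'é' is the two bytes c3 a9.
word='café'

printf '%s' "$word" | wc -c                     # 5 bytes for 4 characters
printf '%s\n' "$word" | cut -b1-3               # → caf  (bytes 1-3 are plain ASCII)
printf '%s\n' "$word" | cut -b4-5               # → é    (both bytes of the character)
printf '%s\n' "$word" | cut -b4 | od -An -tx1   # c3 0a: a lone lead byte, then newline
```

The last line is the broken case: byte 4 on its own is only the first half of é, which terminals typically render as a replacement character.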
--complement
--complement inverts the field/character selection so you keep everything except the listed fields. This is faster to type than enumerating the columns you want when only one or two need to be dropped.
# Drop the password placeholder field (field 2) from /etc/passwd
cut -d: -f2 --complement /etc/passwd | head -3
Output:
root:0:0:root:/root:/bin/bash
daemon:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:2:2:bin:/bin:/usr/sbin/nologin
# Drop the first column of a CSV
cut -d, -f1 --complement data.csv > data_no_id.csv
# Keep all characters except the first 8 (e.g. strip a timestamp prefix)
cut -c1-8 --complement log.txt
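--complement is not part of POSIX cut, so a script that must run on other implementations may need a different way to drop a column. One portable option, sketched here with made-up sample data, is to select everything around the unwanted part with an open-ended list:

```shell
# Hypothetical passwd-style record.
rec='root:x:0:0:root:/root:/bin/bash'

# Drop field 2 by keeping field 1 and fields 3 onward.
printf '%s\n' "$rec" | cut -d: -f1,3-      # → root:0:0:root:/root:/bin/bash

# Drop a 9-character prefix ('12:00:00 ') by starting at character 10.
printf '%s\n' '12:00:00 service started' | cut -c10-   # → service started
```

This stays readable when one field or a leading block is dropped; for several scattered fields the list gets long, which is where --complement earns its keep.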
--output-delimiter
Without --output-delimiter, cut reuses the input delimiter on output, which is fine for round-tripping but means you can't simultaneously split on : and emit tabs. --output-delimiter=STRING accepts any string (not limited to one character), so it doubles as a quick way to convert delimiters.
# Convert :-delimited /etc/passwd fields to TSV
cut -d: -f1,5,7 --output-delimiter=$'\t' /etc/passwd | head -3
Output:
root root /bin/bash
daemon daemon /usr/sbin/nologin
bin bin /usr/sbin/nologin
# Multi-character output delimiter
cut -d, -f1,2,3 --output-delimiter=' | ' data.csv | head -3
Output:
id | name | role
1 | Alice | admin
2 | Frank | user
-s — suppress lines without the delimiter
By default, when -f is used, lines that do not contain the delimiter are passed through unchanged (which is rarely what you want). -s discards those lines silently, which is essential when grepping a log file where some lines are headers and others are field-delimited entries.
# A mixed file: header text + CSV rows
printf '%s\n' '# Report generated 2026-04-24' 'id,name,role' '1,Alice,admin' '2,Frank,user' > mixed.csv
cut -d, -f2 mixed.csv # prints all lines (header is unchanged)
cut -d, -f2 -s mixed.csv # skips lines without ',' — keeps CSV rows only
Output (cut -d, -f2 -s mixed.csv):
name
Alice
Frank
Collapsing repeated delimiters
cut treats every delimiter character as a separate boundary, so two consecutive delimiters produce an empty field. This is unlike awk's default whitespace-splitting, which collapses runs. To get awk-like behaviour with cut, pre-process with tr -s.
# A line with runs of spaces
echo 'alpha   beta  gamma' | cut -d' ' -f2 # → '' (empty field!)
# Collapse runs of spaces to single spaces first
echo 'alpha   beta  gamma' | tr -s ' ' | cut -d' ' -f2 # → beta
# Or just use awk
echo 'alpha   beta  gamma' | awk '{print $2}' # → beta
cut vs awk: when to use which
cut is faster, smaller, and pipeline-safe for simple delimiter splits. awk is the right tool the moment you need any of: multi-character delimiters, regex delimiters, runs-of-whitespace, field reordering, conditional row filtering, or arithmetic. As a rule of thumb: if the problem fits on a postcard with cut, use cut; otherwise reach for awk.
| Need | cut | awk |
|---|---|---|
| Extract column 1 of a CSV | cut -d, -f1 | awk -F, '{print $1}' |
| Reorder columns | not possible | awk -F, '{print $3,$1}' |
| Multi-char delimiter | not possible | awk -F'::' |
| Collapse whitespace | needs tr -s first | default behaviour |
| Filter rows by value | needs grep first | awk -F, '$3>100' |
| Change output delimiter | --output-delimiter | `BEGIN{OFS="…"}` |
| Sum a column | cut … \| paste -sd+ \| bc | awk '{s+=$2} END{print s}' |
| Speed on giant files | faster | slightly slower |
# Same task three ways — pick the shortest that does the job
cut -d, -f1,3 data.csv # cut: simple slice
awk -F, '{print $1,$3}' data.csv # awk: same, space-joined
awk -F, -v OFS=, '{print $1,$3}' data.csv # awk: keep CSV format
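The "Sum a column" row can be tried even where bc is not installed by handing the pasted expression to shell arithmetic instead. That substitution is this sketch's own choice rather than the table's, it handles integers only, and the sample rows are made up.

```shell
# Hypothetical two-column CSV held in a variable.
rows='a,10
b,20
c,12'

# cut extracts the numbers; paste -s joins them with '+' into one expression.
sum_expr=$(printf '%s\n' "$rows" | cut -d, -f2 | paste -sd+ -)
echo "$sum_expr"            # → 10+20+12
echo $(( $sum_expr ))       # → 42

# The awk equivalent from the table, which also copes with decimals.
printf '%s\n' "$rows" | awk -F, '{s += $2} END {print s}'   # → 42
```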
Pairing cut with paste
cut … | paste is the canonical "extract two columns, then recombine them with a new delimiter" pattern. Process substitution (<(cmd)) lets paste read multiple cut pipelines in parallel without temp files.
# Build a new TSV from columns 1 and 7 of /etc/passwd
paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f7 /etc/passwd) | head -3
Output:
root /bin/bash
daemon /usr/sbin/nologin
bin /usr/sbin/nologin
# Swap columns 1 and 2 (cut can't reorder, but paste can)
paste <(cut -d, -f2 data.csv) <(cut -d, -f1 data.csv) | tr '\t' ','
Recipes
# 1. Extract a single CSV column safely (assumes no embedded commas)
cut -d, -f2 data.csv
# 2. Get the home directory of every user in /etc/passwd
cut -d: -f1,6 /etc/passwd
# alice:/home/alice
# carol:/home/carol
# 3. Build a `users.txt` from /etc/passwd field 1
cut -d: -f1 /etc/passwd | sort -u > users.txt
# 4. Drop the last character of each line (assumes every line is as long as the first)
cut -c1-$(( $(awk '{print length; exit}' file) - 1 )) file
# 5. Strip an N-character prefix from every line (e.g. log timestamps)
cut -c25- access.log
# 6. Extract the first word of every line
cut -d' ' -f1 sentences.txt
# 7. Pull host names from an SSH config
grep -E '^Host ' ~/.ssh/config | cut -d' ' -f2-
# 8. Get just the PID column from ps
ps aux | tr -s ' ' | cut -d' ' -f2
# 9. Strip protocol from URLs
cut -d/ -f3 urls.txt # https://example.com/path → example.com
# 10. Recombine after editing — split, mutate, paste back
paste -d, \
<(cut -d, -f1 data.csv) \
<(cut -d, -f2 data.csv | tr '[:lower:]' '[:upper:]') \
<(cut -d, -f3- data.csv)
Output (recipe 2):
root:/root
daemon:/usr/sbin
bin:/bin
alice:/home/alice
carol:/home/carol
CSV caveats
cut is not a CSV parser: it splits naively on every comma, so embedded commas inside quoted fields ("Smith, John") will be torn apart. For correctness with real-world CSV, use a dedicated tool such as qsv, csvkit's csvcut, or gawk with an FPAT pattern. cut is only safe on TSV or "well-behaved" comma-separated input.
# DANGER — embedded commas destroy field alignment
echo '1,"Smith, John",admin' | cut -d, -f2 # → "Smith (broken)
# Safe with qsv
echo '1,"Smith, John",admin' | qsv select 2 # → Smith, John
# Safe with gawk's FPAT for quoted CSV (FPAT is a gawk extension; the quotes stay in the field)
echo '1,"Smith, John",admin' | gawk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")"} {print $2}' # → "Smith, John"
cut always outputs fields in the order they appear in the input, regardless of the order specified with -f. To reorder fields, use awk '{print $3, $1}' instead.
join requires sorted input. Use join <(sort f1) <(sort f2), or pre-sort with sort -k N when joining on a field other than the first.
cut -d': ' -f2 does not work as a multi-char delimiter: GNU cut stops with "the delimiter must be a single character". Use awk -F': ' or pre-substitute with sed 's/: /\t/g' then cut -f2.
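The field-order caveat is easy to confirm: the order of the numbers passed to -f makes no difference to cut. The sample line below is made up for the demonstration.

```shell
line='one,two,three'

printf '%s\n' "$line" | cut -d, -f3,1                       # → one,three
printf '%s\n' "$line" | cut -d, -f1,3                       # → one,three (identical)
printf '%s\n' "$line" | awk -F, -v OFS=, '{print $3, $1}'   # → three,one
```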