cheat sheet

awk / gawk

Pattern-action language for structured text. Field splitting, built-in variables, arithmetic, string functions, arrays, BEGIN/END blocks, and practical data-processing recipes.

updated 05-25-2026

awk / gawk — Text Processing

What it is

awk is a pattern-action text processing language that has been part of Unix since 1977, originally created by Aho, Weinberger, and Kernighan at Bell Labs; gawk is the GNU implementation and the default awk on many Linux distributions. It automatically splits each input line into fields, making it ideal for processing structured text like log files, CSV, and command output without writing a full script. Reach for awk when you need to filter, transform, or aggregate field-delimited text in a pipeline; for full programming logic or JSON, python or jq are better choices.

Syntax

An awk program is a series of pattern { action } pairs written as a single-quoted string on the command line or stored in a file passed with -f. Options like -F (field separator) and -v (variable assignment) come before the program.

bash

awk [OPTIONS] 'PROGRAM' [FILE...]
awk [OPTIONS] -f script.awk [FILE...]
awk -v VAR=value 'PROGRAM' [FILE...]

Output: (none — exits 0 on success)
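As a small sketch of the -v form, the snippet below passes a shell variable into an awk program. The variable names (threshold, min) and the sample data are made up for illustration.

```shell
# Hypothetical threshold and sample data, for illustration only.
threshold=100

# -v assigns the awk variable "min" before the program starts running.
above=$(printf 'a 50\nb 150\nc 200\n' |
  awk -v min="$threshold" '$2 > min { print $1 }')

echo "$above"
```

Passing values with -v avoids splicing shell variables into the quoted program text.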

Within a rule, both the pattern and the action are optional:

No pattern → action runs on every line
No action → default is { print } (prints the matching line)
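A minimal sketch of the two defaults, using invented inline input (the sample lines and shell variable names are assumptions for illustration):

```shell
# Sample input is invented for illustration.
input='alpha 1
beta 2
gamma 3'

# Pattern only: the default action { print } prints each matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/^b/')

# Action only: with no pattern, the action runs on every line.
action_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

echo "$pattern_only"
echo "$action_only"
```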

Built-in variables

Variable	Meaning
`$0`	Entire current record (line)
`$1` … `$NF`	Fields 1 through NF
`NF`	Number of fields in current record
`NR`	Total records read so far
`FNR`	Record number within current file
`FS`	Input field separator (default: whitespace)
`OFS`	Output field separator (default: space)
`RS`	Input record separator (default: `\n`)
`ORS`	Output record separator (default: `\n`)
`FILENAME`	Current input file name
`ARGC` / `ARGV`	Argument count / array
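The sketch below exercises a few of these variables on invented input. It also shows a common surprise: OFS is only inserted where print has commas (or when the record is rebuilt), not when values are concatenated.

```shell
# NR counts records, NF counts fields, and $NF is the last field.
counts=$(printf 'a b c\nd e\n' |
  awk '{ print NR ": " NF " fields, last=" $NF }')

# "print $1, $2" uses OFS between values; "print $1 $2" concatenates.
joined=$(printf 'a b c\n' |
  awk 'BEGIN { OFS = "-" } { print $1, $2; print $1 $2 }')

echo "$counts"
echo "$joined"
```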

BEGIN and END blocks

BEGIN runs once before any input is read — the right place to initialize variables or set FS/OFS. END runs once after the last record, making it ideal for printing totals or summaries. Neither block receives input records.

awk

BEGIN { FS=","; OFS="\t" }   # run before any input
{ print $2, $1 }              # run per record
END   { print "Total:", NR }  # run after all input

Given data.csv containing name,dept,salary rows:

Output:

text

Engineering	Alice
Ops	Bob
Finance	Carol
Total:	3
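For a run that can be pasted directly into a shell, here is the same program fed with three inline rows (the rows are assumed sample data in name,dept,salary order). Because OFS is a tab, the comma in the END rule's print also produces a tab.

```shell
# Inline sample rows stand in for data.csv (assumed content).
report=$(printf 'Alice,Engineering,75000\nBob,Ops,62000\nCarol,Finance,91000\n' |
  awk 'BEGIN { FS = ","; OFS = "\t" }   # set separators before input
       { print $2, $1 }                 # dept, then name, tab-separated
       END { print "Total:", NR }')     # NR still holds the record count

echo "$report"
```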

Field separator

FS controls how each input record is split into fields ($1, $2, …). Set it with -F on the command line or by assigning FS in a BEGIN block; it can be a literal character, a multi-character string, or a regex. The default splits on runs of whitespace, discarding leading and trailing spaces.

bash

awk -F: '{print $1}' /etc/passwd          # colon-separated
awk -F'\t' '{print $3}' data.tsv          # tab-separated
awk -F', *' '{print $2}' file             # comma + optional spaces
awk 'BEGIN{FS="|"} {print $1}' pipe.txt   # pipe character
awk -F'[,;]' '{print $1, $2}' file        # regex separator

Output:

text

root
daemon
bin
sys
nobody
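The difference between default splitting and an explicit separator shows up in how empty fields are counted. A small sketch on invented input:

```shell
# Default FS: runs of blanks separate fields; leading/trailing blanks are ignored.
default_nf=$(printf '  a   b  \n' | awk '{ print NF }')

# Single-character FS: every comma separates, so empty fields are kept.
comma_nf=$(printf 'a,,b\n' | awk -F, '{ print NF }')

# A bracketed space is a regex for exactly one space, so two spaces
# in a row yield an empty field between them.
space_nf=$(printf 'a  b\n' | awk -F'[ ]' '{ print NF }')

echo "$default_nf $comma_nf $space_nf"
```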

Patterns

A pattern is a condition that gates whether a rule's action runs on a given record. It can be a regex (/re/), a comparison expression ($3 > 100), or a range (/START/,/END/). Omitting the pattern means the action runs on every record; omitting the action defaults to { print }.

bash

awk '/error/'             log      # print lines matching regex
awk '!/^#/'              config    # skip comment lines
awk 'NR==1'              file      # first line only
awk 'NR>=10 && NR<=20'  file      # lines 10–20
awk '$3 > 100'           data      # field comparison
awk '$1 ~ /^foo/'        file      # field matches regex
awk '/START/,/END/'      file      # range: START to END (inclusive)

Output:

text

2026-01-15 10:23:45 ERROR connection refused
2026-01-15 10:45:01 ERROR timeout waiting for response
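Range patterns include both boundary lines. When only the lines between the markers are wanted, a flag variable is a common alternative. The sketch below compares the two on invented input; the variable name inside is arbitrary.

```shell
log='header
START
one
two
END
footer'

# Range pattern: prints the START line, the END line, and everything between.
inclusive=$(printf '%s\n' "$log" | awk '/START/,/END/')

# Flag idiom: clear the flag on END before printing, set it after START,
# so neither marker line is printed.
inner=$(printf '%s\n' "$log" |
  awk '/END/ { inside = 0 } inside; /START/ { inside = 1 }')

echo "$inclusive"
echo "$inner"
```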

Printf and output

printf gives column-aligned, formatted output using C-style format strings — use it instead of print when you need fixed-width fields or numeric precision. Awk can also redirect output directly to files or pipe it into shell commands without leaving the awk process.

bash

awk '{printf "%-20s %5d\n", $1, $2}' file   # formatted output
awk '{print $2 > "out.txt"}'         file   # redirect to file
awk '{print $1 >> "append.txt"}'     file   # append
awk '{print | "sort -rn"}'           file   # pipe to command

Output:

text

Alice                75000
Bob                  62000
Carol                91000
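A short sketch of the common width and precision specifiers, on invented name/salary pairs. The "|" characters are only there to make the column edges visible.

```shell
# %-8s  left-justify in 8 columns
# %7d   right-justify an integer in 7 columns
# %8.2f right-justify with 2 decimals in 8 columns
table=$(printf 'Alice 75000\nBob 62000\n' |
  awk '{ printf "%-8s|%7d|%8.2f\n", $1, $2, $2 / 12 }')

echo "$table"
```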

String functions

Function	Description
`length(s)`	Length of string (or `$0` if no arg)
`substr(s, i, n)`	Substring from index i (1-based), length n
`index(s, t)`	First position of t in s (0 = not found)
`split(s, a, sep)`	Split s into array a using sep
`sub(r, s, t)`	Replace first regex r match in t with s
`gsub(r, s, t)`	Replace all regex r matches in t with s
`match(s, r)`	Sets RSTART, RLENGTH; returns position or 0
`sprintf(fmt, ...)`	Format string (like printf, returns string)
`tolower(s)`	Lowercase
`toupper(s)`	Uppercase
`gensub(r, s, h, t)`	gawk only: returns the result instead of modifying t; `\\1` to `\\9` in s refer to groups; h="g" for global, or a number N for the Nth match

bash

awk '{print toupper($1), length($0)}' file
awk '{gsub(/foo/, "bar"); print}'     file        # replace in $0
awk '{sub(/^[ \t]+/, ""); print}'     file        # ltrim
awk '{gsub(/[ \t]+$/, ""); print}'    file        # rtrim
awk 'match($0, /[0-9]+/) {print substr($0, RSTART, RLENGTH)}' file

Output:

text

ALICE 28
BOB 22
CAROL 30
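Several of the functions in the table combine naturally. This sketch, on an invented key=value string, uses split, index, substr, and toupper together, and shows that gsub returns the number of replacements it made.

```shell
# Split on ";" and then cut each piece at the first "=".
fields=$(echo 'user=alice;role=admin' | awk '{
  n = split($0, pairs, ";")            # n is the number of pieces
  for (i = 1; i <= n; i++) {
    eq  = index(pairs[i], "=")         # 1-based position of "="
    key = substr(pairs[i], 1, eq - 1)
    val = substr(pairs[i], eq + 1)     # omitting the length means "to the end"
    print key " -> " toupper(val)
  }
}')

# gsub modifies $0 in place and returns the replacement count.
replaced=$(echo 'a-b-c' | awk '{ n = gsub(/-/, "+"); print n, $0 }')

echo "$fields"
echo "$replaced"
```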

Numeric functions

Awk provides standard math functions, all specified by POSIX rather than specific to gawk: int(), sqrt(), sin(), cos(), atan2(), log(), exp(), and rand()/srand() for random numbers. Basic arithmetic operators (+, -, *, /, %, ^) cover most calculations. print shows integral values as integers and formats other numbers with OFMT (default %.6g); use printf when you need explicit precision.

bash

awk '{print int($1), sqrt($2), $3^2}' data
awk 'BEGIN{srand()} {print int(rand()*100)}' /dev/stdin
awk '{printf "%.2f\n", $1/$2}' nums

Output:

text

3 4 25
7 9 144
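How print and printf format numbers differs in ways that are easy to misremember. A minimal sketch that needs no input file:

```shell
# print shows integral values without decimals and other values via
# OFMT (default "%.6g"); printf applies whatever format you give it.
nums=$(awk 'BEGIN {
  print sqrt(16), 5^2, 10/3
  printf "%.2f\n", 10/3
  print int(-3.7)        # int() truncates toward zero
}')

echo "$nums"
```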

Arrays

Awk arrays are associative (hash maps): the index can be any string or number, and entries spring into existence on first use. They are unordered — iterate with for (key in array) — and a single array can accumulate counts, sums, or mappings across all input records.

bash

# Frequency count
awk '{count[$1]++} END {for (k in count) print k, count[k]}' file

# Associative array from CSV: id → name
awk -F, 'NR>1 {map[$1]=$2} END {for (id in map) print id, map[id]}' data.csv

# Delete element
awk '{delete seen[$1]; seen[$1]=$2}' file

# Array test
awk '$1 in seen {print "dup:", $1} {seen[$1]=1}' file

Output:

text

Engineering 3
Ops 1
Finance 2
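Because for (key in array) visits keys in an unspecified order, reports are often piped through sort to get stable output. The sketch below, on invented dept,amount rows, keeps a count and a sum per key in two parallel arrays.

```shell
summary=$(printf 'Engineering,100\nOps,50\nEngineering,25\nFinance,10\n' |
  awk -F, '{ total[$1] += $2; n[$1]++ }      # entries are created on first use
           END { for (d in total) print d, n[d], total[d] }' |
  sort)                                      # make the order deterministic

echo "$summary"
```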

Multi-file processing

When multiple files are passed, NR counts records across all files while FNR resets to 1 for each new file. The classic two-file idiom FNR==NR { … ; next } processes the first file into memory, then uses that data while reading the second.

bash

# FNR vs NR
awk 'FNR==1 {print "--- File:", FILENAME}' file1 file2

# Keep rows of data.txt whose first field appears in list.txt
awk 'FNR==NR {ids[$1]=1; next} $1 in ids' list.txt data.txt

Output:

text

--- File: file1
--- File: file2
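The two-file idiom can also act as a lookup join. The following sketch uses two temporary files with invented contents (users.txt and events.txt are hypothetical names) and replaces each event's id with the matching user name.

```shell
tmp=$(mktemp -d)
printf '1 alice\n2 bob\n3 carol\n' > "$tmp/users.txt"
printf '2 login\n3 logout\n9 login\n' > "$tmp/events.txt"

# While reading the first file FNR equals NR, so the first rule fills the
# lookup table and "next" skips the second rule. Caveat: if the first file
# is empty, FNR==NR is also true for the second file.
joined=$(awk 'FNR == NR { name[$1] = $2; next }
              ($1 in name) { print name[$1], $2 }' \
         "$tmp/users.txt" "$tmp/events.txt")

echo "$joined"
rm -rf "$tmp"
```

Event id 9 has no matching user, so that row is dropped.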

Practical recipes

bash

# Sum a column
awk '{sum+=$3} END {print sum}' data.txt

# Average
awk '{sum+=$1; n++} END {print sum/n}' numbers.txt

# Print columns in different order
awk '{print $3, $1, $2}' file.txt

# Skip header, process rest
awk 'NR>1 {print $2, $4}' report.csv

# Print unique lines, keeping first-seen order (no sorting needed)
awk '!seen[$0]++' file.txt

# Print duplicate lines only
awk 'seen[$0]++ == 1' file.txt

# Concatenate lines every N records
awk 'ORS= (NR%3 ? " " : "\n")' file    # join every 3 lines

# Extract value from key=value
awk -F= '/^timeout/{print $2}' config.ini

# Column-align a colon file
awk -F: '{printf "%-15s %-10s %s\n", $1,$3,$7}' /etc/passwd

# Top N by field
awk '{print $5, $0}' access.log | sort -rn | head -10 | cut -d' ' -f2-

# Running total
awk '{running+=$1; print running, $0}' ledger.txt

# Filter by date field (YYYY-MM-DD in $2)
awk '$2 >= "2025-01-01" && $2 <= "2025-03-31"' events.log

# Accumulate by group, then report
awk -F, '{bytes[$1]+=$3} END {
  for (h in bytes) printf "%s\t%.1f MB\n", h, bytes[h]/1048576
}' access.csv | sort -k2 -rn

# Transpose rows to columns
awk '{for(i=1;i<=NF;i++) col[i]=col[i] (NR>1?"\t":"") $i}
     END {for(i=1;i<=NF;i++) print col[i]}' matrix.txt

Output:

text

228000
76000
75000 Alice Engineering
62000 Bob Ops
Engineering Alice
Ops Bob
10 2026-01-15 10:23:45 ERROR connection refused
10 10.00 MB
webserver01	15.2 MB
appserver02	8.7 MB
dbserver03	3.1 MB
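The average recipe divides by zero on empty input and prints with default formatting. A slightly more defensive variant, sketched here with invented numbers, guards the division and fixes the precision with printf:

```shell
# Average with two decimals; prints nothing if there were no records.
avg=$(printf '75000\n62000\n91000\n' |
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }')

# Same program on empty input: the guard skips the division entirely.
empty=$(printf '' |
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }')

echo "$avg"
```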

Multiline records

Setting RS="" switches awk into paragraph mode, where blank lines delimit records; with FS="\n", each line of a paragraph becomes one field. For CSV whose quoted fields contain commas, gawk's FPAT variable matches field content by pattern rather than splitting on a delimiter. FPAT does not handle newlines embedded in quoted fields, because records are still split on RS first.

bash

# Blank-line-separated records (like paragraphs)
awk 'BEGIN{RS=""; FS="\n"} /keyword/{print $1}' file

# CSV with quoted fields containing commas (FPAT is gawk-only; embedded newlines are not handled)
gawk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")"} {print $2}' data.csv

Output:

text

First line of matching paragraph
"Engineering Department, North"

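Paragraph mode is easiest to see on a tiny input. In this sketch the two records are invented; each blank-line-separated block becomes one record and each of its lines one field.

```shell
records='Name: Alice
Dept: Engineering

Name: Bob
Dept: Ops'

# RS="" makes blank lines the record separator; FS="\n" makes each line a field.
hit=$(printf '%s\n' "$records" |
  awk 'BEGIN { RS = ""; FS = "\n" }
       /Ops/ { print NR ": " $1 " (" NF " lines)" }')

echo "$hit"
```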
Built-in CSV (gawk 5.3+)

Gawk 5.3.0 (Nov 2023) added native CSV parsing via the --csv option, mirroring the same feature in BWK awk ("The One True Awk"). It correctly handles quoted fields, embedded commas, doubled "" quotes, and CRLF line endings, so standard RFC 4180 data no longer needs a hand-rolled FPAT. The mode forces FS="," and disables backslash-escape processing inside fields. Fields are returned with their enclosing quotes removed, so printing with BEGIN{OFS=","} does not re-quote values that contain commas or quotes; re-quote them yourself if the output must round-trip as CSV.

bash

# Parse a CSV with quoted commas and embedded quotes (gawk 5.3+)
gawk --csv '{print $1, $3}' data.csv

# Convert CSV to TSV
gawk --csv 'BEGIN{OFS="\t"} {$1=$1; print}' data.csv > data.tsv

# Skip header, sum a numeric column
gawk --csv 'NR>1 {sum+=$4} END {print sum}' sales.csv

Output:

text

Alice Engineering, North
Bob Ops, West
Carol Finance, HQ

Check with gawk --version — --csv requires gawk 5.3.0 or later. On older systems, fall back to the FPAT recipe above or use a dedicated tool like qsv.
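Since awk prints field values verbatim, writing valid CSV needs explicit quoting on output. The sketch below is one possible approach in portable awk, not a gawk feature: csvquote is a hypothetical helper name, and the tab-separated input row is invented so that no CSV parsing is needed on the way in.

```shell
csv=$(printf 'Alice\tEngineering, North\tsaid "hi"\n' | awk -F'\t' '
# Hypothetical helper: quote a value only if it contains a comma or a quote,
# doubling any embedded quotes as RFC 4180 expects.
function csvquote(s) {
  if (s ~ /[",]/) {
    gsub(/"/, "\"\"", s)
    return "\"" s "\""
  }
  return s
}
{
  line = csvquote($1)
  for (i = 2; i <= NF; i++) line = line "," csvquote($i)
  print line
}')

echo "$csv"
```

This sketch does not handle values containing newlines; those would also need quoting.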

Unicode escapes (gawk 5.3+)

Gawk 5.3.0 also introduced the \u escape sequence for inserting Unicode code points by hex value (1–8 digits), encoded as UTF-8 in the current locale. This makes it easier to emit non-ASCII symbols, box-drawing characters, and emoji from awk programs without literal multibyte bytes in source.

bash

# Print a checkmark and warning symbol (gawk 5.3+)
gawk 'BEGIN{print "\u2713 ok"; print "\u26A0 warn"}'

# Box-drawing borders
gawk 'BEGIN{print "\u250C\u2500\u2500\u2510"; print "\u2514\u2500\u2500\u2518"}'

Output:

text

✓ ok
⚠ warn
┌──┐
└──┘

Modern alternatives

If you outgrow standard awk for performance or modern I/O, two actively used reimplementations expand the design space. frawk is a Rust-based JIT-compiled awk-alike with statically inferred types and built-in CSV/TSV support — typically several times faster than gawk on large files. goawk is a POSIX-compliant Go implementation with CSV mode (-i csv / -o csv), useful when you want a single static binary or are embedding awk in a Go program. Both accept most awk programs unchanged but differ in extensions and edge cases.

bash

# frawk: same syntax, often faster on big files
frawk -F, '{sum+=$3} END {print sum}' huge.csv

# goawk: CSV input/output modes
goawk -i csv -o csv 'NR>1 {print $1, $4}' data.csv

Output: (none — performance/portability differences only)

One-liners reference

bash

awk 'END{print NR}' file              # count lines (wc -l)
awk '{print NF}'   file              # print field count per line
awk 'NF'           file              # remove blank lines
awk 'length>72'    file              # lines longer than 72 chars
awk '{$1=$1; print}' file            # collapse whitespace, trim
awk '{print $NF}'  file              # print last field
awk '{print $(NF-1)}' file           # print second-to-last field
awk 'NR%2==0'      file              # print even-numbered lines
awk 'NR==FNR{a[$0];next} $0 in a'  f1 f2  # intersection of two files
awk 'NR==FNR{a[$0];next} !($0 in a)' f1 f2 # lines in f2 not in f1

Output:

text

42
3
4
5
The quick brown fox jumps over the lazy dog — this line exceeds seventy-two characters
/bin/bash
/bin/sh
line2
line4
alice
carol
bob
dave
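Two of these one-liners rely on side effects that are not obvious from the code. A quick sketch on invented input: assigning $1=$1 forces awk to rebuild $0 with OFS, and a bare NF pattern is false for any line with zero fields, including whitespace-only lines.

```shell
# Rebuilding the record joins fields with a single space (the default OFS).
squeezed=$(printf '  a   b  c \n' | awk '{ $1 = $1; print }')

# NF is 0 for empty and whitespace-only lines, so they are not printed.
nonblank=$(printf 'x\n\n  \ny\n' | awk 'NF')

echo "$squeezed"
echo "$nonblank"
```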

gawk (GNU awk) extends POSIX awk with gensub(), FPAT for CSV, nextfile, co-processes (|&), and more. Not every Linux system ships gawk as awk: Debian and Ubuntu default to mawk and Alpine uses BusyBox awk, so check with awk --version before relying on gawk extensions.
