cheat sheet

grep

Search files and streams using fixed strings, extended regex, or PCRE. Covers all major flags, context, recursive search, and pipeline patterns.

grep — Pattern Search

What it is

grep is a command-line utility, standardized by POSIX, for searching plain-text input for lines that match a regular expression or fixed string. Ken Thompson wrote it at Bell Labs in 1973, and it has shipped with every Unix and Linux system since. POSIX specifies basic regex (BRE), extended regex (grep -E), and fixed strings (grep -F); most implementations add recursive search through directory trees (-r), and GNU grep adds Perl-compatible regex (grep -P). The egrep and fgrep aliases still exist on many systems but are deprecated in favor of grep -E and grep -F. Reach for grep for quick pattern matching in files or pipelines; for large codebases where speed matters, ripgrep is significantly faster and respects .gitignore by default.

Syntax

The pattern is a regular expression by default (BRE); use -E for extended regex or -F for a plain literal string. When no file is given, grep reads from stdin, making it natural at the end of a pipeline.

bash
grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] -e PATTERN -e PATTERN [FILE...]
grep [OPTIONS] -f PATTERNFILE [FILE...]


Essential flags

Flag       Meaning
-i         Case-insensitive match
-v         Invert match (non-matching lines)
-n         Show line numbers
-c         Print count of matching lines per file
-l         Print only filenames with matches
-L         Print only filenames with no match
-r / -R    Recursive (follow symlinks with -R)
-w         Match whole word only
-x         Match whole line only
-o         Print only the matching part of the line
-q         Quiet: exit 0 if match found, no output
-s         Suppress error messages about missing files
-m N       Stop after N matching lines
-h         Suppress filename prefix (multi-file mode)
-H         Always print filename prefix
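The flags above combine freely. As a minimal sketch, assuming a scratch directory with two made-up files (notes.txt and todo.txt are hypothetical names chosen for illustration), this shows how -c, -l, -L, and -o change what is printed:

```shell
# Hypothetical sample files in a scratch directory.
tmp=$(mktemp -d)
printf 'alpha beta\nAlpha gamma\nbeta\n' > "$tmp/notes.txt"
printf 'delta\n' > "$tmp/todo.txt"

# -c counts matching lines; -i ignores case, so "alpha" and "Alpha" both count.
count=$(cd "$tmp" && grep -ci 'alpha' notes.txt)

# -l lists files that contain a match, -L lists files that do not.
with=$(cd "$tmp" && grep -l 'beta' notes.txt todo.txt)
without=$(cd "$tmp" && grep -L 'beta' notes.txt todo.txt)

# -o prints only the matched text, one match per output line.
parts=$(cd "$tmp" && grep -o 'b[a-z]*' notes.txt)

printf 'count=%s\nwith=%s\nwithout=%s\nparts:\n%s\n' "$count" "$with" "$without" "$parts"
rm -rf "$tmp"
```

Here count is 2, with is notes.txt, without is todo.txt, and parts holds two lines that each read beta.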

Regex flavours

Flag              Engine          Notes
grep              BRE (basic)     \+ \? \| \( \) \{ \} need a backslash to act as operators
grep -E / egrep   ERE (extended)  + ? | ( ) { } work unescaped
grep -P           PCRE            \d \w (?=...) lookaheads etc.
grep -F / fgrep   Fixed string    No regex, fastest for literals

Context lines

-A, -B, and -C print additional lines surrounding each match, which is essential for reading log files where the error message alone lacks context. Groups of context lines are separated by -- when multiple matches appear.

bash
grep -A 3 "ERROR" app.log      # 3 lines After match
grep -B 2 "ERROR" app.log      # 2 lines Before match
grep -C 5 "ERROR" app.log      # 5 lines before and after (Context)

Output (grep -C 5 "ERROR" app.log on a five-line log):

text
2026-04-24 14:03:11 INFO  Request received
2026-04-24 14:03:11 WARN  Retry attempt 3
2026-04-24 14:03:12 ERROR Connection timed out
2026-04-24 14:03:12 INFO  Closing socket
2026-04-24 14:03:12 INFO  Cleanup complete
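To see the -- group separator, and the way -n marks context lines differently from matching lines, here is a minimal sketch on a made-up seven-line log (the file contents are invented for illustration):

```shell
# Hypothetical log with two ERROR lines far enough apart to form two groups.
tmp=$(mktemp -d)
printf 'a1\nERROR one\na3\na4\na5\nERROR two\na7\n' > "$tmp/app.log"

# With -n, matching lines are prefixed "N:" and context lines "N-".
out=$(grep -n -A 1 'ERROR' "$tmp/app.log")
printf '%s\n' "$out"
# 2:ERROR one
# 3-a3
# --
# 6:ERROR two
# 7-a7

rm -rf "$tmp"
```

When two matches sit close enough that their context overlaps, grep merges them into one group and prints no separator between them.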

Character classes & anchors

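^ and $ anchor a pattern to the start or end of a line; [...] matches any character in the set; [^...] negates it. POSIX classes like [[:digit:]] and [[:alpha:]] (written inside a bracket expression, hence the double brackets) are portable alternatives to \d, which requires -P (PCRE), and \w, which GNU grep accepts in BRE and ERE but POSIX does not define.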

bash
grep '^root'         /etc/passwd   # lines starting with "root"
grep 'bash$'         /etc/passwd   # lines ending with "bash"
grep '^$'            file.txt      # empty lines
grep -E '[0-9]{4}'   data.txt      # a run of 4 digits (longer runs match too)
grep -P '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b' ips.txt  # IPv4 addresses

Output (grep '^root' /etc/passwd):

text
root:x:0:0:root:/root:/bin/bash
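An unanchored quantifier matches anywhere in the line, so a pattern for four digits also fires on longer runs. The sketch below, using an invented data.txt, contrasts the unanchored form with -x (whole line) and with explicit non-digit guards:

```shell
# Hypothetical sample data.
tmp=$(mktemp -d)
printf '1234\n12345\nid 2024 ok\n123\n' > "$tmp/data.txt"

# Any line containing four digits in a row (a five-digit run qualifies too).
any=$(grep -E '[0-9]{4}' "$tmp/data.txt")

# -x requires the whole line to match, so only "1234" survives.
whole=$(grep -Ex '[0-9]{4}' "$tmp/data.txt")

# A run of exactly four digits, guarded by a non-digit or a line edge on each side.
alone=$(grep -E '(^|[^0-9])[0-9]{4}([^0-9]|$)' "$tmp/data.txt")

printf 'any:\n%s\nwhole:\n%s\nalone:\n%s\n' "$any" "$whole" "$alone"
rm -rf "$tmp"
```

The first search returns three lines, the second only 1234, and the third returns 1234 and "id 2024 ok" while skipping 12345.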

Multiple patterns

-e adds an additional pattern to match (logical OR); -f loads patterns from a file, one per line. These are more readable than embedding alternation (error|warn) inside the pattern string, especially when the list is long or dynamically generated.

bash
grep -e "error" -e "warn" -e "crit" /var/log/syslog

# Or from a file (one pattern per line)
grep -f patterns.txt logfile

Output:

text
Apr 24 09:01:33 host kernel: [12345.678] warn: disk space low
Apr 24 09:02:11 host app[4321]: error: failed to open config
Apr 24 09:05:44 host app[4321]: crit: service unresponsive
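As a small check that -e and -f select the same lines, the sketch below builds a hypothetical sample.log and patterns.txt in a scratch directory. It also shows a common surprise with -f on GNU grep: a blank line in the pattern file is an empty pattern, which matches every line.

```shell
# Hypothetical log and pattern files.
tmp=$(mktemp -d)
printf 'info: started\nwarn: disk space low\nerror: failed to open config\ndebug: tick\n' > "$tmp/sample.log"
printf 'error\nwarn\ncrit\n' > "$tmp/patterns.txt"

# Patterns given inline with -e and loaded with -f select the same lines.
from_flags=$(grep -e 'error' -e 'warn' -e 'crit' "$tmp/sample.log")
from_file=$(grep -f "$tmp/patterns.txt" "$tmp/sample.log")

# A stray blank line in the pattern file matches everything.
printf 'error\n\n' > "$tmp/blank.txt"
all=$(grep -c -f "$tmp/blank.txt" "$tmp/sample.log")

printf 'from_file:\n%s\nlines matched with blank pattern: %s\n' "$from_file" "$all"
rm -rf "$tmp"
```

If the pattern list is generated by another tool, stripping empty lines before passing it to -f avoids the match-everything case.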

Recursive search

-r descends into subdirectories; -R is the same but follows symbolic links. Use --include and --exclude to filter by filename glob, and --exclude-dir to skip directories like .git or node_modules. For large codebases, ripgrep is faster and skips ignored files by default.

bash
grep -r "TODO" ./src/
grep -rl "console.log" ./src/           # filenames only
grep -rn --include="*.py" "import" .    # only .py files
grep -r --exclude="*.min.js" "fetch" .  # skip minified files
grep -r --exclude-dir={.git,node_modules,dist} "pattern" .

Output (grep -r "TODO" ./src/):

text
./src/auth/login.py:    # TODO: add rate limiting
./src/api/routes.py:   # TODO: validate input schema
./src/utils/cache.py:    # TODO: implement expiry

Output (grep -rl "console.log" ./src/):

text
./src/app.js
./src/utils/logger.js
./src/components/Debug.jsx

Output (grep -rn --include="*.py" "import" .):

text
./src/auth/login.py:1:import os
./src/auth/login.py:2:import hashlib
./src/api/routes.py:1:import flask
./src/api/routes.py:2:import json

Colour and output control

--color=always forces ANSI color codes even when stdout is piped — necessary when you want color preserved through less -R. -o prints only the matched portion of each line rather than the whole line, which is useful for extracting values from structured text.

bash
grep --color=always "error" log | less -R   # keep colour through pipe
grep -o '"[^"]*"' file.json                 # extract all quoted strings
grep -oP '(?<=href=")[^"]*' page.html       # PCRE lookbehind — all href values
grep -n "" file.txt                          # number every line (cat -n alternative)

Output (grep -o '"[^"]*"' file.json):

text
"name"
"Alice"
"role"
"admin"
"active"

Output (grep -oP '(?<=href=")[^"]*' page.html):

text
/about
/contact
https://docs.example.com
/assets/style.css

Count and statistics

-c prints the number of matching lines per file rather than the lines themselves — faster than piping to wc -l and gives per-file counts when multiple files are searched. -m N stops after N matches, which can cut processing time on large files when you only need to know whether a pattern exists.

bash
grep -c "ERROR" app.log                 # match count in one file
grep -rc "TODO" src/ | sort -t: -k2 -rn  # per-file TODO counts, sorted

# Number of unique matching lines
grep "pattern" file | sort -u | wc -l

Output (grep -c "ERROR" app.log):

text
47

Output (grep -rc "TODO" src/ | sort -t: -k2 -rn):

text
src/api/routes.py:12
src/auth/login.py:7
src/utils/cache.py:4
src/models/user.py:2
src/main.py:1
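Note that -c counts matching lines, not individual occurrences. The sketch below, on an invented three-line log, contrasts -c with -o piped to wc -l and shows -m 1 stopping at the first matching line:

```shell
# Hypothetical log where one line contains the pattern twice.
tmp=$(mktemp -d)
printf 'ERROR ERROR\nok\nERROR\n' > "$tmp/app.log"

# -c counts lines that match at least once.
lines=$(grep -c 'ERROR' "$tmp/app.log")

# -o emits each occurrence on its own line, so wc -l counts occurrences.
# tr strips the padding some wc implementations add.
hits=$(grep -o 'ERROR' "$tmp/app.log" | wc -l | tr -d ' ')

# -m 1 stops reading after the first matching line.
first=$(grep -n -m 1 'ERROR' "$tmp/app.log")

printf 'lines=%s hits=%s first=%s\n' "$lines" "$hits" "$first"
rm -rf "$tmp"
```

This gives lines=2, hits=3, and first=1:ERROR ERROR.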

Exit codes

Code   Meaning
0      At least one match found
1      No match found
2      Error (bad option, missing file)

bash
# Use in conditionals
if grep -q "FAILED" build.log; then
  echo "Build had failures"
fi

# Check without output
grep -qs "pattern" file && echo "found"

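The three exit codes can be observed directly. This is a minimal sketch; build.log and no-such-file are hypothetical names, and each status is captured with || so the script keeps running after a non-zero exit:

```shell
# Hypothetical one-line build log.
tmp=$(mktemp -d)
printf 'build OK\n' > "$tmp/build.log"

# 0: at least one line matched.
found=0;   grep -q 'OK' "$tmp/build.log" || found=$?

# 1: the file was read but nothing matched.
missing=0; grep -q 'FAILED' "$tmp/build.log" || missing=$?

# 2: the file could not be opened (-s only hides the error message).
broken=0;  grep -qs 'OK' "$tmp/no-such-file" || broken=$?

printf 'found=%s missing=%s broken=%s\n' "$found" "$missing" "$broken"
rm -rf "$tmp"
```

Because 1 and 2 are both non-zero, a plain if grep -q ... test cannot tell "no match" from "file missing"; inspect $? when that distinction matters.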

Binary files

By default, when grep detects binary content it prints a one-line "binary file matches" notice instead of the matching lines. -a / --text forces grep to treat the file as text, which can produce garbled output but finds embedded strings. -I is the opposite: silently skip binary files, useful in recursive searches over mixed directories.

bash
grep -a "pattern" binary.bin    # treat binary as text
grep -I "pattern" *             # skip binary files silently
grep --binary-files=text "str" img.bin

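A short sketch of the two flags, using a made-up blob.bin that contains a NUL byte so grep classifies it as binary:

```shell
# Hypothetical binary file: a NUL byte followed by readable text.
tmp=$(mktemp -d)
printf 'head\000tail needle here\n' > "$tmp/blob.bin"

# -a treats the file as text, so -o can extract the embedded string.
as_text=$(grep -a -o 'needle' "$tmp/blob.bin")

# -I treats binary files as non-matching, so grep exits 1 and prints nothing.
skipped=0
grep -I 'needle' "$tmp/blob.bin" > /dev/null || skipped=$?

printf 'as_text=%s skipped_status=%s\n' "$as_text" "$skipped"
rm -rf "$tmp"
```

The exact wording of the default "binary file matches" notice differs between grep versions, so scripts are safer testing the exit status than parsing that message.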

Practical pipelines

bash
# Find processes by name
ps aux | grep -v grep | grep nginx

Output:

text
www-data  1234  0.0  0.5  12340  5120 ?  Ss  09:00  0:00 nginx: master process
www-data  1235  0.0  0.3   9876  3456 ?  S   09:00  0:00 nginx: worker process

bash
# Extract unique IPs from access log
grep -oP '\d+\.\d+\.\d+\.\d+' access.log | sort -u

Output:

text
10.0.0.5
10.0.1.14
192.168.1.101
203.0.113.42

bash
# Show lines between two patterns (inclusive)
grep -A 9999 "START" file | grep -B 9999 "END"

# Find files containing both patterns (NUL-delimited so odd filenames survive)
grep -rlZ "alpha" . | xargs -0 grep -l "beta"

Output:

text
./config/feature-flags.yaml
./src/algorithm.py

bash
# Lines that DON'T contain either of two words
grep -v -e "debug" -e "trace" app.log

Output:

text
2026-04-24 10:00:01 INFO  Server started on :8080
2026-04-24 10:01:22 WARN  High memory usage: 87%
2026-04-24 10:03:45 ERROR DB connection lost

bash
# Highlight matches but show all lines
grep --color=always -E "error|$" app.log


BRE vs ERE vs PCRE in practice

grep ships with three distinct regex engines and the syntax differences trip up almost everyone. Basic Regex (BRE) is the default — most metacharacters (+, ?, |, (, ), {) must be backslash-escaped to act as metacharacters. Extended Regex (ERE, via -E) removes the escaping, which is what people typically expect from "modern" regex. Perl-Compatible Regex (PCRE, via -P) adds shorthand classes (\d, \w, \s), non-greedy quantifiers (*?, +?), and lookarounds.

bash
# Same pattern in all three flavours: lines containing "cat" or "dog"
grep    'cat\|dog'   file   # BRE: \| is alternation (GNU extension); a bare | is literal
grep -E 'cat|dog'    file   # ERE: | is a metacharacter
grep -P 'cat|dog'    file   # PCRE: same as ERE here

# Quantifiers in BRE require escaping
grep    'colou\?r'   file   # BRE — \?
grep -E 'colou?r'    file   # ERE
grep -P 'colou?r'    file   # PCRE

# Repetition counts
grep    '[0-9]\{4\}' file   # BRE — \{ \}
grep -E '[0-9]{4}'   file   # ERE
grep -P '\d{4}'      file   # PCRE shorthand

# Non-greedy is PCRE-only
grep -oP '<.*?>'     file   # print each shortest <...> match (-o shows only the match)

# Lookarounds are PCRE-only
grep -P '(?<=Bearer\s)\S+' headers.txt   # token after "Bearer "
grep -P '\bTODO\b(?!:)'    src/*.py      # TODO not followed by colon

Output (grep -P '(?<=Bearer\s)\S+' headers.txt):

text
eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJhbGljZSJ9.abc123
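The escaping rules are easiest to see with one pattern run under both POSIX flavours. This sketch uses an invented two-line file; the BRE \+ form is a GNU extension, so that line assumes GNU grep:

```shell
# Hypothetical input: one line with a literal plus sign, one with repeated "a".
tmp=$(mktemp -d)
printf 'a+b\naab\n' > "$tmp/file"

bre_plain=$(grep 'a+b' "$tmp/file")        # BRE: + is an ordinary character
bre_escaped=$(grep 'a\+b' "$tmp/file")     # BRE (GNU): \+ means "one or more"
ere_plain=$(grep -E 'a+b' "$tmp/file")     # ERE: + means "one or more"
ere_escaped=$(grep -E 'a\+b' "$tmp/file")  # ERE: \+ is a literal plus sign

printf 'bre_plain=%s bre_escaped=%s ere_plain=%s ere_escaped=%s\n' \
  "$bre_plain" "$bre_escaped" "$ere_plain" "$ere_escaped"
rm -rf "$tmp"
```

The backslash flips meaning between the two flavours: the plain BRE and escaped ERE patterns select a+b, while the escaped BRE and plain ERE patterns select aab.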

Word and line boundaries

Word boundaries match the empty position between a word character ([A-Za-z0-9_]) and a non-word character, which is how you scope a search to a whole word without false positives like cat inside concatenate. GNU grep supports \b in BRE/ERE plus \< (start-of-word) and \> (end-of-word) as a finer-grained pair. PCRE adds \A (start of input) and \Z (end of input) for multi-line buffers, though grep operates line-by-line by default so they behave like ^ and $.

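Unicode handling under -P changed several times in GNU grep's 2023 releases: 3.9 made \w and \b Unicode-aware in UTF-8 locales, 3.10 restricted \d to ASCII digits, and 3.11 keeps \d ASCII-only only when grep is built against PCRE2 10.43 or later. Because the result depends on both the grep and PCRE2 versions, test on the target system before relying on either behaviour; PCRE2's (*UCP) prefix, e.g. grep -P '(*UCP)\w+' file, requests Unicode properties explicitly. POSIX classes like [[:alpha:]] remain locale-aware and are the portable alternative.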

bash
grep -w 'cat' file              # roughly \bcat\b
grep '\bcat\b' file              # word boundaries
grep '\<cat\>' file              # GNU-only start/end-of-word
grep -E '\b(error|fail|abort)\b' app.log   # whole-word alternation
grep -rP '\bTODO\b' src/        # PCRE word boundary (-r is needed to search a directory)

Output (grep -w 'cat' file):

text
The cat sat on the mat
A black cat crossed the road
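Since the underscore counts as a word character but the hyphen does not, -w can behave differently from what the eye expects. A small sketch on invented input:

```shell
# Hypothetical lines that all contain the letters "cat".
tmp=$(mktemp -d)
printf 'The cat sat\nconcatenate strings\ncat_food\ncat-flap\n' > "$tmp/file"

# A plain search matches the substring everywhere.
plain=$(grep -c 'cat' "$tmp/file")

# -w rejects "concatenate" and "cat_food" (underscore is a word character)
# but accepts "cat-flap" (hyphen is not).
word=$(grep -w 'cat' "$tmp/file")

printf 'plain count=%s\nwhole-word matches:\n%s\n' "$plain" "$word"
rm -rf "$tmp"
```

The plain search counts four lines, while -w keeps only "The cat sat" and "cat-flap".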

POSIX classes vs PCRE classes

POSIX character classes like [[:digit:]], [[:alpha:]], [[:space:]], and [[:alnum:]] are portable across BRE and ERE and respect the current locale (so [[:alpha:]] matches accented letters in a UTF-8 locale such as en_US.UTF-8, but not under LC_ALL=C). PCRE shorthand (\d, \w, \s) is shorter but only available with -P, and how much non-ASCII text it matches varies with the GNU grep and PCRE2 versions.

POSIX          PCRE         Matches
[[:digit:]]    \d           0–9
[[:alpha:]]    -            letters (locale-aware)
[[:alnum:]]    \w minus _   letters + digits
[[:space:]]    \s           whitespace
[[:upper:]]    -            uppercase letters
[[:lower:]]    -            lowercase letters
[[:xdigit:]]   -            hex digits
[[:punct:]]    -            punctuation
[[:cntrl:]]    -            control chars
[[:print:]]    -            printable chars
[^[:digit:]]   \D           negated digit

bash
grep -E '^[[:alpha:]]+$' words.txt              # only letter lines
grep -E '[[:xdigit:]]{6,8}'    colors.css       # hex colors
grep -E '[^[:print:]]'         file             # find unprintable chars
grep -P '\w+@\w+\.\w+'        contacts.txt     # rough email match

Output (grep -E '[[:xdigit:]]{6,8}' colors.css):

text
  --primary: #8a5cff;
  --bg:      #0a0a0f;
  --accent:  #ff5cb3;

Include, exclude, and recursive defaults

--include and --exclude accept shell globs and apply to the basename of each candidate file. --exclude-dir skips entire directories without descending into them — much faster than relying on --exclude for the contents. Globs may be repeated; multiple --include patterns are OR-ed together.

bash
# Only search source files in a polyglot repo
grep -r --include='*.py' --include='*.ts' --include='*.go' 'TODO' .

# Skip generated, vendored, and VCS directories
grep -r --exclude-dir={.git,node_modules,dist,build,vendor,.venv,__pycache__} \
  'pattern' .

# Common per-language combos (braces stay outside the quotes so bash expands them)
grep -r --include='*.'{js,ts,jsx,tsx} 'console\.' src/
grep -r --include='Dockerfile*' --include='*.dockerfile' 'FROM' .

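To check how the filters interact, the sketch below builds a small hypothetical tree (the file and directory names are invented) and runs one search with two --include globs and an --exclude-dir. It also shows that brace alternation inside a quoted glob is not understood by grep itself:

```shell
# Hypothetical project tree.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/node_modules/lib"
printf '# TODO: a\n'  > "$tmp/src/app.py"
printf '// TODO: b\n' > "$tmp/src/app.ts"
printf 'TODO: c\n'    > "$tmp/src/notes.md"
printf '// TODO: d\n' > "$tmp/node_modules/lib/index.ts"

# Two --include globs are OR-ed; node_modules is never entered.
found=$(cd "$tmp" && grep -rl --include='*.py' --include='*.ts' \
  --exclude-dir=node_modules 'TODO' . | sort)

# Inside quotes the braces reach grep literally, so no filename matches the glob.
braces=$(cd "$tmp" && grep -rl --include='*.{py,ts}' 'TODO' . || true)

printf 'found:\n%s\nbraces: [%s]\n' "$found" "$braces"
rm -rf "$tmp"
```

The first search lists ./src/app.py and ./src/app.ts only; the second lists nothing.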

Both -r and -R recurse into directories. In GNU grep, -r (lowercase) follows symbolic links only when they are named on the command line and skips those it meets during traversal, which makes it the safer default. -R (uppercase) dereferences every symlink, which is what you want when your source tree intentionally links into shared modules but a risk when a stray symlink points to /proc or your home directory.

bash
grep -r  'pattern' .   # skip symlinked dirs (safe default)
grep -R  'pattern' .   # follow symlinks (may loop or scan too much)
grep -rL 'pattern' .   # files with NO match (uppercase L)


GNU grep detects symlink cycles and refuses to loop forever, but BSD grep older than 2017 did not. Prefer -r unless you have a specific reason to traverse links.
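The difference is easy to reproduce. This sketch assumes GNU grep and a hypothetical layout in which tree/link is a symlink to a sibling directory:

```shell
# Hypothetical layout: tree/link -> ../shared, which holds the only matching file.
tmp=$(mktemp -d)
mkdir -p "$tmp/shared" "$tmp/tree"
printf 'needle\n' > "$tmp/shared/lib.txt"
ln -s ../shared "$tmp/tree/link"

# -r skips the symlinked directory met during traversal (GNU grep behaviour).
lower=$(cd "$tmp" && grep -rl 'needle' tree || true)

# -R follows it and finds the file through the link.
upper=$(cd "$tmp" && grep -Rl 'needle' tree)

printf 'with -r: [%s]\nwith -R: [%s]\n' "$lower" "$upper"
rm -rf "$tmp"
```

With GNU grep the -r search finds nothing and the -R search reports tree/link/lib.txt. Other implementations may treat -r differently, so check the local man page before depending on this.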

Null-data and binary-safe pipelines

grep --null (alias -Z) prints filenames terminated by a NUL byte instead of a newline, which lets file paths containing spaces or newlines be passed safely to xargs -0 for further processing. -z (lowercase) changes the input record separator from newline to NUL — useful for matching multi-line patterns by treating the whole input as a single record.

bash
# Safely pass matched files into another command
grep -rlZ 'TODO' src/ | xargs -0 wc -l

# Open every matching file in your editor (-o reattaches the terminal; GNU findutils 4.7+ and BSD xargs)
grep -rlZ 'TODO' src/ | xargs -0 -o $EDITOR

# Treat the whole file as one record (multi-line regex)
grep -Pzo '(?s)BEGIN.*?END' transcript.txt

# Combine: NUL-delimited input AND NUL-delimited output
find . -name '*.log' -print0 | xargs -0 grep -lZ 'panic' | xargs -0 ls -lh

Output (grep -Pzo '(?s)BEGIN.*?END' transcript.txt):

text
BEGIN
Alice: hello
Bob: hi back
END
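As a quick illustration of why the NUL-terminated form matters, the sketch below uses a hypothetical filename containing a space and pipes the match list through xargs -0:

```shell
# Hypothetical files; one name contains a space.
tmp=$(mktemp -d)
printf 'TODO one\n'       > "$tmp/plain.txt"
printf 'TODO two\nmore\n' > "$tmp/with space.txt"

# -Z ends each filename with NUL, and xargs -0 splits only on NUL,
# so "with space.txt" arrives at cat as a single argument.
total=$(grep -rlZ 'TODO' "$tmp" | xargs -0 cat | wc -l | tr -d ' ')

printf 'lines across matching files: %s\n' "$total"
rm -rf "$tmp"
```

The total is 3, the combined line count of both files. Without -Z and -0, xargs would split the second name at the space and cat would fail to open the two halves.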

Customizing colors with GREP_COLORS

When color is enabled, for example by --color=auto (which many distros set through a shell alias), grep emits ANSI escapes controlled by the GREP_COLORS environment variable. The string is a colon-separated list of name=SGR pairs. Once exported from your shell rc file, every interactive grep picks it up.

Name   What it colors
ms     Matched text in a selected line
mc     Matched text in a context line
sl     Whole selected line
cx     Whole context line
fn     Filename prefix
ln     Line-number prefix
bn     Byte-offset prefix
se     Separator (: and --)

bash
# High-contrast green-on-black matches, dim filenames
export GREP_COLORS='ms=01;32:fn=02;37:ln=02;36:se=00;30'

# Force colour on through a pager
grep --color=always 'error' app.log | less -R

# Disable colour entirely
grep --color=never 'pattern' file

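The effect of the three --color modes on a pipe can be measured by counting bytes. A minimal sketch on a made-up one-line log:

```shell
# Hypothetical one-line log: 17 bytes including the newline.
tmp=$(mktemp -d)
printf 'error: disk full\n' > "$tmp/app.log"

# never and auto both emit the bare line when stdout is a pipe.
plain=$(grep --color=never 'error' "$tmp/app.log" | wc -c | tr -d ' ')
auto=$(grep --color=auto 'error' "$tmp/app.log" | wc -c | tr -d ' ')

# always wraps the match in escape sequences, so the byte count grows.
colored=$(GREP_COLORS='ms=01;32' grep --color=always 'error' "$tmp/app.log" | wc -c | tr -d ' ')

printf 'plain=%s auto=%s colored=%s\n' "$plain" "$auto" "$colored"
rm -rf "$tmp"
```

plain and auto both report 17 bytes, while colored reports more; those extra bytes are what downstream tools such as awk or sort end up parsing.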

Performance and engine notes

For literal strings prefer grep -F: it skips regex parsing, and GNU grep uses Aho-Corasick when many fixed patterns are given. The gain is largest with long pattern lists; for a single literal, GNU grep already optimizes the regex path, so the difference may be small. For huge files, grep historically supported --mmap to map the input into memory; modern GNU grep no longer uses it because the default I/O path is already efficient. Set LC_ALL=C to disable locale-aware multibyte handling when you only need ASCII matching; in UTF-8 locales this can speed up searches severalfold, depending on the pattern and grep version.

bash
# Force ASCII byte matching for speed
LC_ALL=C grep -F 'literal-string' huge.log

# Check the PCRE2 library behind -P (recent GNU grep versions print it in --version)
grep -P --version 2>&1 | grep -i 'pcre'

# Compare engines on a 1 GB log
time LC_ALL=C grep -F 'panic'    huge.log
time LC_ALL=C grep -E 'panic|fail' huge.log
time LC_ALL=C grep -P '\bpanic\b' huge.log


For codebase-wide searches (many small files), ripgrep is typically 5–20× faster because of parallel directory traversal and a SIMD-accelerated regex engine; see the ripgrep page in this section. For richer features (Boolean AND/OR/NOT queries, fuzzy matching, hexdumps, and searching inside archives and PDFs) without giving up grep-compatible flags, ugrep is the other modern drop-in replacement worth knowing; see https://github.com/Genivia/ugrep. For structural code search that understands ASTs rather than text, ast-grep complements (rather than replaces) the line-oriented tools.

Workflow recipes

bash
# 1. Triage today's errors in a busy log
grep -E "$(date +%F).*ERROR" -A 3 app.log | less -R

# 2. Build a TODO census across a repo
grep -rnE --exclude-dir={.git,node_modules,dist} \
  '(TODO|FIXME|HACK|XXX)' . | sort -t: -k1,1 -k2,2n

Output (grep -rnE ... TODO census):

text
./src/api/routes.py:118:    # FIXME: validate input schema
./src/auth/login.py:42:    # TODO: add rate limiting
./src/utils/cache.py:7:    # TODO: implement expiry
./tests/test_user.py:201:  # HACK: workaround for upstream bug #4421

bash
# 3. Find-then-replace pipeline (preview first, then act)
grep -rn --include='*.py' 'old_api_url' .   # preview the hits
grep -rlZ --include='*.py' 'old_api_url' . | xargs -0 sed -i 's|old_api_url|new_api_url|g'   # GNU sed -i syntax

# 4. Cross-reference: filenames containing BOTH patterns
comm -12 <(grep -rl 'use_async' src/ | sort) \
         <(grep -rl 'await ' src/ | sort)

# 5. Extract href values (\K discards the prefix matched so far)
grep -oP 'href="\K[^"]+' page.html

# 6. Approximate "lines around a marker, including the marker" via context
grep -nC 5 '^## Section 3$' README.md

Output (grep -oP 'href="\K[^"]+' page.html):

text
/about
/contact
https://docs.example.com
/assets/style.css

Common pitfalls

  • Unquoted * is expanded by the shell before grep runs: in grep foo* file, the shell replaces foo* with any matching filenames and passes it through literally only when nothing matches. Quote every pattern, and pass a directory with -r instead of a bare * file list.
  • Patterns never span newlines: grep is line-oriented, so neither alternation nor .* matches across a line break. Use -z (NUL input separator) to treat a text file as a single record.
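  • Backslash in single vs double quotes: an unquoted \b reaches grep as plain b, and inside double quotes the shell still expands $var and rewrites \\, \$, and \`. Prefer single quotes for any pattern containing backslashes or $.
  • grep -v "x" | grep -v "y": drops every line that contains x or y (either one), the same result as the single process grep -Ev 'x|y'. To drop only lines that contain both, use grep -Ev 'x.*y|y.*x'.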
  • Case-insensitive with non-ASCII: grep -i honors the current locale. LC_ALL=C grep -i 'café' will not match CAFÉ.
  • Color codes leak into pipes: piping grep --color=auto is fine (it disables color when stdout isn't a TTY), but --color=always injects escapes that confuse awk, sort, wc, etc.
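The behaviour of chained -v filters can be confirmed on a four-line sample. The words foo and bar and the file contents below are invented for illustration:

```shell
# Hypothetical input covering all four combinations of two words.
tmp=$(mktemp -d)
printf 'has foo\nhas bar\nfoo and bar\nneither\n' > "$tmp/file"

# Two chained -v filters drop a line if it contains either word.
chained=$(grep -v 'foo' "$tmp/file" | grep -v 'bar')

# A single ERE alternation gives the same result in one process.
single=$(grep -Ev 'foo|bar' "$tmp/file")

# To drop only lines containing both words, match them in either order.
both=$(grep -Ev 'foo.*bar|bar.*foo' "$tmp/file")

printf 'chained=%s single=%s\nboth-only filter keeps:\n%s\n' "$chained" "$single" "$both"
rm -rf "$tmp"
```

The chained and single forms both keep only the line "neither", while the last filter removes just "foo and bar".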

Tips

Use grep -F for literal string searches (no regex overhead) — significantly faster on large files when you don't need pattern matching.

grep -P (PCRE) is not available in the BSD grep shipped with macOS. Use ggrep -P (GNU grep installed via Homebrew) or switch to ripgrep, which supports PCRE2 through its own -P flag when built with that feature.

Set LC_ALL=C in your shell for pure-ASCII pipelines — it disables Unicode locale handling and dramatically speeds up grep, sort, tr, and awk on large files.

[!WARN] --color=always writes ANSI escape codes into the stream — never combine it with tools that parse the output. Use --color=auto (the default) or --color=never for pipelines.
