cheat sheet
grep
Search files and streams using fixed strings, extended regex, or PCRE. Covers all major flags, context, recursive search, and pipeline patterns.
grep — Pattern Search
What it is
grep is a POSIX-standard command-line utility that searches plain-text input for lines matching a regular expression or fixed string. Written by Ken Thompson at Bell Labs, it first shipped with Version 4 Unix in 1973 and is present on every Unix and Linux system. POSIX specifies basic regex (BRE, the default), extended regex (grep -E) and fixed strings (grep -F); GNU grep adds Perl-compatible regex (grep -P), and GNU and most other implementations can search recursively through directory trees. Reach for grep for quick pattern matching in files or pipelines; for large codebases where speed matters, ripgrep is usually faster and respects .gitignore by default.
Syntax
The pattern is a regular expression by default (BRE); use -E for extended regex or -F for a plain literal string. When no file is given, grep reads from stdin, making it natural at the end of a pipeline.
grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] -e PATTERN -e PATTERN [FILE...]
grep [OPTIONS] -f PATTERNFILE [FILE...]
Essential flags
| Flag | Meaning |
|---|---|
| -i | Case-insensitive match |
| -v | Invert match (non-matching lines) |
| -n | Show line numbers |
| -c | Print the count of matching lines per file |
| -l | Print only filenames with matches |
| -L | Print only filenames with no match |
| -r / -R | Recursive (follow symlinks with -R) |
| -w | Match whole word only |
| -x | Match whole line only |
| -o | Print only the matching part of the line |
| -q | Quiet: exit 0 if match found, no output |
| -s | Suppress error messages about missing or unreadable files |
| -m N | Stop after N matching lines |
| -h | Suppress filename prefix (multi-file mode) |
| -H | Always print filename prefix |
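As a quick illustration of how a few of these flags change the result, the sketch below runs them against a small throwaway file. The file name and its contents are invented for the example.

```shell
# Hypothetical sample data in a temporary directory
tmp=$(mktemp -d)
printf '%s\n' 'Error: disk full' 'error' 'no errors here' 'all good' > "$tmp/sample.log"

count_ci=$(grep -ci 'error' "$tmp/sample.log")    # matching lines, any case
whole_word=$(grep -wc 'error' "$tmp/sample.log")  # "errors" is not a whole-word match
whole_line=$(grep -xc 'error' "$tmp/sample.log")  # the line must be exactly "error"
only_match=$(grep -oi 'error' "$tmp/sample.log")  # just the matched text, one per line

echo "case-insensitive lines: $count_ci"
echo "whole-word lines:       $whole_word"
echo "whole-line lines:       $whole_line"
printf '%s\n' "$only_match"
rm -rf "$tmp"
```

Note that -c, -w and -x all work on whole lines, while -o is the only flag here that reports individual matches.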
Regex flavours
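| Flag | Engine | Notes |
|---|---|---|
| grep | BRE (basic) | \+, \( \) and the alternation bar need a backslash to act as operators |
| grep -E / egrep | ERE (extended) | +, ( ) and the alternation bar work unescaped; the egrep command is obsolescent |
| grep -P | PCRE | \d \w (?=...) lookaheads etc. |
| grep -F / fgrep | Fixed string | No regex, fastest for literals; the fgrep command is obsolescent |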
Context lines
-A, -B, and -C print additional lines surrounding each match, which is essential for reading log files where the error message alone lacks context. Groups of context lines are separated by -- when multiple matches appear.
grep -A 3 "ERROR" app.log # 3 lines After match
grep -B 2 "ERROR" app.log # 2 lines Before match
grep -C 5 "ERROR" app.log # 5 lines before and after (Context)
Output (grep -C 5 "ERROR" app.log, for a log that contains only these five lines):
2026-04-24 14:03:11 INFO Request received
2026-04-24 14:03:11 WARN Retry attempt 3
2026-04-24 14:03:12 ERROR Connection timed out
2026-04-24 14:03:12 INFO Closing socket
2026-04-24 14:03:12 INFO Cleanup complete
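To see the -- group separator mentioned above, here is a minimal sketch that builds a ten-line log with two widely spaced ERROR lines. The log contents are made up for the example.

```shell
tmp=$(mktemp -d)
# Hypothetical ten-line log: lines 2 and 9 are errors, the rest are INFO
for i in 1 2 3 4 5 6 7 8 9 10; do
  case $i in
    2|9) echo "line $i ERROR something broke" ;;
    *)   echo "line $i INFO ok" ;;
  esac
done > "$tmp/app.log"

# One line of context on each side of every match
context=$(grep -C 1 'ERROR' "$tmp/app.log")
printf '%s\n' "$context"
rm -rf "$tmp"
```

The two matches are too far apart for their context to touch, so grep prints lines 1 to 3, a -- line, then lines 8 to 10. When context windows overlap, they are merged and no separator appears.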
Character classes & anchors
^ and $ anchor a pattern to the start or end of a line; [...] matches any character in the set; [^...] negates it. POSIX classes such as [[:digit:]] and [[:alpha:]] (always written inside a bracket expression) are portable alternatives to \d and \w: \d requires -P (PCRE), and \w outside -P is a GNU extension.
grep '^root' /etc/passwd # lines starting with "root"
grep 'bash$' /etc/passwd # lines ending with "bash"
grep '^$' file.txt # empty lines
grep -E '[0-9]{4}' data.txt # lines containing 4 digits in a row (longer runs match too)
grep -P '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b' ips.txt # IPv4 addresses
Output (grep '^root' /etc/passwd):
root:x:0:0:root:/root:/bin/bash
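A repetition count such as {4} only says that four digits appear somewhere on the line. The sketch below, using invented sample data, contrasts that with anchoring (or -w) when exactly four digits are required.

```shell
tmp=$(mktemp -d)
printf '%s\n' '2024' '12345' 'id 2024 ok' 'abc' > "$tmp/data.txt"

loose=$(grep -cE '[0-9]{4}' "$tmp/data.txt")         # any line with 4 digits in a row
whole_line=$(grep -cE '^[0-9]{4}$' "$tmp/data.txt")  # the line is exactly 4 digits
whole_word=$(grep -cwE '[0-9]{4}' "$tmp/data.txt")   # a standalone 4-digit word

echo "loose=$loose whole_line=$whole_line whole_word=$whole_word"
rm -rf "$tmp"
```

The line 12345 counts as a loose match because it contains 1234, but it fails both the anchored and the whole-word variants.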
Multiple patterns
-e adds an additional pattern to match (logical OR); -f loads patterns from a file, one per line. These are more readable than embedding alternation (error|warn) inside the pattern string, especially when the list is long or dynamically generated.
grep -e "error" -e "warn" -e "crit" /var/log/syslog
# Or from a file (one pattern per line)
grep -f patterns.txt logfile
Output:
Apr 24 09:01:33 host kernel: [12345.678] warn: disk space low
Apr 24 09:02:11 host app[4321]: error: failed to open config
Apr 24 09:05:44 host app[4321]: crit: service unresponsive
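When the pattern list is generated by a script, writing it to a file and passing -f keeps the command line short. A minimal sketch, with hypothetical file names and log lines:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'kernel: warn: disk space low' \
              'app: info: started' \
              'app: error: failed to open config' > "$tmp/syslog.sample"

# Build the pattern list programmatically, one pattern per line
printf '%s\n' error warn crit > "$tmp/patterns.txt"

from_file=$(grep -f "$tmp/patterns.txt" "$tmp/syslog.sample")
from_flags=$(grep -e error -e warn -e crit "$tmp/syslog.sample")

printf '%s\n' "$from_file"
rm -rf "$tmp"
```

Both forms select the same lines. One thing to watch for: an empty line in the pattern file is an empty pattern, which matches every input line.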
Recursive search
-r descends into subdirectories; -R is the same but follows symbolic links. Use --include and --exclude to filter by filename glob, and --exclude-dir to skip directories like .git or node_modules. For large codebases, ripgrep is faster and skips ignored files by default.
grep -r "TODO" ./src/
grep -rl "console.log" ./src/ # filenames only
grep -rn --include="*.py" "import" . # only .py files
grep -r --exclude="*.min.js" "fetch" . # skip minified files
grep -r --exclude-dir={.git,node_modules,dist} "pattern" .
Output (grep -r "TODO" ./src/):
./src/auth/login.py: # TODO: add rate limiting
./src/api/routes.py: # TODO: validate input schema
./src/utils/cache.py: # TODO: implement expiry
Output (grep -rl "console.log" ./src/):
./src/app.js
./src/utils/logger.js
./src/components/Debug.jsx
Output (grep -rn --include="*.py" "import" .):
./src/auth/login.py:1:import os
./src/auth/login.py:2:import hashlib
./src/api/routes.py:1:import flask
./src/api/routes.py:2:import json
Colour and output control
--color=always forces ANSI color codes even when stdout is piped — necessary when you want color preserved through less -R. -o prints only the matched portion of each line rather than the whole line, which is useful for extracting values from structured text.
grep --color=always "error" log | less -R # keep colour through pipe
grep -o '"[^"]*"' file.json # extract all quoted strings
grep -oP '(?<=href=")[^"]*' page.html # PCRE lookbehind — all href values
grep -n "" file.txt # number every line (cat -n alternative)
Output (grep -o '"[^"]*"' file.json):
"name"
"Alice"
"role"
"admin"
"active"
Output (grep -oP '(?<=href=")[^"]*' page.html):
/about
/contact
https://docs.example.com
/assets/style.css
Count and statistics
-c prints the number of matching lines per file rather than the lines themselves — faster than piping to wc -l and gives per-file counts when multiple files are searched. -m N stops after N matches, which can cut processing time on large files when you only need to know whether a pattern exists.
grep -c "ERROR" app.log # match count in one file
grep -rc "TODO" src/ | sort -t: -k2 -rn # per-file TODO counts, sorted
# Number of unique matching lines
grep "pattern" file | sort -u | wc -l
Output (grep -c "ERROR" app.log):
47
Output (grep -rc "TODO" src/ | sort -t: -k2 -rn):
src/api/routes.py:12
src/auth/login.py:7
src/utils/cache.py:4
src/models/user.py:2
src/main.py:1
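Because -c counts matching lines, a line with two hits still counts once. If you need the number of individual occurrences, one common approach is to combine -o with wc -l, as in this sketch (sample data invented for the example):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'ERROR then ERROR again' 'all fine' 'one ERROR' > "$tmp/app.log"

matching_lines=$(grep -c 'ERROR' "$tmp/app.log")                    # lines containing a match
occurrences=$(grep -o 'ERROR' "$tmp/app.log" | wc -l | tr -d ' ')   # every individual match

echo "lines=$matching_lines occurrences=$occurrences"
rm -rf "$tmp"
```

The tr -d ' ' step only strips the padding that some wc implementations add.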
Exit codes
| Code | Meaning |
|---|---|
| 0 | At least one match found |
| 1 | No match found |
| 2 | Error (bad option, missing file) |
# Use in conditionals
if grep -q "FAILED" build.log; then
echo "Build had failures"
fi
# Check without output
grep -qs "pattern" file && echo "found"
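A plain && or if test treats "no match" and "error" alike. When a script needs to tell them apart, it can branch on the exit status explicitly. The helper name check_log and the file names below are hypothetical.

```shell
# Hypothetical helper: map grep's three exit codes to a short message
check_log() {
  local status=0
  grep -q -- "$1" "$2" 2>/dev/null || status=$?
  case $status in
    0) echo "match" ;;
    1) echo "no match" ;;
    *) echo "error" ;;
  esac
}

tmp=$(mktemp -d)
echo 'step 3 FAILED' > "$tmp/build.log"

r1=$(check_log 'FAILED' "$tmp/build.log")     # pattern present
r2=$(check_log 'PASSED' "$tmp/build.log")     # pattern absent
r3=$(check_log 'FAILED' "$tmp/missing.log")   # file does not exist

echo "$r1 / $r2 / $r3"
rm -rf "$tmp"
```

This matters in CI scripts, where a missing log file should usually fail the job rather than be read as "no failures found".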
Binary files
By default, when grep finds a match in a file it considers binary, it prints a one-line "binary file matches" notice instead of the matching lines. -a / --text forces grep to treat the file as text, which can produce garbled output but finds embedded strings. -I is the opposite: it treats binary files as non-matching and skips them silently, which is useful in recursive searches over mixed directories.
grep -a "pattern" binary.bin # treat binary as text
grep -I "pattern" * # skip binary files silently
grep --binary-files=text "str" img.bin
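The difference between -a and -I is easiest to see through exit statuses. This sketch assumes GNU grep's binary detection (a NUL byte in the input) and uses an invented file name.

```shell
tmp=$(mktemp -d)
# Hypothetical binary-ish file: text with an embedded NUL byte
printf 'hello\000world\n' > "$tmp/blob.bin"

# -a: treat the file as text, so the embedded string is found
grep -qa 'hello' "$tmp/blob.bin" && as_text=found || as_text=missed
# -I: binary files are assumed not to match
grep -qI 'hello' "$tmp/blob.bin" && skip_binary=found || skip_binary=missed

echo "-a: $as_text, -I: $skip_binary"
rm -rf "$tmp"
```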
Practical pipelines
# Find processes by name
ps aux | grep -v grep | grep nginx
Output:
www-data 1234 0.0 0.5 12340 5120 ? Ss 09:00 0:00 nginx: master process
www-data 1235 0.0 0.3 9876 3456 ? S 09:00 0:00 nginx: worker process
# Extract unique IPs from access log
grep -oP '\d+\.\d+\.\d+\.\d+' access.log | sort -u
Output:
10.0.0.5
10.0.1.14
192.168.1.101
203.0.113.42
# Show lines between two patterns (inclusive)
grep -A 9999 "START" file | grep -B 9999 "END"
# Find files containing all of two patterns
grep -rl "alpha" . | xargs grep -l "beta"
Output:
./config/feature-flags.yaml
./src/algorithm.py
# Lines that DON'T contain either of two words
grep -v -e "debug" -e "trace" app.log
Output:
2026-04-24 10:00:01 INFO Server started on :8080
2026-04-24 10:01:22 WARN High memory usage: 87%
2026-04-24 10:03:45 ERROR DB connection lost
# Highlight matches but show all lines
grep --color=always -E "error|$" app.log
BRE vs ERE vs PCRE in practice
grep ships with three distinct regex engines and the syntax differences trip up almost everyone. Basic Regex (BRE) is the default — most metacharacters (+, ?, |, (, ), {) must be backslash-escaped to act as metacharacters. Extended Regex (ERE, via -E) removes the escaping, which is what people typically expect from "modern" regex. Perl-Compatible Regex (PCRE, via -P) adds shorthand classes (\d, \w, \s), non-greedy quantifiers (*?, +?), and lookarounds.
# Same pattern in all three flavours: words "cat" or "dog"
grep 'cat\|dog' file # BRE: \| is alternation (a GNU extension); a bare | is literal
grep -E 'cat|dog' file # ERE — | is metacharacter
grep -P 'cat|dog' file # PCRE — same as ERE here
# Quantifiers in BRE require escaping
grep 'colou\?r' file # BRE — \?
grep -E 'colou?r' file # ERE
grep -P 'colou?r' file # PCRE
# Repetition counts
grep '[0-9]\{4\}' file # BRE — \{ \}
grep -E '[0-9]{4}' file # ERE
grep -P '\d{4}' file # PCRE shorthand
# Non-greedy is PCRE-only
grep -P '<.*?>' file # match shortest <...> on each line
# Lookarounds are PCRE-only
grep -P '(?<=Bearer\s)\S+' headers.txt # token after "Bearer "
grep -P '\bTODO\b(?!:)' src/*.py # TODO not followed by colon
Output (grep -P '(?<=Bearer\s)\S+' headers.txt):
eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJhbGljZSJ9.abc123
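The escaping rules are easiest to remember by watching one character flip meaning. The following sketch feeds three invented lines through the same pattern in BRE and ERE:

```shell
sample=$(printf '%s\n' 'a+b' 'aab' 'ab')

bre_plain=$(printf '%s\n' "$sample" | grep 'a+b')        # BRE: + is a literal plus sign
ere_plain=$(printf '%s\n' "$sample" | grep -E 'a+b')     # ERE: + means "one or more a"
ere_escaped=$(printf '%s\n' "$sample" | grep -E 'a\+b')  # ERE: \+ is a literal plus sign

echo "BRE 'a+b'  -> $bre_plain"
echo "ERE 'a+b'  -> $(echo $ere_plain)"
echo "ERE 'a\\+b' -> $ere_escaped"
```

In BRE the unescaped pattern finds only the literal a+b; in ERE it finds aab and ab instead, and the backslash swaps the meaning back.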
Word and line boundaries
Word boundaries match the empty position between a word character ([A-Za-z0-9_]) and a non-word character, which is how you scope a search to a whole word without false positives like cat inside concatenate. GNU grep supports \b in BRE/ERE plus \< (start-of-word) and \> (end-of-word) as a finer-grained pair. PCRE adds \A (start of input) and \Z (end of input) for multi-line buffers, though grep operates line-by-line by default so they behave like ^ and $.
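Since 2023 the Unicode behaviour of the -P shorthand classes \w, \d, \s and the \b boundary has changed several times across GNU grep 3.9, 3.10 and 3.11, and it also depends on the PCRE2 version grep was built against. In current releases \d is meant to match ASCII digits only; for \w, \s and \b, test against your own grep rather than assuming either ASCII-only or Unicode semantics. To request Unicode properties explicitly, put (*UCP) at the start of the pattern, e.g. grep -P '(*UCP)\w+' file. POSIX classes like [[:alpha:]] remain locale-aware and are the portable alternative.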
grep -w 'cat' file # shorthand for \bcat\b
grep '\bcat\b' file # word boundaries
grep '\<cat\>' file # GNU-only start/end-of-word
grep -E '\b(error|fail|abort)\b' app.log # whole-word alternation
grep -rP '\bTODO\b' src/ # PCRE word boundary (-r is needed to search a directory)
Output (grep -w 'cat' file):
The cat sat on the mat
A black cat crossed the road
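A small sketch of what counts as a word boundary, using invented sample lines. Note that the underscore is a word character, so cat_food does not contain the whole word cat.

```shell
words=$(printf '%s\n' 'The cat sat' 'concatenate strings' 'cat_food' 'bobcat')

plain=$(printf '%s\n' "$words" | grep -c 'cat')    # substring match: every line
whole=$(printf '%s\n' "$words" | grep -cw 'cat')   # standalone word only

echo "substring matches: $plain, whole-word matches: $whole"
```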
POSIX classes vs PCRE classes
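POSIX character classes like [[:digit:]], [[:alpha:]], [[:space:]], and [[:alnum:]] are portable across BRE and ERE and respect the current locale (so [[:alpha:]] matches accented letters in a UTF-8 locale such as en_US.UTF-8, but not under LC_ALL=C). The backslash shorthands are shorter but not POSIX: \d works only with -P, while GNU grep also accepts \w and \s in BRE and ERE. How the shorthands treat non-ASCII text under -P depends on the grep and PCRE2 versions, so prefer the POSIX classes when that matters.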
| POSIX | PCRE | Matches |
|---|---|---|
| [[:digit:]] | \d | 0–9 |
| [[:alpha:]] | (none) | letters (locale-aware) |
| [[:alnum:]] | \w minus _ | letters + digits |
| [[:space:]] | \s | whitespace |
| [[:upper:]] | (none) | uppercase letters |
| [[:lower:]] | (none) | lowercase letters |
| [[:xdigit:]] | (none) | hex digits |
| [[:punct:]] | (none) | punctuation |
| [[:cntrl:]] | (none) | control chars |
| [[:print:]] | (none) | printable chars |
| [^[:digit:]] | \D | negated digit |
grep -E '^[[:alpha:]]+$' words.txt # only letter lines
grep -E '[[:xdigit:]]{6,8}' colors.css # hex colors
grep -E '[^[:print:]]' file # find unprintable chars
grep -P '\w+@\w+\.\w+' contacts.txt # rough email match
Output (grep -E '[[:xdigit:]]{6,8}' colors.css):
--primary: #8a5cff;
--bg: #0a0a0f;
--accent: #ff5cb3;
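POSIX classes combine naturally with -o and -x. The sketch below, on invented input, pulls six-digit hex colours out of CSS-like lines and counts lines made up only of letters.

```shell
css=$(printf '%s\n' '--primary: #8a5cff;' '--gap: 12px;' '--bg: #0a0a0f;')

# -o prints only the matched colour; the leading # keeps plain numbers out
hex_values=$(printf '%s\n' "$css" | grep -oE '#[[:xdigit:]]{6}')

# -x requires the whole line to match, so "beta2" is rejected
letters_only=$(printf '%s\n' 'alpha' 'beta2' 'gamma' | grep -cxE '[[:alpha:]]+')

printf '%s\n' "$hex_values"
echo "letter-only lines: $letters_only"
```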
Include, exclude, and recursive defaults
--include and --exclude accept shell globs and apply to the basename of each candidate file. --exclude-dir skips entire directories without descending into them — much faster than relying on --exclude for the contents. Globs may be repeated; multiple --include patterns are OR-ed together.
# Only search source files in a polyglot repo
grep -r --include='*.py' --include='*.ts' --include='*.go' 'TODO' .
# Skip generated, vendored, and VCS directories
grep -r --exclude-dir={.git,node_modules,dist,build,vendor,.venv,__pycache__} \
'pattern' .
# Common per-language combos
grep -r --include='*.'{js,ts,jsx,tsx} 'console\.' src/ # braces stay outside the quotes so the shell expands them
grep -r --include='Dockerfile*' --include='*.dockerfile' 'FROM' .
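To check how the filters interact before running them on a real repository, you can try them on a throwaway tree. The directory layout below is hypothetical, and --include / --exclude-dir are assumed to be available (they are in GNU grep).

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/node_modules/lib"
echo '# TODO: add rate limiting' > "$tmp/src/login.py"
echo '// TODO: tidy up'          > "$tmp/src/app.js"
echo '// TODO: vendored code'    > "$tmp/node_modules/lib/index.js"

all=$(cd "$tmp" && grep -rl 'TODO' . | sort)                                   # no filtering
py_only=$(cd "$tmp" && grep -rl --include='*.py' 'TODO' .)                     # basename glob
no_vendor=$(cd "$tmp" && grep -rl --exclude-dir=node_modules 'TODO' . | sort)  # prune a directory

printf 'all:\n%s\n' "$all"
printf 'py only:\n%s\n' "$py_only"
printf 'without node_modules:\n%s\n' "$no_vendor"
rm -rf "$tmp"
```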
-r vs -R and symbolic links
Both -r and -R recurse into directories. With GNU grep, -r (lowercase) follows a symbolic link only when it is named on the command line and skips the links it meets while descending, which makes it the safer default. -R (uppercase) dereferences every symlink, which is what you want when your source tree intentionally links into shared modules, but a risk when a stray symlink points to /proc or your home directory.
grep -r 'pattern' . # skip symlinked dirs (safe default)
grep -R 'pattern' . # follow symlinks (may loop or scan too much)
grep -rL 'pattern' . # files with NO match (uppercase L)
GNU grep detects symlink cycles and refuses to loop forever, but BSD grep older than 2017 did not. Prefer -r unless you have a specific reason to traverse links.
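The difference is easy to reproduce in a scratch directory. This sketch assumes GNU grep semantics (other implementations may treat -r differently) and uses made-up directory names.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/project/real" "$tmp/shared"
echo 'needle' > "$tmp/project/real/a.txt"
echo 'needle' > "$tmp/shared/b.txt"
ln -s ../shared "$tmp/project/link"   # symlinked directory inside the tree

lower=$(cd "$tmp" && grep -rl 'needle' project | sort)   # GNU grep: the link is skipped
upper=$(cd "$tmp" && grep -Rl 'needle' project | sort)   # the link is followed

printf -- '-r:\n%s\n' "$lower"
printf -- '-R:\n%s\n' "$upper"
rm -rf "$tmp"
```

With -R the file under shared/ is reported through its link path (project/link/b.txt), which is also how duplicate hits creep in when a link points back into the searched tree.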
Null-data and binary-safe pipelines
grep --null (alias -Z) prints filenames terminated by a NUL byte instead of a newline, which lets file paths containing spaces or newlines be passed safely to xargs -0 for further processing. -z (lowercase) changes the input record separator from newline to NUL — useful for matching multi-line patterns by treating the whole input as a single record.
# Safely pass matched files into another command
grep -rlZ 'TODO' src/ | xargs -0 wc -l
# Open every matching file in your editor (-o reconnects the editor to the terminal; BSD xargs or a recent GNU findutils)
grep -rlZ 'TODO' src/ | xargs -0 -o "$EDITOR"
# Treat the whole file as one record (multi-line regex)
grep -Pzo '(?s)BEGIN.*?END' transcript.txt
# Combine: NUL-delimited input AND NUL-delimited output
find . -name '*.log' -print0 | xargs -0 grep -lZ 'panic' | xargs -0 ls -lh
Output (grep -Pzo '(?s)BEGIN.*?END' transcript.txt):
BEGIN
Alice: hello
Bob: hi back
END
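A file name containing a space is enough to break a newline- or whitespace-delimited pipeline. The sketch below, with invented file names, shows the NUL-delimited variant handling it.

```shell
tmp=$(mktemp -d)
printf 'TODO: one\nTODO: two\n' > "$tmp/my notes.txt"   # file name contains a space
printf 'nothing here\n'         > "$tmp/plain.txt"

# -Z ends each file name with NUL; xargs -0 splits on NUL only,
# so "my notes.txt" reaches cat as a single argument
todo_lines=$(grep -rlZ 'TODO' "$tmp" | xargs -0 cat | wc -l | tr -d ' ')

echo "lines in matching files: $todo_lines"
rm -rf "$tmp"
```

Without -Z and -0, xargs would split the path at the space and cat would be handed two non-existent file names.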
Customizing colors with GREP_COLORS
When --color=auto (the default on most distros) is active, grep emits ANSI escapes controlled by the GREP_COLORS environment variable. The string is a colon-separated list of name=SGR pairs. Once set in your shell rc, every interactive grep picks it up.
| Name | What it colors |
|---|---|
| ms | Matched text in a selected line |
| mc | Matched text in a context line |
| sl | Whole selected line |
| cx | Whole context line |
| fn | Filename prefix |
| ln | Line-number prefix |
| bn | Byte-offset prefix |
| se | Separator (: and --) |
# Bold green matches, dim filenames and line numbers, black separators (pick another se value on a dark terminal)
export GREP_COLORS='ms=01;32:fn=02;37:ln=02;36:se=00;30'
# Force colour on through a pager
grep --color=always 'error' app.log | less -R
# Disable colour entirely
grep --color=never 'pattern' file
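You can confirm what each --color mode emits by capturing the output and looking for the escape character. A minimal sketch; the variable names are only illustrative.

```shell
esc=$(printf '\033')   # the ESC byte that starts every ANSI sequence

# Inside $(...) stdout is a pipe, not a terminal
forced=$(echo 'an error occurred' | grep --color=always 'error')
auto=$(echo 'an error occurred' | grep --color=auto 'error')

case $forced in *"$esc["*) forced_has_escapes=yes ;; *) forced_has_escapes=no ;; esac
case $auto   in *"$esc["*) auto_has_escapes=yes ;;   *) auto_has_escapes=no ;;   esac

echo "always: $forced_has_escapes, auto: $auto_has_escapes"
```

With --color=always the captured string contains escape sequences around the match; with --color=auto it is the plain input line, which is what downstream tools expect.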
Performance and engine notes
For literal strings prefer grep -F, which skips regex parsing altogether (GNU grep switches to the Aho-Corasick algorithm when many fixed patterns are given). GNU grep already optimizes simple literal patterns, so the gain is largest with long pattern lists or strings full of metacharacters. grep historically supported --mmap to map the input into memory; the option is obsolete in modern GNU grep, whose default I/O path no longer needs it. Set LC_ALL=C to disable locale-aware byte interpretation when you only need ASCII matching; in multi-byte locales this can speed things up by 2–10× in some workloads, particularly with -i or on older grep versions.
# Force ASCII byte matching for speed
LC_ALL=C grep -F 'literal-string' huge.log
# Check that this grep was built with PCRE support (newer GNU grep prints the PCRE2 version here)
grep -P --version 2>&1 | grep -i 'pcre'
# Compare engines on a 1 GB log
time LC_ALL=C grep -F 'panic' huge.log
time LC_ALL=C grep -E 'panic|fail' huge.log
time LC_ALL=C grep -P '\bpanic\b' huge.log
For codebase-wide searches (many small files), ripgrep is typically 5–20× faster because of parallel directory traversal and a SIMD-accelerated regex engine (see the ripgrep page in this section). For richer features (Boolean AND/OR/NOT queries, fuzzy matching, hexdumps, and searching inside archives and PDFs) without giving up grep-compatible flags, ugrep is a modern, largely grep-compatible alternative worth knowing; see https://github.com/Genivia/ugrep. For structural code search that understands ASTs rather than text, ast-grep complements (rather than replaces) the line-oriented tools.
Workflow recipes
# 1. Triage today's errors in a busy log
grep -E "$(date +%F).*ERROR" -A 3 app.log | less -R
# 2. Build a TODO census across a repo
grep -rnE --exclude-dir={.git,node_modules,dist} \
'(TODO|FIXME|HACK|XXX)' . | sort -t: -k1,1 -k2,2n
Output (grep -rnE ... TODO census):
./src/api/routes.py:118: # FIXME: validate input schema
./src/auth/login.py:42: # TODO: add rate limiting
./src/utils/cache.py:7: # TODO: implement expiry
./tests/test_user.py:201: # HACK: workaround for upstream bug #4421
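# 3. Find-then-replace pipeline (preview first, then act)
grep -rn --include='*.py' 'old_api_url' . # preview the lines that would change
grep -rlZ --include='*.py' 'old_api_url' . | xargs -0 sed -i 's|old_api_url|new_api_url|g' # GNU sed; BSD/macOS sed needs -i ''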
# 4. Cross-reference: filenames containing BOTH patterns
comm -12 <(grep -rl 'use_async' src/ | sort) \
<(grep -rl 'await ' src/ | sort)
# 5. Extract only the text after a prefix (\K drops everything matched so far; PCRE-only)
grep -oP 'href="\K[^"]+' page.html
# 6. Approximate "lines around a marker, including the marker" via context
grep -nC 5 '^## Section 3$' README.md
Output (grep -oP 'href="\K[^"]+' page.html):
/about
/contact
https://docs.example.com
/assets/style.css
Common pitfalls
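- The shell expands * before grep runs: in grep regex *, the shell replaces * with the file names in the current directory; if none match, bash passes the literal * along and grep reports a missing file. An unquoted * inside the pattern is expanded the same way, so quote patterns and pass a directory with -r instead of relying on a glob.
- Alternation across newlines: grep is line-oriented; a pattern never spans newlines without -z. Use -z (NUL input separator) or pre-process with tr '\n' '\0'.
- Backslash in single vs double quotes: inside double quotes the shell still rewrites \\, \$, \" and \` and expands $variables and backticks, so the pattern grep receives may differ from what you typed (\b happens to survive, \\. does not). Prefer single quotes for any pattern containing backslashes or $.
- grep -v "x" | grep -v "y": removes lines that contain either x or y, the same result as grep -Ev 'x|y' in a single process. To remove only lines containing both, use grep -Ev 'x.*y|y.*x'.
- Case-insensitive with non-ASCII: grep -i honors the current locale. LC_ALL=C grep -i 'café' will not match CAFÉ.
- Color codes leak into pipes: piping grep --color=auto is fine (it disables color when stdout isn't a TTY), but --color=always injects escapes that confuse awk, sort, wc, etc.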
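How chained -v filters behave is easy to verify on a handful of lines. The sample input below is invented for the check.

```shell
lines=$(printf '%s\n' 'has x' 'has y' 'has x and y' 'neither')

# Two filters in a row: a line is dropped if it contains x OR y
chained=$(printf '%s\n' "$lines" | grep -v 'x' | grep -v 'y')
# One process, same result
single=$(printf '%s\n' "$lines" | grep -Ev 'x|y')
# Drop only lines that contain BOTH x and y
both=$(printf '%s\n' "$lines" | grep -Ev 'x.*y|y.*x')

echo "chained: $chained"
echo "single:  $single"
echo "both:    $(echo $both)"
```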
Tips
Use grep -F for literal string searches (no regex overhead); it is significantly faster on large files when you don't need pattern matching.
grep -P (PCRE) is not available in the BSD grep that ships with macOS. Use ggrep -P (installed by the Homebrew grep formula) or switch to ripgrep, which offers PCRE2 through its own -P flag when built with that feature.
Set LC_ALL=C in your shell for pure-ASCII pipelines; it disables Unicode locale handling and dramatically speeds up grep, sort, tr, and awk on large files.
[!WARN]
--color=always writes ANSI escape codes into the stream; never combine it with tools that parse the output. Use --color=auto (the default) or --color=never for pipelines.