cheat sheet
Regular Expressions
JavaScript has built-in RegExp support (ES2018+ with named groups, lookbehind, dotAll). Covers literal syntax, flags, character classes, methods, named captures, and common patterns.
What it is
JavaScript has built-in Regular Expression support based on a PCRE-like syntax. ES2018 added named capture groups, lookbehind assertions, and the s (dotAll) flag. ES2022 added the d (indices) flag. ES2024 added the v (unicodeSets) flag. RegExp literals are compiled at parse time; new RegExp() is evaluated at runtime (useful for dynamic patterns).
Literal syntax vs RegExp constructor
// Literal — compiled at parse time; use for static patterns
const re = /hello/i;
// Constructor — evaluated at runtime; use for dynamic patterns
const word = "hello";
const re2 = new RegExp(word, "i");
// Escape special characters when building from user input
function escapeRegExp(str) {
return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const userInput = "file.txt";
const safe = new RegExp(escapeRegExp(userInput), "g");
Flags
| Flag | Name | Effect |
|---|---|---|
| g | global | Find all matches (not just first); advances lastIndex |
| i | ignoreCase | Case-insensitive matching |
| m | multiline | ^ and $ match start/end of each line, not the whole string |
| s | dotAll | . matches newline characters too (ES2018) |
| u | unicode | Full Unicode mode; enables \u{…} escapes and \p{…} properties |
| v | unicodeSets | Superset of u; enables set operations [A--B], [A&&B] (ES2024) |
| d | hasIndices | Adds .indices array to match results with start/end positions (ES2022) |
| y | sticky | Match only at lastIndex position; does not advance past non-matches |
const str = "Hello\nWorld";
/^world/i.test(str); // false — ^ matches start of string only
/^world/im.test(str); // true — m flag makes ^ match start of line
/hello.world/is.test(str); // true: s lets . match \n (i is still needed because the case differs)
Character classes and syntax
Character classes define sets of characters that a position may match. Square brackets [...] match any one character in the set; shorthand escapes like \d, \w, and \s expand to common sets; anchors and quantifiers control position and repetition.
// Character classes
/[aeiou]/ // any vowel
/[^aeiou]/ // any non-vowel
/[a-z]/ // a through z
/[a-zA-Z0-9]/ // alphanumeric
// Shorthand classes
/\d/ // digit: [0-9]
/\D/ // non-digit
/\w/ // word char: [a-zA-Z0-9_]
/\W/ // non-word char
/\s/ // whitespace (space, tab, newline, etc.)
/\S/ // non-whitespace
/./ // any char except newline (unless s flag)
// Anchors
/^start/ // start of string (or line with m flag)
/end$/ // end of string (or line with m flag)
/\bword\b/ // word boundary
/\Bword\B/ // non-word boundary
// Quantifiers
/a*/ // 0 or more
/a+/ // 1 or more
/a?/ // 0 or 1
/a{3}/ // exactly 3
/a{2,5}/ // 2 to 5
/a{2,}/ // 2 or more
// Quantifier greediness
/a+/ // greedy: matches as many as possible
/a+?/ // lazy: matches as few as possible
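To make the greedy/lazy distinction concrete, here is a minimal sketch that runs both forms against the same input. The sample string is made up for illustration.

```javascript
// Sample input (hypothetical) with two quoted sections
const sample = 'say "one" and "two"';

// Greedy: .+ grabs as much as it can, then backs off to the LAST quote
const greedy = sample.match(/".+"/)[0];

// Lazy: .+? grows one character at a time and stops at the FIRST closing quote
const lazy = sample.match(/".+?"/)[0];

console.log(greedy); // "one" and "two"
console.log(lazy);   // "one"
```

A negated class such as /"[^"]+"/ gives the same result as the lazy form here and usually backtracks less.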
Groups
Parentheses group sub-expressions and, by default, capture the matched text as a numbered group. Use (?:...) to group without capturing (cheaper, no slot allocated); use (?<name>...) for named captures accessible via match.groups.
// Capturing group — captured in match result
/(foo)(bar)/.exec("foobar");
// ['foobar', 'foo', 'bar', index: 0, ...]
// Non-capturing group — groups without capture
/(?:foo)(bar)/.exec("foobar");
// ['foobar', 'bar', index: 0, ...]
// Named capturing group (ES2018)
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/.exec("2026-04-26");
// match.groups = { year: '2026', month: '04', day: '26' }
// Backreferences to captured group
/(['"]).*?\1/.test('"quoted"'); // true — \1 refers to group 1
/(?<q>['"]).*?\k<q>/.test('"quoted"'); // named backreference
Lookahead and lookbehind
Zero-width assertions match a position based on what precedes or follows it, without consuming characters. Lookaheads ((?=...) / (?!...)) have been part of the language since ES3; lookbehinds ((?<=...) / (?<!...)) require ES2018 or later.
// Positive lookahead — match X only if followed by Y
/\d+(?= dollars)/.exec("100 dollars"); // ['100']
// Negative lookahead — match X only if NOT followed by Y
/\d+(?! dollars)/.exec("100 euros"); // ['100']
// Positive lookbehind (ES2018) — match X only if preceded by Y
/(?<=\$)\d+/.exec("$42"); // ['42']
// Negative lookbehind (ES2018) — match X only if NOT preceded by Y
/(?<!\$)\d+/.exec("42 USD"); // ['42']
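Because lookarounds match positions rather than characters, they can be combined to insert text without consuming anything. The sketch below adds thousands separators; the helper name addThousands is an assumption for illustration, and it expects a plain decimal string.

```javascript
// Hypothetical helper: insert thousands separators into the integer part
function addThousands(numStr) {
  // (?<!\.\d*)           not already inside the fractional part
  // \B                   not at a word boundary, so never before the first digit
  // (?=(\d{3})+(?!\d))   followed by a whole number of 3-digit groups
  return numStr.replace(/(?<!\.\d*)\B(?=(\d{3})+(?!\d))/g, ",");
}

console.log(addThousands("1234567"));      // 1,234,567
console.log(addThousands("1234567.8912")); // 1,234,567.8912
console.log(addThousands("999"));          // 999
```

The pattern matches only empty positions, so replace() inserts commas and removes nothing.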
String methods with RegExp
Strings expose match, matchAll, replace, replaceAll, search, and split, all of which accept a RegExp. match / matchAll extract matches; replace / replaceAll substitute them; search returns the match index; split uses the pattern as a delimiter.
const str = "The quick brown fox";
// test() — boolean check
/quick/.test(str); // true
// match() — returns first match (no g flag) or all matches (g flag)
str.match(/\w+/); // ['The', index: 0, input: '...', groups: undefined]
str.match(/\w+/g); // ['The', 'quick', 'brown', 'fox']
// matchAll() — returns iterator of all matches WITH groups (requires g flag)
const re = /(?<word>\w+)/g;
for (const match of str.matchAll(re)) {
console.log(match.groups.word, "at", match.index);
}
// search() — returns index of first match, or -1
str.search(/fox/); // 16
// replace() — replace first match (no g) or all (g flag)
str.replace(/\b\w/, (c) => c.toUpperCase()); // "The quick brown fox" (unchanged: "T" is already uppercase)
"foo foo foo".replace(/foo/, "bar"); // "bar foo foo"
"foo foo foo".replace(/foo/g, "bar"); // "bar bar bar"
// replaceAll() — always replaces all (string or regex with g flag)
"foo foo foo".replaceAll("foo", "bar"); // "bar bar bar"
// split() — split on pattern
"one1two2three".split(/\d/); // ['one', 'two', 'three']
RegExp.prototype methods
test(str) returns a boolean; exec(str) returns the next match object (including capture groups) and advances lastIndex when the g or y flag is set. Prefer str.matchAll(re) over manual exec loops for cleaner iteration.
const re = /(\d+)/g;
const str = "foo123bar456";
// exec() — returns next match each call (stateful with g flag)
let match;
while ((match = re.exec(str)) !== null) {
console.log(`Found ${match[1]} at index ${match.index}`);
}
Output:
Found 123 at index 3
Found 456 at index 9
// test() — returns boolean
/^\d+$/.test("12345"); // true
/^\d+$/.test("123ab"); // false
Using .exec() or .test() with a regex that has the g or y flag advances lastIndex. If you reuse the same regex object, always reset re.lastIndex = 0 between operations, or use str.match(re) instead.
Named capture groups
Named groups ((?<name>...)) make complex patterns self-documenting and let you access captures by name via match.groups instead of by index. They also work in replace() substitution strings as $<name>.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-04-26".match(dateRe);
const { year, month, day } = match.groups;
console.log(year, month, day); // 2026 04 26
// Named groups in replace()
"2026-04-26".replace(dateRe, "$<day>/$<month>/$<year>");
// "26/04/2026"
Output:
2026 04 26
.replace() with a function callback
When you pass a function as the second argument to .replace(), it is called for each match and its return value becomes the replacement string. The callback receives the full match, any capture group strings, the match offset, and the original string.
// Callback receives: (fullMatch, ...captureGroups, offset, originalStr)
"hello world".replace(/(\w+)/g, (match) => match.toUpperCase());
// "HELLO WORLD"
// Convert kebab-case to camelCase
"my-variable-name".replace(/-([a-z])/g, (_, char) => char.toUpperCase());
// "myVariableName"
// Pad all numbers to 3 digits
"item1 costs 5 dollars and item12 costs 99 dollars"
.replace(/\d+/g, (n) => n.padStart(3, "0"));
// "item001 costs 005 dollars and item012 costs 099 dollars"
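When the pattern contains named groups, the callback also receives the groups object as its final argument. The sketch below uses that to fill a small template; the template text, the values object, and the fillTemplate name are all hypothetical.

```javascript
// Hypothetical template and data, for illustration only
const template = "Hello {user}, you have {count} new messages";
const values = { user: "Ada", count: 3 };

function fillTemplate(tpl, data) {
  return tpl.replace(/\{(?<key>\w+)\}/g, (...args) => {
    // With named groups, the last callback argument is the groups object
    const { key } = args[args.length - 1];
    // Leave unknown placeholders untouched (args[0] is the full match)
    return key in data ? String(data[key]) : args[0];
  });
}

console.log(fillTemplate(template, values));
// Hello Ada, you have 3 new messages
```

Reading the groups object from the end of the argument list keeps the callback correct even if more capture groups are added to the pattern later.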
Unicode support
The u flag enables full Unicode mode: code points outside the Basic Multilingual Plane (most emoji, supplementary scripts), which take two UTF-16 code units, are treated as single characters; \u{HHHH} extended escapes work; and \p{Property} Unicode property escapes become available for matching categories like letters, digits, or scripts.
// \u{…} extended Unicode escapes — requires u or v flag
/\u{1F600}/u.test("😀"); // true
/\u{1F600}/.test("😀"); // false (u flag required)
// \p{…} Unicode property escapes — requires u or v flag
/\p{Letter}/u.test("é"); // true
/\p{Decimal_Number}/u.test("٣"); // true (Arabic digit)
/\p{Script=Greek}/u.test("α"); // true
// Without u flag, . does not match surrogate pairs correctly
"😀".match(/./); // matches only half the emoji (lone surrogate)
"😀".match(/./u); // matches the full emoji
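One practical consequence: \w stays ASCII-only in JavaScript even with the u flag, so matching words in other scripts needs property escapes. A minimal sketch follows; the sample text is made up, and the accented letters are written as escapes so the result does not depend on how the file is encoded.

```javascript
// "naïve café" with precomposed letters (ï = U+00EF, é = U+00E9)
const text = "na\u00EFve caf\u00E9";

// \w is [A-Za-z0-9_] only, so accented letters split the words
const asciiWords = text.match(/\w+/g);
console.log(asciiWords); // [ 'na', 've', 'caf' ]

// Letters, combining marks, and digits from any script, plus underscore
const unicodeWord = /[\p{L}\p{M}\p{N}_]+/gu;
const words = text.match(unicodeWord);
console.log(words); // [ 'naïve', 'café' ]
```

Including \p{M} keeps decomposed text (a base letter followed by a combining mark) in one piece as well.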
d flag — match indices (ES2022)
// The d flag belongs on the regex itself; exec() takes only the input string, so a second argument is ignored
const re = /(?<name>\w+)/d;
const m = re.exec("hello world");
console.log(m.indices[0]); // [0, 5] — start/end of full match
console.log(m.indices.groups.name); // [0, 5] — start/end of named group
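As one possible use of the indices array, the sketch below draws a caret marker under the span a named group matched, the way a linter might point at an error. The underline helper and the sample input are assumptions for illustration.

```javascript
// Hypothetical helper: mark where a named group matched
function underline(input, re, groupName) {
  const m = re.exec(input);
  // indices only exists when the regex carries the d flag
  const span = m?.indices?.groups?.[groupName];
  if (!span) return null;
  const [start, end] = span;
  return input + "\n" + " ".repeat(start) + "^".repeat(end - start);
}

console.log(underline("port=8080", /port=(?<port>\d+)/d, "port"));
// port=8080
//      ^^^^
```

Without the d flag the helper returns null, because match.indices is undefined.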
Common pattern templates
// Email (simplified — not RFC-complete)
const email = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
email.test("user@example.com"); // true
// URL (http/https)
const url = /^https?:\/\/[^\s/$.?#].[^\s]*$/i;
url.test("https://example.com/path?q=1"); // true
// IPv4 address (shape check only; accepts out-of-range octets such as 999)
const ipv4 = /^(\d{1,3}\.){3}\d{1,3}$/;
ipv4.test("192.168.1.255"); // true
// Date YYYY-MM-DD
const isoDate = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
isoDate.test("2026-04-26"); // true
// UUID v4
const uuid = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
uuid.test("550e8400-e29b-41d4-a716-446655440000"); // true
// URL slug (lowercase, hyphens, alphanumeric)
const slug = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
slug.test("my-article-title"); // true
slug.test("My Article Title"); // false
// Hex color (#rgb or #rrggbb)
const hexColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
hexColor.test("#1a2b3c"); // true
hexColor.test("#abc"); // true
Sticky flag (y)
With the y flag, the regex must match at exactly lastIndex — it does not scan forward. Each successful match advances lastIndex to the end of that match, making it efficient for sequential tokeniser/lexer implementations that process a string left-to-right.
const re = /\d+/y;
re.lastIndex = 4;
re.exec("abc 123 456"); // ['123'] — matched at exactly position 4
re.lastIndex; // 7 — advanced past match
re.exec("abc 123 456"); // null: index 7 is a space and sticky matching never scans ahead; lastIndex resets to 0
The y flag is useful for tokenizer/lexer implementations where you process a string sequentially and need to match at a specific position.
v flag — unicodeSets (ES2024)
The v flag is a superset of u. It enables set notation inside character classes — intersection (&&), subtraction (--), and string-literal alternatives (\q{abc}). It also disallows several quirks that u permitted, making patterns stricter but more predictable.
// Set difference — letters minus ASCII letters (i.e. non-ASCII letters)
/[\p{Letter}--[a-zA-Z]]/v.test("é"); // true
/[\p{Letter}--[a-zA-Z]]/v.test("a"); // false
// Set intersection — uppercase letters that are also ASCII
/[\p{Uppercase}&&[a-zA-Z]]/v.test("A"); // true
/[\p{Uppercase}&&[a-zA-Z]]/v.test("Α"); // false (Greek Alpha is uppercase but not ASCII)
// Strings inside a character class — match either of two multi-codepoint sequences
/^[\q{👨👩👧|🚀}]$/v.test("🚀"); // true
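// Negation is allowed only when every \q alternative is a single code point
/[^\q{a|b}]/v.test("c"); // true
/[^\q{a|b}]/v.test("a"); // false
// A negated class that contains multi-character strings, e.g. [^\q{foo|bar}], is a SyntaxError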
The v flag is the right default for new code — it unlocks set operations and tightens the grammar without losing anything u could do. It is mutually exclusive with u; specifying both throws SyntaxError.
RegExp lastIndex and statefulness
A RegExp with g or y is stateful: it carries a lastIndex cursor that advances on every successful match. Reusing the same object across operations without resetting lastIndex causes silent skipped matches and is one of the most common regex bugs.
const re = /foo/g;
re.test("foo foo"); // true, lastIndex now 3
re.test("foo foo"); // true, lastIndex now 7
re.test("foo foo"); // false — lastIndex (7) is past the end
re.test("foo foo"); // true again: the failed match reset lastIndex to 0
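Cures:
- Use a fresh literal at the call site: /foo/g.test(...) allocates a new object each time.
- Or use a string method that manages the cursor for you: str.match(re) and str.replace(re, ...) reset lastIndex to 0 when re is global, and str.matchAll(re) iterates over a clone and leaves re untouched (the clone starts at re.lastIndex, so keep that at 0).
- For sticky matching, explicitly assign re.lastIndex = 0 between unrelated operations.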
// SAFE — string methods are not affected by lastIndex
const re = /\d+/g;
"a1 b2 c3".match(re); // ['1', '2', '3'] — no matter what re.lastIndex was before
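The bug usually shows up when a global regex is stored at module scope and reused for yes/no checks. A minimal sketch of the failure and one possible fix follows; the function and constant names are hypothetical.

```javascript
// BUG: a shared global regex keeps lastIndex between calls
const DIGITS_G = /\d+/g;
function hasDigitsBuggy(s) {
  return DIGITS_G.test(s);
}

// FIX: drop the g flag when you only need a boolean answer
const DIGITS = /\d+/;
function hasDigits(s) {
  return DIGITS.test(s);
}

console.log(hasDigitsBuggy("abc123")); // true, lastIndex is now 6
console.log(hasDigitsBuggy("42"));     // false: the search started at index 6
console.log(hasDigits("abc123"));      // true
console.log(hasDigits("42"));          // true
```

A regex without g or y never moves lastIndex, so test() on it is safe to share.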
String.prototype.matchAll vs RegExp.prototype.exec
matchAll(regex) returns an iterator of full Match objects (the same shape exec returns). It is the modern, stateless replacement for the historical while ((m = re.exec(str))) loop.
const text = "Alice: 30, Bob: 25, Carol: 35";
const re = /(?<name>\w+): (?<age>\d+)/g;
// Modern — single allocation, no shared state
for (const m of text.matchAll(re)) {
console.log(`${m.groups.name} is ${m.groups.age}`);
}
Output:
Alice is 30
Bob is 25
Carol is 35
matchAll requires the g flag. Passing a non-global regex throws a TypeError (the exact message varies by engine).
matchAll also exposes .indices when the regex has the d flag — start/end offsets for the full match and every named group.
const re = /(?<name>\w+):/gd;
for (const m of "alice: 30, bob: 25".matchAll(re)) {
console.log(m.indices.groups.name); // [start, end] tuple
}
Output:
[ 0, 5 ]
[ 11, 14 ]
.replace() with named-group substitution
When the replacement is a string, you can reference named groups with $<name> and numbered groups with $1 to $99. $& is the full match; $` and $' are the text preceding and following it; $$ inserts a literal dollar sign.
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
// Reorder named groups
"2026-05-25".replace(re, "$<day>/$<month>/$<year>");
// "25/05/2026"
// Substitution metasequences
"hello world".replace(/o/g, "[$&]");
// "hell[o] w[o]rld"
"abc-def".replace(/-/, "<$`|$'>");
// "abc<abc|def>def" ($` = preceding, $' = following)
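Since $ sequences are interpreted in any replacement string, text that comes from outside the program can trigger them by accident. The sketch below shows one way to neutralise them by doubling each dollar sign; escapeReplacement is a hypothetical helper name.

```javascript
// Hypothetical helper: make any string safe to use as a literal replacement
function escapeReplacement(s) {
  // "$$$$" is itself a replacement string: each "$$" emits one "$",
  // so every "$" in the input becomes "$$"
  return s.replace(/\$/g, "$$$$");
}

const price = "$&9.99"; // happens to contain the "$&" metasequence

console.log("cost: X".replace(/X/, price));                    // cost: X9.99
console.log("cost: X".replace(/X/, escapeReplacement(price))); // cost: $&9.99
```

Passing a function, as in .replace(/X/, () => price), avoids the problem as well, because return values of a callback are inserted verbatim.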
Patterns and anti-patterns: catastrophic backtracking
ECMAScript regex uses a backtracking NFA engine, the same family used by Python re and PCRE. Patterns that allow the same character to be matched in multiple ways take exponential time on near-miss inputs. The classic shape is nested quantifiers with overlapping alternation.
// DANGEROUS — runs effectively forever on a non-match input
// /^(a+)+$/.test("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!")
// SAFE — one quantifier, no overlap
/^a+$/.test("aaaaaaaaaaaaaaaa"); // true
// SAFE — replace alternation with a negated class
/^[^!]+$/.test("aaaaaaaaaaaaaaaa"); // true
// SAFE — anchor with explicit non-overlapping classes
const ipPart = /(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)/;
const ipv4 = new RegExp(`^${ipPart.source}(\\.${ipPart.source}){3}$`);
ipv4.test("192.168.1.255"); // true
ipv4.test("256.0.0.1"); // false
JavaScript does not support possessive quantifiers (a++) or atomic groups ((?>...)) — the two PCRE features that exist precisely to prevent backtracking. The mitigation in JS is to refactor the pattern so the engine has no choice to backtrack: use negated classes ([^>]) instead of .+?, anchor with ^/\b, and never nest quantifiers on the same character class.
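One further workaround, not part of the list above, is to emulate an atomic group with a capturing lookahead followed by a backreference. It relies on the fact that the engine does not backtrack into a lookahead once it has succeeded. A sketch, with made-up variable names:

```javascript
// Ordinary group: \d+ gives back one digit so the trailing \d can match
const backtracking = /^(\d+)\d$/;

// Emulated atomic group: the lookahead captures the longest digit run,
// \1 then consumes exactly that text, and the engine cannot re-enter
// the lookahead to try a shorter capture
const atomic = /^(?=(\d+))\1\d$/;

console.log(backtracking.test("12345")); // true
console.log(atomic.test("12345"));       // false: nothing is left for the final \d

// The same trick applied to the dangerous ^(a+)+$ shape
const safeNested = /^(?:(?=(a+))\1)+$/;
console.log(safeNested.test("a".repeat(40) + "!")); // false, and it returns immediately
console.log(safeNested.test("aaaa"));               // true
```

The extra capture group shifts the numbering of any groups that follow, so named groups are easier to maintain when this technique is used in a larger pattern.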
Differences from Python re
For cross-language work — porting a regex from Python to JS or vice versa — these are the most common surprises. The full reference for Python is the re article; the PCRE comparison lives in linux/pcre.
| Feature | JavaScript | Python re |
|---|---|---|
| Named group syntax | (?<name>…) | (?P<name>…) |
| Named backreference (pattern) | \k<name> | (?P=name) |
| Named backreference (replacement) | $<name> | \g<name> |
| Verbose / x flag (whitespace + comments) | not supported | re.X / re.VERBOSE |
| Variable-length lookbehind | supported (ES2018+) | not supported (fixed width only) |
| Atomic groups / possessive quantifiers | not supported | both supported since 3.11 |
| Unicode property escapes \p{…} | requires u or v flag | not supported (the third-party regex module has them) |
| \w matches Unicode letters | no (ASCII only, even with u or v) | yes, for str patterns |
| Set operations in character classes | v flag ([A--B], [A&&B]) | not supported |
| Recursive patterns (?R) | not supported | not supported |
| Sticky matching | y flag | no flag; pattern.match(string, pos) anchors at pos |
// Same date pattern in both languages — note the (?<…>) vs (?P<…>) difference
const jsDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
// Python: re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
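For simple patterns, the two named-group rows of the table can be applied mechanically. The sketch below is a plain text substitution, not a parser: pyToJsPattern is a hypothetical helper, and it would also rewrite these sequences if they appeared inside a character class or after an escaped parenthesis.

```javascript
// Hypothetical helper: rewrite Python-style named groups to JS syntax
function pyToJsPattern(source) {
  return source
    .replace(/\(\?P<(\w+)>/g, "(?<$1>")     // (?P<name>…)  ->  (?<name>…)
    .replace(/\(\?P=(\w+)\)/g, "\\k<$1>");  // (?P=name)    ->  \k<name>
}

// Python source text: a quoted string whose closing quote matches the opening one
const pySource = String.raw`(?P<q>['"])(?P<body>.*?)(?P=q)`;
const jsRe = new RegExp(pyToJsPattern(pySource));

console.log(jsRe.source);                       // (?<q>['"])(?<body>.*?)\k<q>
console.log(jsRe.exec(`say "hi"`).groups.body); // hi
```

Features with no JavaScript equivalent, such as re.VERBOSE layouts or possessive quantifiers, still have to be ported by hand.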
Real-world recipes
Stable line splitter (handles CR, LF, CRLF)
A robust line splitter needs to accept Unix (LF), Windows (CRLF), and bare CR line endings. Listing \r\n first in the alternation makes a CRLF pair count as one delimiter instead of two.
const text = "a\r\nb\nc\rd\r\n";
const lines = text.split(/\r\n|\r|\n/);
console.log(lines);
Output:
[ 'a', 'b', 'c', 'd', '' ]
The trailing empty string comes from the final \r\n — drop it with a filter if undesired.
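A blanket filter would also remove blank lines in the middle of the text. If those matter, one option is to drop only the final empty entry, as in this sketch (splitLines is a hypothetical helper name):

```javascript
// Hypothetical helper: split on any line ending, dropping only the single
// empty string produced by a final line terminator
function splitLines(text) {
  const lines = text.split(/\r\n|\r|\n/);
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  return lines;
}

console.log(splitLines("a\r\nb\n\nc\rd\r\n"));
// [ 'a', 'b', '', 'c', 'd' ]
```

The interior blank line survives, while the artefact of the trailing \r\n is removed.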
URL slugifier
Lowercase, strip diacritics, collapse non-alphanumerics to single hyphens, trim hyphens. The Unicode normalization step ensures café becomes cafe, not caf.
function slugify(title) {
return title
.normalize("NFD")
.replace(/[\u0300-\u036f]/g, "") // strip combining marks (U+0300 to U+036F)
.toLowerCase()
.replace(/[^a-z0-9]+/g, "-")
.replace(/^-+|-+$/g, "");
}
console.log(slugify("Hello, World! — JavaScript 2026"));
Output:
hello-world-javascript-2026
Replace emoji with shortcodes
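String.prototype.replace with a callback receives each matched emoji. \p{Emoji_Presentation} in Unicode mode matches characters that render as emoji by default; the broader \p{Emoji} also matches digits, # and *, which is rarely what you want. The map below is illustrative; production code would use a full table from an emoji library.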
const SHORTCODES = { "🚀": ":rocket:", "🔥": ":fire:", "✨": ":sparkles:" };
const shortcoded = "ship it 🚀 🔥 ✨".replace(
/\p{Emoji_Presentation}/gu,
(e) => SHORTCODES[e] ?? e
);
console.log(shortcoded);
Output:
ship it :rocket: :fire: :sparkles:
Strip ANSI escape sequences
Common when sanitising captured CLI output before writing to a log file or rendering it in HTML.
const ANSI = /\x1b\[[0-9;]*m/g;
const colored = "\x1b[31mERROR\x1b[0m: \x1b[1mfile\x1b[0m missing";
console.log(colored.replace(ANSI, ""));
Output:
ERROR: file missing
Extract structured records with named groups
Parse an Apache-like log line into named fields in a single pass. Every captured value is a string, so convert numeric fields afterwards if you need numbers.
const line = '192.168.1.5 - alice [25/May/2026:13:00:42 +0000] "GET /api HTTP/1.1" 200 1234';
const LOG = /^(?<ip>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<proto>[^"]+)" (?<status>\d+) (?<bytes>\d+)$/;
const m = line.match(LOG);
console.log({ ...m.groups }); // spread into a plain object; Node labels m.groups itself as [Object: null prototype]
Output:
{
ip: '192.168.1.5',
user: 'alice',
time: '25/May/2026:13:00:42 +0000',
method: 'GET',
path: '/api',
proto: 'HTTP/1.1',
status: '200',
bytes: '1234'
}
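Captured values are always strings, so numeric fields need an explicit conversion step. A minimal sketch, assuming a groups object with status and bytes fields like the one above; toRecord and the shortened sample pattern are hypothetical.

```javascript
// Hypothetical converter: copy the string fields and convert the numeric ones
function toRecord(groups) {
  return {
    ...groups,                     // string fields are copied as-is
    status: Number(groups.status), // "200"  -> 200
    bytes: Number(groups.bytes),   // "1234" -> 1234
  };
}

// Shortened pattern so the example stands alone
const tail = /(?<status>\d{3}) (?<bytes>\d+)$/.exec('"GET /api HTTP/1.1" 200 1234');
const record = toRecord(tail.groups);

console.log(typeof record.status, record.status); // number 200
console.log(record.bytes * 2);                    // 2468
```

Spreading the groups object also produces an ordinary object, which is convenient for logging and JSON serialisation.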
Tokenize source code with y flag
The sticky flag is ideal for hand-written lexers. Each pattern matches at exactly the current cursor; advancing lastIndex consumes the token.
const SPEC = [
["NUM", /\d+/y],
["IDENT", /[a-zA-Z_]\w*/y],
["OP", /[+\-*/=]/y],
["WS", /\s+/y],
];
function* tokenize(input) {
let i = 0;
while (i < input.length) {
let matched = false;
for (const [type, re] of SPEC) {
re.lastIndex = i;
const m = re.exec(input);
if (m && m.index === i) {
if (type !== "WS") yield { type, value: m[0], index: i };
i = re.lastIndex;
matched = true;
break;
}
}
if (!matched) throw new SyntaxError(`unexpected char at ${i}: ${input[i]}`);
}
}
console.log([...tokenize("x = 10 + 20")]);
Output:
[
{ type: 'IDENT', value: 'x', index: 0 },
{ type: 'OP', value: '=', index: 2 },
{ type: 'NUM', value: '10', index: 4 },
{ type: 'OP', value: '+', index: 7 },
{ type: 'NUM', value: '20', index: 9 }
]
Escape user input for inclusion in a regex
Whenever a user-supplied string ends up inside a new RegExp(...), escape it. The function below covers every metacharacter the ECMA spec defines.
function escapeRegExp(s) {
return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const needle = "C++ (cpp)";
const haystack = "I love C++ (cpp) and JavaScript";
const re = new RegExp(escapeRegExp(needle), "g");
console.log(haystack.replace(re, "[REDACTED]"));
Output:
I love [REDACTED] and JavaScript
Strip HTML tags (lightweight)
Quick-and-dirty plain-text extraction. For untrusted HTML, use a real parser (DOMParser in the browser, parse5 or cheerio in Node) — this regex does not handle nested CDATA or arbitrary attribute payloads safely.
const html = "<p>Hello <strong>world</strong>!</p>";
console.log(html.replace(/<[^>]+>/g, ""));
Output:
Hello world!
Validate semantic version strings
A near-spec-compliant SemVer 2 pattern, broken into named groups for downstream parsing.
const SEMVER = /^v?(?<major>0|[1-9]\d*)\.(?<minor>0|[1-9]\d*)\.(?<patch>0|[1-9]\d*)(?:-(?<pre>[\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?(?:\+(?<build>[\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$/;
for (const v of ["1.2.3", "v1.2.3-beta.4+exp", "1.02.3"]) {
const m = v.match(SEMVER);
console.log(v, "→", m ? { ...m.groups } : null); // spread so Node prints a plain object
}
Output:
1.2.3 → { major: '1', minor: '2', patch: '3', pre: undefined, build: undefined }
v1.2.3-beta.4+exp → { major: '1', minor: '2', patch: '3', pre: 'beta.4', build: 'exp' }
1.02.3 → null
Common pitfalls
- Reusing a g or y regex object without resetting lastIndex: half your matches silently disappear. Either re-create the regex or use a stateless string method.
- new RegExp from user input without escaping: turns a literal search box into a denial-of-service vector via catastrophic patterns. Always run input through an escapeRegExp helper.
- Capture groups when you only need grouping: slower, and it pollutes .groups / numbered references. Use (?:…) unless you actually need the capture.
- Greedy quantifiers on <tag>-like patterns: <.+> on <a><b> matches the whole thing. Use a negated class <[^>]+> for safety and speed.
- /\d/ on Arabic / Bengali digits: \d matches only ASCII [0-9], with or without the u flag. Use \p{Decimal_Number} with u or v to match every Unicode decimal digit.
- Forgetting g on replaceAll-with-regex: "aaa".replaceAll(/a/, "b") throws a TypeError because the regex must be global. Use /a/g or the string form.
- .match() with a g regex throws away capture groups: it returns just the full matches. Use .matchAll() or .exec() to keep groups.
- Escaping in template strings: new RegExp(`\d+`) produces /d+/ because the string literal swallows the backslash. Either use String.raw`\d+` or double the backslash (`\\d+`).
- Variable-length lookbehind portability: V8 (Chrome, Node) supports it; some older engines don't. Test target runtimes if you ship to legacy browsers.
- Regex literal in a hot path: /foo/g inside a function body allocates a new RegExp on every call. Hoist constant patterns to module scope, and remember that a shared g or y regex also shares its lastIndex.