cheat sheet

Regular Expressions

JavaScript has built-in RegExp support (ES2018+ with named groups, lookbehind, dotAll). Covers literal syntax, flags, character classes, methods, named captures, and common patterns.

#javascript #regex #language (updated 04-26-2026)

What it is

JavaScript has built-in Regular Expression support based on a PCRE-like syntax. ES2018 added named capture groups, lookbehind assertions, and the s (dotAll) flag. ES2022 added the d (indices) flag. ES2024 added the v (unicodeSets) flag. RegExp literals are compiled at parse time; new RegExp() is evaluated at runtime (useful for dynamic patterns).

Literal syntax vs RegExp constructor

javascript
// Literal — compiled at parse time; use for static patterns
const re = /hello/i;

// Constructor — evaluated at runtime; use for dynamic patterns
const word = "hello";
const re2 = new RegExp(word, "i");

// Escape special characters when building from user input
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const userInput = "file.txt";
const safe = new RegExp(escapeRegExp(userInput), "g");

Flags

Flag | Name | Effect
g | global | Find all matches (not just first); advances lastIndex
i | ignoreCase | Case-insensitive matching
m | multiline | ^ and $ match start/end of each line, not the whole string
s | dotAll | . matches newline characters too (ES2018)
u | unicode | Full Unicode mode; enables \u{…} escapes and \p{…} properties
v | unicodeSets | Superset of u; enables set operations [A--B], [A&&B] (ES2024)
d | hasIndices | Adds .indices array to match results with start/end positions (ES2022)
y | sticky | Match only at lastIndex position; does not advance past non-matches
javascript
const str = "Hello\nWorld";
/^world/i.test(str);   // false — ^ matches start of string only
/^world/im.test(str);  // true  — m flag makes ^ match start of line
/hello.world/is.test(str); // true: i handles the capital letters, s lets . match \n

Character classes and syntax

Character classes define sets of characters that a position may match. Square brackets [...] match any one character in the set; shorthand escapes like \d, \w, and \s expand to common sets; anchors and quantifiers control position and repetition.

javascript
// Character classes
/[aeiou]/     // any vowel
/[^aeiou]/    // any non-vowel
/[a-z]/       // a through z
/[a-zA-Z0-9]/ // alphanumeric

// Shorthand classes
/\d/   // digit: [0-9]
/\D/   // non-digit
/\w/   // word char: [a-zA-Z0-9_]
/\W/   // non-word char
/\s/   // whitespace (space, tab, newline, etc.)
/\S/   // non-whitespace
/./    // any char except newline (unless s flag)

// Anchors
/^start/   // start of string (or line with m flag)
/end$/     // end of string (or line with m flag)
/\bword\b/ // word boundary
/\Bword\B/ // non-word boundary

// Quantifiers
/a*/    // 0 or more
/a+/    // 1 or more
/a?/    // 0 or 1
/a{3}/  // exactly 3
/a{2,5}/  // 2 to 5
/a{2,}/   // 2 or more

// Quantifier greediness
/a+/    // greedy: matches as many as possible
/a+?/   // lazy: matches as few as possible
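
The greedy/lazy difference is easiest to see on a tag-like input. A minimal sketch; the variable names are illustrative only:

```javascript
const input = "<a><b>";

// Greedy: .+ grabs as much as possible, then backs off to the last ">"
const greedy = input.match(/<.+>/)[0];   // "<a><b>"

// Lazy: .+? stops at the first ">" that lets the match succeed
const lazy = input.match(/<.+?>/)[0];    // "<a>"

console.log(greedy, lazy);
```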

Groups

Parentheses group sub-expressions and, by default, capture the matched text as a numbered group. Use (?:...) to group without capturing (cheaper, no slot allocated); use (?<name>...) for named captures accessible via match.groups.

javascript
// Capturing group — captured in match result
/(foo)(bar)/.exec("foobar");
// ['foobar', 'foo', 'bar', index: 0, ...]

// Non-capturing group — groups without capture
/(?:foo)(bar)/.exec("foobar");
// ['foobar', 'bar', index: 0, ...]

// Named capturing group (ES2018)
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/.exec("2026-04-26");
// match.groups = { year: '2026', month: '04', day: '26' }

// Backreferences to captured group
/(['"]).*?\1/.test('"quoted"');  // true — \1 refers to group 1
/(?<q>['"]).*?\k<q>/.test('"quoted"'); // named backreference

Lookahead and lookbehind

Zero-width assertions that match a position based on what precedes or follows it, without consuming characters. Lookaheads ((?=...) / (?!...)) have been part of the language since ES3; lookbehinds ((?<=...) / (?<!...)) require ES2018 or later.

javascript
// Positive lookahead — match X only if followed by Y
/\d+(?= dollars)/.exec("100 dollars"); // ['100']

// Negative lookahead — match X only if NOT followed by Y
/\d+(?! dollars)/.exec("100 euros");  // ['100']

// Positive lookbehind (ES2018) — match X only if preceded by Y
/(?<=\$)\d+/.exec("$42");   // ['42']

// Negative lookbehind (ES2018) — match X only if NOT preceded by Y
/(?<!\$)\d+/.exec("42 USD"); // ['42']
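
One caveat with negative lookahead after a quantifier: backtracking can shorten the quantified part until the assertion passes. The sketch below shows the surprise and one possible guard (also forbidding a following digit); the variable names are illustrative.

```javascript
// Backtracking lets \d+ give up a digit, so the lookahead succeeds one position earlier
const naive = /\d+(?! dollars)/.exec("100 dollars");
console.log(naive[0]); // "10"

// Forbid a following digit as well, so the number can only match whole
const strict = /\d+(?!\d| dollars)/;
console.log(strict.exec("100 dollars"));  // null
console.log(strict.exec("100 euros")[0]); // "100"
```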

String methods with RegExp

Strings expose match, matchAll, replace, replaceAll, search, and split, all of which accept a RegExp. match / matchAll extract matches; replace / replaceAll substitute them; search returns the match index; split uses the pattern as a delimiter.

javascript
const str = "The quick brown fox";

// test() — boolean check
/quick/.test(str);   // true

// match() — returns first match (no g flag) or all matches (g flag)
str.match(/\w+/);    // ['The', index: 0, input: '...', groups: undefined]
str.match(/\w+/g);   // ['The', 'quick', 'brown', 'fox']

// matchAll() — returns iterator of all matches WITH groups (requires g flag)
const re = /(?<word>\w+)/g;
for (const match of str.matchAll(re)) {
  console.log(match.groups.word, "at", match.index);
}

// search() — returns index of first match, or -1
str.search(/fox/);   // 16

// replace() — replace first match (no g) or all (g flag)
str.replace(/\b\w/, (c) => c.toUpperCase()); // "The quick brown fox" (first match "T" is already uppercase)
"foo foo foo".replace(/foo/, "bar");    // "bar foo foo"
"foo foo foo".replace(/foo/g, "bar");   // "bar bar bar"

// replaceAll() — always replaces all (string or regex with g flag)
"foo foo foo".replaceAll("foo", "bar"); // "bar bar bar"

// split() — split on pattern
"one1two2three".split(/\d/); // ['one', 'two', 'three']

RegExp.prototype methods

test(str) returns a boolean; exec(str) returns the next match object (including capture groups) and advances lastIndex when the g or y flag is set. Prefer str.matchAll(re) over manual exec loops for cleaner iteration.

javascript
const re = /(\d+)/g;
const str = "foo123bar456";

// exec() — returns next match each call (stateful with g flag)
let match;
while ((match = re.exec(str)) !== null) {
  console.log(`Found ${match[1]} at index ${match.index}`);
}

Output:

text
Found 123 at index 3
Found 456 at index 9
javascript
// test() — returns boolean
/^\d+$/.test("12345");  // true
/^\d+$/.test("123ab");  // false

Using .exec() or .test() with a regex that has the g or y flag advances lastIndex. If you reuse the same regex object, always reset re.lastIndex = 0 between operations, or use str.match(re) instead.

Named capture groups

Named groups ((?<name>...)) make complex patterns self-documenting and let you access captures by name via match.groups instead of by index. They also work in replace() substitution strings as $<name>.

javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-04-26".match(dateRe);

const { year, month, day } = match.groups;
console.log(year, month, day);  // 2026 04 26

// Named groups in replace()
"2026-04-26".replace(dateRe, "$<day>/$<month>/$<year>");
// "26/04/2026"

Output:

text
2026 04 26

.replace() with a function callback

When you pass a function as the second argument to .replace(), it is called for each match and its return value becomes the replacement string. The callback receives the full match, any capture group strings, the match offset, the original string, and (when the pattern has named groups) a groups object as the final argument.

javascript
// Callback receives: (fullMatch, ...captureGroups, offset, originalStr)
"hello world".replace(/(\w+)/g, (match) => match.toUpperCase());
// "HELLO WORLD"

// Convert kebab-case to camelCase
"my-variable-name".replace(/-([a-z])/g, (_, char) => char.toUpperCase());
// "myVariableName"

// Pad all numbers to 3 digits
"item1 costs 5 dollars and item12 costs 99 dollars"
  .replace(/\d+/g, (n) => n.padStart(3, "0"));
// "item001 costs 005 dollars and item012 costs 099 dollars"
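
When the pattern has named groups, the groups object arrives as the callback's last argument. A small sketch of reading it; `isoDateRe` and `toDayFirst` are hypothetical names introduced for illustration:

```javascript
const isoDateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;

// Collect all arguments and take the final one, which is the groups object
function toDayFirst(text) {
  return text.replace(isoDateRe, (...args) => {
    const { year, month, day } = args[args.length - 1];
    return `${day}/${month}/${year}`;
  });
}

console.log(toDayFirst("Due 2026-04-26, shipped 2026-05-01"));
// "Due 26/04/2026, shipped 01/05/2026"
```

Reusing the module-level g regex is safe here because replace() resets lastIndex before it scans.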

Unicode support

The u flag enables full Unicode mode: code points outside the Basic Multilingual Plane (most emoji, supplementary scripts) are handled as single units instead of two surrogate halves, \u{HHHH} extended escapes work, and \p{Property} Unicode property escapes become available for matching categories like letters, digits, or scripts.
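
A common use of property escapes is word extraction from non-ASCII text, where \w falls short. A minimal sketch; the sample phrase is written with escapes only to pin the exact code points:

```javascript
const phrase = "na\u00efve caf\u00e9"; // "naïve café"

// \w is ASCII-only, so accented letters split the words
const asciiWords = phrase.match(/\w+/g);       // ['na', 've', 'caf']

// \p{L} (Letter) with the u flag treats accented letters as letters
const unicodeWords = phrase.match(/\p{L}+/gu); // ['naïve', 'café']

console.log(asciiWords, unicodeWords);
```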

javascript
// \u{…} extended Unicode escapes — requires u or v flag
/\u{1F600}/u.test("😀");  // true
/\u{1F600}/.test("😀");   // false (u flag required)

// \p{…} Unicode property escapes — requires u or v flag
/\p{Letter}/u.test("é");        // true
/\p{Decimal_Number}/u.test("٣"); // true (Arabic digit)
/\p{Script=Greek}/u.test("α");  // true

// Without u flag, . does not match surrogate pairs correctly
"😀".match(/./);   // matches only half the emoji (lone surrogate)
"😀".match(/./u);  // matches the full emoji

d flag — match indices (ES2022)

javascript
// The d flag belongs on the regex itself; exec() takes only the input string
const re = /(?<name>\w+)/d;
const m = re.exec("hello world");
console.log(m.indices[0]);        // [0, 5] — start/end of full match
console.log(m.indices.groups.name); // [0, 5] — start/end of named group
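
One possible use for match indices is pointing at the matched span, as in an error message. The sketch below assumes a hypothetical helper named `underline` and a regex that carries the d flag:

```javascript
// Build a caret line that points at the first match of re inside str.
// re must have the d flag, otherwise m.indices is undefined.
function underline(str, re) {
  const m = re.exec(str);
  if (!m) return null;
  const [start, end] = m.indices[0];
  return " ".repeat(start) + "^".repeat(end - start);
}

const source = "hello world";
console.log(source);
console.log(underline(source, /world/d)); // six spaces, then five carets
```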

Common pattern templates

javascript
// Email (simplified — not RFC-complete)
const email = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
email.test("user@example.com"); // true

// URL (http/https)
const url = /^https?:\/\/[^\s/$.?#].[^\s]*$/i;
url.test("https://example.com/path?q=1"); // true

// IPv4 address (shape only; octets are not range-checked, so 999.999.999.999 also passes)
const ipv4 = /^(\d{1,3}\.){3}\d{1,3}$/;
ipv4.test("192.168.1.255"); // true

// Date YYYY-MM-DD
const isoDate = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
isoDate.test("2026-04-26"); // true

// UUID v4
const uuid = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
uuid.test("550e8400-e29b-41d4-a716-446655440000"); // true

// URL slug (lowercase, hyphens, alphanumeric)
const slug = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
slug.test("my-article-title"); // true
slug.test("My Article Title"); // false

// Hex color (#rgb or #rrggbb)
const hexColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
hexColor.test("#1a2b3c"); // true
hexColor.test("#abc");    // true

Sticky flag (y)

With the y flag, the regex must match at exactly lastIndex — it does not scan forward. Each successful match advances lastIndex to the end of that match, making it efficient for sequential tokeniser/lexer implementations that process a string left-to-right.

javascript
const re = /\d+/y;
re.lastIndex = 4;
re.exec("abc 123 456"); // ['123'] — matched at exactly position 4
re.lastIndex;           // 7 — advanced past match
re.exec("abc 123 456"); // null: position 7 is a space and y never scans ahead; lastIndex resets to 0

The y flag is useful for tokenizer/lexer implementations where you process a string sequentially and need to match at a specific position.

v flag — unicodeSets (ES2024)

The v flag is a superset of u. It enables set notation inside character classes — intersection (&&), subtraction (--), and string-literal alternatives (\q{abc}). It also disallows several quirks that u permitted, making patterns stricter but more predictable.

javascript
// Set difference — letters minus ASCII letters (i.e. non-ASCII letters)
/[\p{Letter}--[a-zA-Z]]/v.test("é");     // true
/[\p{Letter}--[a-zA-Z]]/v.test("a");     // false

// Set intersection — uppercase letters that are also ASCII
/[\p{Uppercase}&&[a-zA-Z]]/v.test("A");  // true
/[\p{Uppercase}&&[a-zA-Z]]/v.test("Α");  // false (Greek Alpha is uppercase but not ASCII)

// Strings inside a character class — match either of two multi-codepoint sequences
/^[\q{👨‍👩‍👧|🚀}]$/v.test("🚀");           // true

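// Negated classes cannot contain multi-character strings:
// /[^\q{foo|bar}]/v is a SyntaxError. A negative lookahead expresses the same intent.
/^(?!foo|bar)/v.test("baz");             // true
/^(?!foo|bar)/v.test("foo");             // false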

The v flag is the right default for new code — it unlocks set operations and tightens the grammar without losing anything u could do. It is mutually exclusive with u; specifying both throws SyntaxError.
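
Because older runtimes reject the v flag with a SyntaxError, code that must run on them may want to feature-detect it. A minimal sketch, assuming a hypothetical helper named `supportsRegExpFlags`; the constructor form is used so that the unsupported flag does not break parsing of the whole file:

```javascript
// Returns true when the engine accepts the given flag string
function supportsRegExpFlags(flags) {
  try {
    new RegExp("", flags);
    return true;
  } catch {
    return false; // invalid or unsupported flags throw SyntaxError
  }
}

// Pick v where available, otherwise fall back to u
const unicodeFlag = supportsRegExpFlags("v") ? "v" : "u";
const letters = new RegExp("\\p{Letter}+", unicodeFlag);
console.log(unicodeFlag, letters.test("\u00e9")); // flag in use, then true
```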

RegExp lastIndex and statefulness

A RegExp with g or y is stateful: it carries a lastIndex cursor that advances on every successful match. Reusing the same object across operations without resetting lastIndex causes silent skipped matches and is one of the most common regex bugs.

javascript
const re = /foo/g;
re.test("foo foo");   // true, lastIndex now 3
re.test("foo foo");   // true, lastIndex now 7
re.test("foo foo");   // false — lastIndex (7) is past the end
re.test("foo foo");   // true again — false reset lastIndex to 0
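
The same statefulness tends to bite inside array callbacks, where one regex object is tested against many strings. An illustrative sketch (the variable names are made up for the example):

```javascript
const names = ["a1", "a2", "a3"];

// BUG: the shared g regex remembers lastIndex between test() calls
const shared = /a/g;
const buggy = names.filter((s) => shared.test(s));
console.log(buggy); // ['a1', 'a3']

// FIX: without g (or y), test() always starts at index 0
const stateless = /a/;
const fixed = names.filter((s) => stateless.test(s));
console.log(fixed); // ['a1', 'a2', 'a3']
```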

Cures:

  • Use a fresh literal at the call site: /foo/g.test(...) allocates a new object each time.
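  • Or use a string method: str.match(re) and str.replace(re, ...) reset lastIndex to 0 for a g regex, and str.matchAll(re) iterates over an internal copy, so re itself is never advanced (the copy does start from re.lastIndex, so reset it first if re was used with exec or test).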
  • For sticky matching, explicitly assign re.lastIndex = 0 between unrelated operations.
javascript
// SAFE: match() with a g regex resets lastIndex to 0 before scanning
const re = /\d+/g;
"a1 b2 c3".match(re);  // ['1', '2', '3'] — no matter what re.lastIndex was before

String.prototype.matchAll vs RegExp.prototype.exec

matchAll(regex) returns an iterator of full Match objects (the same shape exec returns). It is the modern, stateless replacement for the historical while ((m = re.exec(str))) loop.

javascript
const text = "Alice: 30, Bob: 25, Carol: 35";
const re = /(?<name>\w+): (?<age>\d+)/g;

// Modern — single allocation, no shared state
for (const m of text.matchAll(re)) {
  console.log(`${m.groups.name} is ${m.groups.age}`);
}

Output:

text
Alice is 30
Bob is 25
Carol is 35

matchAll requires the g flag. Passing a non-global regex throws a TypeError (the exact message varies by engine).

matchAll also exposes .indices when the regex has the d flag — start/end offsets for the full match and every named group.

javascript
const re = /(?<name>\w+):/gd;
for (const m of "alice: 30, bob: 25".matchAll(re)) {
  console.log(m.indices.groups.name);   // [start, end] tuple
}

Output:

text
[ 0, 5 ]
[ 11, 14 ]

.replace() with named-group substitution

When the replacement is a string, you can reference named groups with $<name> and numbered groups with $1 through $9. $& is the full match; $` and $' are the text preceding and following it.

javascript
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;

// Reorder named groups
"2026-05-25".replace(re, "$<day>/$<month>/$<year>");
// "25/05/2026"

// Substitution metasequences
"hello world".replace(/o/g, "[$&]");
// "hell[o] w[o]rld"

"abc-def".replace(/-/, "<$`|$'>");
// "abc<abc|def>def"  ($` = preceding, $' = following)
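
These metasequences are also expanded when the replacement text comes from somewhere else, such as user input. A sketch of the effect and two ways around it; `template` and `userText` are hypothetical values:

```javascript
const template = "Total: AMOUNT";
const userText = "$&5"; // hypothetical user-supplied replacement text

// As a string, "$&" is expanded to the matched text
const expanded = template.replace("AMOUNT", userText);       // "Total: AMOUNT5"

// A function's return value is inserted verbatim
const verbatim = template.replace("AMOUNT", () => userText); // "Total: $&5"

// "$$" in a replacement string produces one literal "$"
const literal = template.replace("AMOUNT", "$$5");           // "Total: $5"

console.log(expanded, "|", verbatim, "|", literal);
```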

Patterns and anti-patterns: catastrophic backtracking

ECMAScript regex uses a backtracking NFA engine, the same family used by Python re and PCRE. Patterns that allow the same character to be matched in multiple ways take exponential time on near-miss inputs. The classic shape is nested quantifiers with overlapping alternation.

javascript
// DANGEROUS — runs effectively forever on a non-match input
// /^(a+)+$/.test("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!")

// SAFE — one quantifier, no overlap
/^a+$/.test("aaaaaaaaaaaaaaaa");   // true

// SAFE — replace alternation with a negated class
/^[^!]+$/.test("aaaaaaaaaaaaaaaa"); // true

// SAFE — anchor with explicit non-overlapping classes
const ipPart = /(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)/;
const ipv4 = new RegExp(`^${ipPart.source}(\\.${ipPart.source}){3}$`);
ipv4.test("192.168.1.255");  // true
ipv4.test("256.0.0.1");      // false

JavaScript does not support possessive quantifiers (a++) or atomic groups ((?>...)) — the two PCRE features that exist precisely to prevent backtracking. The mitigation in JS is to refactor the pattern so the engine has no choice to backtrack: use negated classes ([^>]) instead of .+?, anchor with ^/\b, and never nest quantifiers on the same character class.
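
A commonly cited workaround, when refactoring is not practical, is to emulate an atomic group with a lookahead plus a backreference: lookaheads are not re-entered once they succeed, so the captured text cannot be given back. The sketch below applies that idea to the dangerous shape above; it is an illustration, not a general fix for every pattern.

```javascript
// (?=(a+))\1 behaves like an atomic group: the lookahead captures the longest
// run of "a", and \1 then consumes exactly that text with nothing to give back.
const safeNested = /^(?:(?=(a+))\1)+$/;

const nearMiss = "a".repeat(40) + "!";
console.log(safeNested.test(nearMiss));       // false, without exponential backtracking
console.log(safeNested.test("a".repeat(40))); // true
```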

Differences from Python re

For cross-language work — porting a regex from Python to JS or vice versa — these are the most common surprises. The full reference for Python is the re article; the PCRE comparison lives in linux/pcre.

Feature | JavaScript | Python re
Named group syntax | (?<name>…) | (?P<name>…)
Named backreference (pattern) | \k<name> | (?P=name)
Named backreference (replacement) | $<name> | \g<name>
Verbose / x flag (whitespace + comments) | not supported | re.X / re.VERBOSE
Variable-length lookbehind | supported | not supported (fixed only)
Atomic groups / possessive quantifiers | not supported | supported since 3.11
Unicode property escapes \p{…} | requires u or v flag | not supported in re (third-party regex module only)
Default \w matches Unicode | no (ASCII only, even with u or v) | yes (str patterns)
Set operations in character classes | v flag ([A--B], [A&&B]) | not supported
Recursive patterns (?R) | not supported | not supported
Sticky y flag | yes | no flag (pattern.match(string, pos) anchors at pos)
javascript
// Same date pattern in both languages — note the (?<…>) vs (?P<…>) difference
const jsDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
// Python:    re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
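
For quick ports, the named-group differences can be translated mechanically. The sketch below uses a hypothetical helper, `pyToJsSource`, and is deliberately naive: it does not skip escaped parentheses or character classes and it ignores every other dialect difference.

```javascript
// Rewrite (?P<name>...) to (?<name>...) and (?P=name) to \k<name>
function pyToJsSource(pySource) {
  return pySource
    .replace(/\(\?P<([A-Za-z_]\w*)>/g, "(?<$1>")
    .replace(/\(\?P=([A-Za-z_]\w*)\)/g, "\\k<$1>");
}

const jsSource = pyToJsSource(String.raw`(?P<q>['"]).*?(?P=q)`);
console.log(jsSource);                              // (?<q>['"]).*?\k<q>
console.log(new RegExp(jsSource).test('"quoted"')); // true
```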

Real-world recipes

Stable line splitter (handles CR, LF, CRLF)

A robust line splitter needs to accept Unix (LF), Windows (CRLF), and classic Mac (CR) line endings. Put \r\n first in the alternation so a CRLF pair counts as one break rather than two.

javascript
const text = "a\r\nb\nc\rd\r\n";
const lines = text.split(/\r\n|\r|\n/);
console.log(lines);

Output:

text
[ 'a', 'b', 'c', 'd', '' ]

The trailing empty string comes from the final \r\n — drop it with a filter if undesired.
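
A filter would also remove blank lines in the middle of the text. If only the trailing entry should go, one option is to pop it, as in this sketch (`splitLines` is a hypothetical helper name):

```javascript
// Split on any line ending and drop only the single empty string
// produced by a trailing terminator; interior blank lines are kept.
function splitLines(text) {
  const lines = text.split(/\r\n|\r|\n/);
  if (lines[lines.length - 1] === "") lines.pop();
  return lines;
}

console.log(splitLines("a\r\nb\nc\rd\r\n")); // ['a', 'b', 'c', 'd']
console.log(splitLines("a\n\nb"));           // ['a', '', 'b']
```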

URL slugifier

Lowercase, strip diacritics, collapse non-alphanumerics to single hyphens, trim hyphens. The Unicode normalization step ensures café becomes cafe, not caf.

javascript
function slugify(title) {
  return title
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")    // strip combining marks (U+0300 to U+036F)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

console.log(slugify("Hello, World! — JavaScript 2026"));

Output:

text
hello-world-javascript-2026

Replace emoji with shortcodes

String.prototype.replace with a callback receives the matched character. \p{Emoji_Presentation} in Unicode mode matches characters that render as emoji by default; the broader \p{Emoji} would also match ASCII digits, # and *. The map below is illustrative; production code would use a full table from an emoji library.

javascript
const SHORTCODES = { "🚀": ":rocket:", "🔥": ":fire:", "✨": ":sparkles:" };

const shortcoded = "ship it 🚀 🔥 ✨".replace(
  /\p{Emoji_Presentation}/gu,
  (e) => SHORTCODES[e] ?? e
);

console.log(shortcoded);

Output:

text
ship it :rocket: :fire: :sparkles:

Strip ANSI escape sequences

Common when sanitising captured CLI output before writing to a log file or rendering it in HTML.

javascript
const ANSI = /\x1b\[[0-9;]*m/g;
const colored = "\x1b[31mERROR\x1b[0m: \x1b[1mfile\x1b[0m missing";
console.log(colored.replace(ANSI, ""));

Output:

text
ERROR: file missing
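
The pattern above only removes colour/style (SGR) sequences, which end in m. Captured output may also contain cursor-movement or erase sequences. A broader sketch, assuming the standard CSI layout (parameter bytes, intermediate bytes, one final byte); OSC and other escape families are still not covered:

```javascript
// CSI sequence: ESC [, parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
// then one final byte 0x40-0x7E
const CSI = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

const raw = "\x1b[2K\x1b[1;31mFAIL\x1b[0m \x1b[3Adone";
const cleaned = raw.replace(CSI, "");
console.log(cleaned); // "FAIL done"
```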

Extract structured records with named groups

Parse an Apache-like log line into named fields in a single pass.

javascript
const line = '192.168.1.5 - alice [25/May/2026:13:00:42 +0000] "GET /api HTTP/1.1" 200 1234';

const LOG = /^(?<ip>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<proto>[^"]+)" (?<status>\d+) (?<bytes>\d+)$/;

const m = line.match(LOG);
console.log(m.groups);

Output:

text
{
  ip: '192.168.1.5',
  user: 'alice',
  time: '25/May/2026:13:00:42 +0000',
  method: 'GET',
  path: '/api',
  proto: 'HTTP/1.1',
  status: '200',
  bytes: '1234'
}
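
Every captured field is a string. If numeric fields are needed downstream, a thin wrapper can convert them. A sketch assuming a hypothetical `parseLogLine` helper built on the same pattern:

```javascript
const LOG_LINE = /^(?<ip>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<proto>[^"]+)" (?<status>\d+) (?<bytes>\d+)$/;

// Returns null for non-matching lines, otherwise a plain object
// with status and bytes converted to numbers
function parseLogLine(line) {
  const m = LOG_LINE.exec(line);
  if (!m) return null;
  const { status, bytes, ...rest } = m.groups;
  return { ...rest, status: Number(status), bytes: Number(bytes) };
}

const record = parseLogLine('192.168.1.5 - alice [25/May/2026:13:00:42 +0000] "GET /api HTTP/1.1" 200 1234');
console.log(record.status + 1, record.bytes); // 201 1234
```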

Tokenize source code with y flag

The sticky flag is ideal for hand-written lexers. Each pattern matches at exactly the current cursor; advancing lastIndex consumes the token.

javascript
const SPEC = [
  ["NUM",   /\d+/y],
  ["IDENT", /[a-zA-Z_]\w*/y],
  ["OP",    /[+\-*/=]/y],
  ["WS",    /\s+/y],
];

function* tokenize(input) {
  let i = 0;
  while (i < input.length) {
    let matched = false;
    for (const [type, re] of SPEC) {
      re.lastIndex = i;
      const m = re.exec(input);
      if (m && m.index === i) {
        if (type !== "WS") yield { type, value: m[0], index: i };
        i = re.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError(`unexpected char at ${i}: ${input[i]}`);
  }
}

console.log([...tokenize("x = 10 + 20")]);

Output:

text
[
  { type: 'IDENT', value: 'x', index: 0 },
  { type: 'OP', value: '=', index: 2 },
  { type: 'NUM', value: '10', index: 4 },
  { type: 'OP', value: '+', index: 7 },
  { type: 'NUM', value: '20', index: 9 }
]
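
An alternative layout uses one sticky regex with a named group per token type instead of a list of separate patterns. This is a sketch under the same token definitions as SPEC above; `TOKEN_SOURCE` and `tokenizeCombined` are hypothetical names.

```javascript
// One alternation, one named group per token type; earlier alternatives win
const TOKEN_SOURCE = String.raw`(?<NUM>\d+)|(?<IDENT>[a-zA-Z_]\w*)|(?<OP>[+\-*/=])|(?<WS>\s+)`;

function* tokenizeCombined(input) {
  const re = new RegExp(TOKEN_SOURCE, "y"); // fresh regex per call, so no shared lastIndex
  let index = 0;
  while (index < input.length) {
    re.lastIndex = index;
    const m = re.exec(input);
    if (!m) throw new SyntaxError(`unexpected char at ${index}: ${input[index]}`);
    // Exactly one named group is defined for any successful match
    const type = Object.keys(m.groups).find((k) => m.groups[k] !== undefined);
    if (type !== "WS") yield { type, value: m[0], index };
    index = re.lastIndex;
  }
}

console.log([...tokenizeCombined("x = 10 + 20")].map((t) => t.type).join(" "));
// IDENT OP NUM OP NUM
```

The trade-off is one exec call per token instead of up to four, at the cost of a longer pattern that is harder to read.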

Escape user input for inclusion in a regex

Whenever a user-supplied string ends up inside a new RegExp(...), escape it. The function below escapes every syntax character in the ECMAScript pattern grammar: ^ $ \ . * + ? ( ) [ ] { } |.

javascript
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const needle = "C++ (cpp)";
const haystack = "I love C++ (cpp) and JavaScript";
const re = new RegExp(escapeRegExp(needle), "g");
console.log(haystack.replace(re, "[REDACTED]"));

Output:

text
I love [REDACTED] and JavaScript

Strip HTML tags (lightweight)

Quick-and-dirty plain-text extraction. For untrusted HTML, use a real parser (DOMParser in the browser, parse5 or cheerio in Node) — this regex does not handle nested CDATA or arbitrary attribute payloads safely.

javascript
const html = "<p>Hello <strong>world</strong>!</p>";
console.log(html.replace(/<[^>]+>/g, ""));

Output:

text
Hello world!

Validate semantic version strings

A near-spec-compliant SemVer 2 pattern, broken into named groups for downstream parsing.

javascript
const SEMVER = /^v?(?<major>0|[1-9]\d*)\.(?<minor>0|[1-9]\d*)\.(?<patch>0|[1-9]\d*)(?:-(?<pre>[\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?(?:\+(?<build>[\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$/;

for (const v of ["1.2.3", "v1.2.3-beta.4+exp", "1.02.3"]) {
  const m = v.match(SEMVER);
  console.log(v, "→", m ? m.groups : null);
}

Output:

text
1.2.3 → { major: '1', minor: '2', patch: '3', pre: undefined, build: undefined }
v1.2.3-beta.4+exp → { major: '1', minor: '2', patch: '3', pre: 'beta.4', build: 'exp' }
1.02.3 → null

Common pitfalls

  1. Reusing a g or y regex object without resetting lastIndex: half your matches silently disappear. Either re-create the regex or use a stateless string method.
  2. new RegExp from user input without escaping: turns a literal search box into a denial-of-service vector via catastrophic patterns. Always run input through an escapeRegExp helper.
  3. Capture groups when you only need grouping: they add overhead and pollute numbered references. Use (?:…) unless you actually need the capture.
  4. Greedy quantifiers on <tag>-like patterns: <.+> on <a><b> matches the whole thing. Use a negated class <[^>]+> for safety and speed.
  5. /\d/ on Arabic / Bengali digits: \d matches only ASCII [0-9], with or without the u flag. Use \p{Decimal_Number} with the u (or v) flag to match every Unicode decimal digit.
  6. Forgetting g on replaceAll with a regex: "aaa".replaceAll(/a/, "b") throws a TypeError because the regex is not global. Use /a/g or the string form.
  7. .match() with a g regex throws away capture groups: it returns just the full matches. Use .matchAll() or .exec() to keep groups.
  8. Escaping in template strings: new RegExp(`\d+`) is /d+/ because the backslash is eaten by the string literal. Either use String.raw`\d+` or double-escape (`\\d+`).
  9. Variable-length lookbehind portability: V8 (Chrome, Node) supports it; some older engines don't. Test target runtimes if you ship to legacy browsers.
  10. Regex literal in a hot loop: /foo/g inside a function body allocates a new RegExp on every call. Hoist it to module scope when the pattern is constant, then reset lastIndex or stick to stateless string methods (see pitfall 1).