concept · weight 10

JSON

Text-based data interchange format derived from JavaScript object literals and standardised as RFC 8259 / ECMA-404.

JSON

Definition

JSON (JavaScript Object Notation) is a text-based, language-independent data interchange format that represents structured values as nested objects, arrays, strings, numbers, booleans, and null. It is jointly standardised as IETF RFC 8259 and ECMA-404, and it is the default wire format for almost every modern HTTP API, configuration file, and inter-process message that needs to cross a language boundary. Reach for JSON whenever two systems written in different languages need to exchange structured data and you do not have a hard performance requirement that forces a binary format like Protocol Buffers, MessagePack, or CBOR.

Why it matters

JSON's success comes from three properties no earlier format had at once: it is human-readable (you can curl an API and read the response in a terminal), it maps cleanly to the native data structures of most dynamic languages (a JSON object becomes a Python dict, a JS object, a Ruby Hash with no schema gymnastics), and the grammar is small enough that a competent engineer can write a parser in an afternoon. That trio is why every REST and most RPC APIs default to JSON, every npm and Cargo manifest is JSON-shaped, every Claude Code, VS Code, and tsconfig.json configuration file is JSON or JSON-with-comments, and every modern LLM that returns "structured output" returns JSON. If you cannot read, write, query, and validate JSON fluently, large parts of the modern software stack are opaque to you.

The cost of that ubiquity is sloppiness. JSON has no native date, decimal, or binary type, the spec permits duplicate object keys with undefined resolution, and IEEE-754 doubles silently corrupt integers above 2^53. Production systems that treat JSON as "just text" eventually lose money to a 64-bit ID truncation, a duplicate-key override, or a NaN that became null. Knowing the format's hard edges is the difference between using JSON and being bitten by it.

How it works

A JSON document is a single value, which is one of six types:

  • object — unordered collection of "string": value pairs delimited by { }.
  • array — ordered list of values delimited by [ ].
  • string — Unicode text in double quotes, with \ escapes for control characters, quote, backslash, and \uXXXX BMP code points (surrogate pairs for higher planes).
  • number — decimal with optional fraction and exponent. The grammar does not distinguish integer from float, has no upper bound, and forbids NaN, +Infinity, -Infinity, and leading +.
  • booleantrue or false (lowercase, unquoted).
  • null — the literal null.
json
{
  "name": "Alice Dev",
  "email": "alice@example.com",
  "roles": ["admin", "viewer"],
  "active": true,
  "loginCount": 42,
  "lastLogin": null
}

Encoding must be UTF-8 for any document exchanged across a trust boundary — RFC 8259 explicitly removed the UTF-16/UTF-32 BOM allowance from earlier revisions. Parsers walk the input left-to-right, build the corresponding in-memory tree, and surface that tree to the caller as whatever the host language's idiomatic mapping is (dict/list in Python, Object/Array in JS, map[string]any/[]any in Go, serde_json::Value in Rust). The mapping is lossy in both directions — JSON has no native date, decimal, set, or binary type, so the encoder/decoder has to layer those on top with conventions (ISO-8601 strings for dates, base64 strings for bytes, arrays for sets).

Three families of supersets show up in real codebases and need to be distinguished:

  • JSON (RFC 8259 / ECMA-404) — the strict spec. No comments, no trailing commas, double-quoted keys only.
  • JSONC — JSON with // and /* */ comments. Used by tsconfig.json, VS Code settings.json, Claude Code settings.json. The reference jsonc-parser rejects trailing commas by default.
  • JSON5 — a larger ECMAScript-5-inspired superset that additionally allows trailing commas, single-quoted strings, unquoted keys, hex numbers, and NaN/Infinity. Used by .babelrc and some build configs.

For schemas, the de facto standard is JSON Schema 2020-12 — a vocabulary expressed in JSON for validating other JSON documents (type, required, enum, pattern, prefixItems, $ref, $dynamicRef). It is what tool-use APIs like Claude's input_schema, OpenAI's response_format, and OpenAPI 3.1 all consume.

For streaming, the dominant format is JSON Lines (a.k.a. JSONL or NDJSON): one JSON value per line, separated by \n, UTF-8 encoded, file extension .jsonl. Each line is independently parseable, so a 50 GB log can be processed by a generator in constant memory — the shape jq, ndjson, OpenAI fine-tune files, and most log shippers consume.

The query-language ecosystem

The reference query language for JSON in shell pipelines is jq, and as of jq 1.7 (Sep 2023) and jq 1.8 (June 2024) the surface area has caught up with the work people had been hand-rolling for years. The four ergonomics that matter most when reading other people's JSON-wrangling scripts:

  • pick(path_expr) — typed inverse of del; projects an object down to a named set of paths, returning null for missing leaves (jq 'pick(.id, .user.name, .ts)').
  • INDEX(stream; key_expr) — turn a stream of objects into a lookup table keyed by key_expr; pairs with JOIN for SQL-style joins across two JSON files.
  • IN(stream) — boolean test for "is this value in that stream", finally giving allowlist filtering a readable form (.[] | select(.status | IN("error","warn"))).
  • trim/ltrim/rtrim, abs, toboolean, --raw-output0 — small additions that remove the most common reasons people previously dropped down into gsub, fabs, custom string parsing, or unsafe newline-separated xargs pipelines.

Around jq is a growing ecosystem of compatible and adjacent tools. Pick by data format and workflow — none of these is a strict drop-in:

  • jaq (Rust) — jq-language-compatible reimplementation that is several times faster to start and execute. Worth it for CI pipelines and scripts that invoke jq in a loop.
  • gojq (Go) — pure-Go jq with YAML I/O, better error messages, and a single static binary that drops cleanly into minimal container images.
  • yq (Mike Farah, Go) — YAML-first processor with jq-like syntax; reads and writes JSON, XML, CSV, TOML, properties. The de facto tool for editing Kubernetes manifests, Helm values, and GitHub Actions workflows.
  • dasel (Go) — single query language across JSON, YAML, TOML, XML, CSV; useful for ETL where input and output formats differ.
  • fq (Go) — the jq language applied to binary formats (mp4, pcap, ELF, PNG, ZIP); read structured data out of binaries with familiar filters.
  • gron / fastgron — flatten JSON into greppable path = value; lines (and back). Discover the path you need with curl ... | gron | grep <kw>, then write the jq filter once.

For interactive exploration before you commit to a filter, jless (Rust, curses-style collapsible pager) and fx (Go, interactive viewer plus full-script transforms) cover the "I just want to see the shape" case that jq itself is bad at.

A second query language worth knowing about is JMESPath — the format embedded into the two largest cloud CLIs (aws --query and az --query). JMESPath runs client-side against the response of a single API call (so it doesn't reduce server-side load), and its grammar is deliberately smaller than jq's: projections ([].name), filter expressions ([?env=='prod']), multi-select hashes ({name: name, size: size}), and pipes (| [0]). It is the right tool when you're already shelling out to a cloud CLI and want to avoid an extra jq invocation; jq remains the right tool when you have a JSON file on disk or are composing multi-stage transforms. az defaults its output to JSON specifically so JMESPath, jq, or any downstream JSON parser can pick it up without an --output flag.

Common pitfalls

  1. Integers larger than 2^53 lose precision. JS JSON.parse puts every number into a 64-bit float, so a Twitter/Snowflake-style 64-bit ID 9007199254740993 round-trips as 9007199254740992. Fix: emit large integers as strings on the server ({"id": "9007199254740993"}) and parse them with a big-int library on the client.
  2. NaN and Infinity are not JSON. JSON.stringify(NaN) returns the literal string "null" in JavaScript, and Python's json.dumps emits the bare tokens NaN / Infinity by default — which any strict parser rejects. Fix: pass allow_nan=False in Python and validate floats before serialising, or use a domain-specific sentinel like null.
  3. Duplicate keys are undefined behaviour. The grammar permits {"a": 1, "a": 2}; most parsers silently keep the last value, but the spec does not require it. Treat duplicates as bugs and reject them with a strict validator.
  4. No native date or decimal type. Always serialise dates as ISO-8601 strings ("2026-05-25T19:21:31Z") and monetary values as either strings or scaled integers (cents, not dollars). Round-tripping a JS Date or a Python Decimal through generic JSON loses precision and timezone information.
  5. Trailing commas and comments break strict parsers. {"a": 1,} and // note are valid JSON5 / JSONC but illegal RFC 8259. Use a JSONC- or JSON5-aware parser for config files, and a strict one for wire payloads — do not mix.
  6. String escapes must be \uXXXX, not \xXX or \0. JSON inherits ECMAScript string escapes but only the JSON subset. Lone surrogates (\uD800 without a paired low surrogate) are technically allowed by the grammar but rejected by most modern parsers and by the WHATWG TextEncoder.
  7. UTF-8 only across a trust boundary. RFC 8259 makes UTF-8 mandatory for interoperability. Do not send UTF-16 or include a BOM; many parsers treat the BOM as a syntax error on the first byte.
  8. Pretty-printed JSON is not free. JSON.stringify(x, null, 2) doubles or triples payload size. Pretty-print for humans (logs, dev consoles) and compact for the wire.

Where to go next

Concrete cheat sheets for the tools and APIs that read, write, query, or validate JSON across the rest of the site:

  • /sections/linux/jq — the default command-line JSON processor; slice, filter, and reshape JSON in shell pipelines, with a full table of modern alternatives (jaq, gojq, yq, dasel, fq, gron/fastgron, jless, fx).
  • /sections/python/json — the Python stdlib encoder/decoder, plus when to reach for orjson or msgspec.
  • /sections/javascript/package-json — the canonical JSON-shaped manifest for the Node.js ecosystem.
  • /sections/claude-code/settings — a real-world JSONC configuration surface (Claude Code settings.json with comments).
  • /sections/prompting/structured-output — how LLM APIs use JSON Schema to constrain model output to a parseable shape.
  • /sections/osx/system_profiler and /sections/linux/ffprobe — tools whose -json flag exists specifically so the output can be piped into jq.
  • /sections/linux/xidel — XPath-style extraction over JSON when the document is too deeply nested for jq to feel ergonomic.
  • /sections/linux/az — Azure CLI; the --query flag is a built-in JMESPath engine, and JSON is the default output format. Real example of "JSON on the wire, filter at the call site" in a production CLI.
  • /sections/linux/gh — GitHub CLI; another cloud-CLI surface that exposes JSON via --json field1,field2 and a --jq flag for pipeline-friendly extraction.

Sources

References consulted while writing this concept page. Links open in a new tab.

  • RFC 8259 — The JavaScript Object Notation (JSON) Data Interchange Format — Authoritative IETF Internet Standard; source for the grammar summary, the UTF-8 requirement, and the interoperability guidance on numbers and duplicate keys.
  • RFC Editor: RFC 8259 metadata — Confirms RFC 8259's Internet Standard status (STD 90) and obsoletes-history for RFC 7159.
  • JSON5 — JSON for Humans — Reference for the JSON5 superset (trailing commas, single quotes, unquoted keys, hex, NaN/Infinity, comments) used to distinguish JSON / JSONC / JSON5.
  • JSONC specification — Reference for the minimal "JSON with comments" superset used by VS Code, tsconfig.json, and Claude Code settings.json; clarifies that trailing commas are not part of baseline JSONC.
  • JSON Schema 2020-12 draft — Current stable JSON Schema dialect; used to ground the "JSON Schema is the validation vocabulary that LLM tool-use APIs consume" framing.
  • JSON Lines (JSONL / NDJSON) — Specification for the line-delimited streaming variant; source for the \n-separated, UTF-8, single-value-per-line, .jsonl extension recommendations.
  • Wikipedia: JSON streaming — Cross-format overview (NDJSON, JSON Lines, concatenated JSON, length-prefixed JSON) used to position streaming alternatives.
  • jq 1.8 manual — Current jq language reference; source for pick, INDEX/IN/JOIN, trim/abs/toboolean, --raw-output0, and the code-point string-indexing change.
  • jq 1.7 release notes — Original landing notes for pick, the SQL-style join builtins, --raw-output0, JQ_COLORS, and NO_COLOR support.
  • jq 1.8.0 release notes — Source for the 1.8 additions (trim/ltrim/rtrim, abs, toboolean, skip) and the breaking move to code-point string indexing.
  • jaq — Rust jq clone — Performance-focused jq-language-compatible reimplementation; the canonical "faster jq in CI" choice.
  • gojq — pure-Go jq — Drop-in jq with YAML I/O and clearer error messages; ships as a single static binary.
  • yq (Mike Farah) — jq-syntax processor for YAML, JSON, XML, CSV, TOML, and properties; the de facto Kubernetes/Helm/GitHub-Actions editor.
  • fq — jq for binary formats — Applies the jq language to mp4, pcap, ELF, PNG, ZIP, and other binary formats.
  • gron and fastgron — Flatten JSON into greppable path = value; lines and back; the fastest way to discover the path you need before writing a jq filter.
  • JMESPath specification — Authoritative grammar for the query language embedded in the AWS and Azure CLIs (--query); source for the "projections, filter expressions, multi-select hashes, pipes" framing.