cheat sheet

json

Encode and decode JSON in Python with the stdlib json module. Covers dumps/loads, indent/sort_keys/separators, custom default= and JSONEncoder, object_hook decoding, JSONL streaming, and orjson/ujson/msgspec comparison.

updated 05-25-2026

json — Stdlib JSON Encoder/Decoder

What it is

json is Python's standard-library module for encoding Python objects to JSON strings and decoding JSON strings back to Python objects. It ships with every Python install and is the right tool for almost all JSON work. Reach for orjson or msgspec only when you have a measured performance need: the rough figures quoted on this page put them somewhere between 5× and 30× faster than stdlib json on large payloads, depending on the data, but they add a dependency.

Install

json is part of the standard library — no install step is needed. The faster third-party alternatives are optional.

bash

# Standard library — no install
python -c "import json; print(json.__version__)"

# Faster alternatives (optional)
pip install orjson      # Rust-backed, 5–20× faster
pip install ujson       # C-backed, ~3× faster
pip install msgspec     # validation + speed, schema-aware

Output:

text

2.0.9

Syntax

The two entry points are dumps/loads (string ↔ object) and dump/load (file ↔ object). Use the string forms in memory, the file forms when reading or writing files.

python

import json

json.dumps(obj)        # object → str
json.loads(s)          # str    → object
json.dump(obj, fp)     # object → file
json.load(fp)          # file   → object

Output: (none — declarative signatures)

Type mapping

Python types are mapped to JSON types as follows. Anything outside this table needs a custom encoder.

Python	JSON
`dict` (str keys)	object
`list`, `tuple`	array
`str`	string
`int`, `float`	number
`True`, `False`	`true`, `false`
`None`	`null`

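Keys of type int, float, bool, and None are coerced to strings ({1: "a"} becomes {"1": "a"}); any other non-string key raises TypeError unless you pass skipkeys=True. Values of types not in this table (datetime, Path, Decimal, set, bytes, dataclasses) raise TypeError unless you supply a default= callable. Decoding is not symmetric: arrays always come back as list, so tuples do not survive a round trip.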
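The edge cases around this table can be sketched with the stdlib alone. The sample values below are illustrative.

```python
import json

# int, float, bool and None keys are converted to strings on the way out.
encoded = json.dumps({2: "a", 2.5: "b", True: "c", None: "d"})
print(encoded)

# Arrays always decode as lists, so a tuple does not survive the round trip.
round_tripped = json.loads(json.dumps({"point": (1, 2)}))
print(round_tripped)

# A value type outside the table raises TypeError when no default= is given.
try:
    json.dumps({"tags": {"admin", "user"}})
except TypeError as exc:
    error = str(exc)
    print(error)
```

Note that True and 1 are the same dict key in Python, which is why the example uses 2 for the integer key.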

`dumps`/`loads` — basics

dumps encodes a Python object to a JSON string; loads parses a string back. Both round-trip primitive types without surprises.

python

import json

obj = {"name": "Alice Dev", "age": 30, "tags": ["admin", "user"], "active": True}
s = json.dumps(obj)
print(s)
print(json.loads(s))
print(type(json.loads("3.14")))   # numbers come back as int/float

Output:

text

{"name": "Alice Dev", "age": 30, "tags": ["admin", "user"], "active": true}
{'name': 'Alice Dev', 'age': 30, 'tags': ['admin', 'user'], 'active': True}
<class 'float'>

Pretty printing — `indent`, `sort_keys`, `separators`, `ensure_ascii`

The formatting options on dumps control whitespace and ordering. The defaults produce compact ASCII-safe output; pass indent=2 for readable diffs, sort_keys=True for stable output, and ensure_ascii=False to keep non-ASCII characters literal.

python

import json

obj = {"name": "Alice Dev", "email": "alice@example.com", "city": "São Paulo"}

print(json.dumps(obj))                                    # default
print(json.dumps(obj, indent=2))                          # pretty
print(json.dumps(obj, indent=2, sort_keys=True))          # stable order
print(json.dumps(obj, ensure_ascii=False))                # keep é, ã, etc.
print(json.dumps(obj, separators=(",", ":")))             # most compact

Output:

text

{"name": "Alice Dev", "email": "alice@example.com", "city": "S\u00e3o Paulo"}
{
  "name": "Alice Dev",
  "email": "alice@example.com",
  "city": "S\u00e3o Paulo"
}
{
  "city": "S\u00e3o Paulo",
  "email": "alice@example.com",
  "name": "Alice Dev"
}
{"name": "Alice Dev", "email": "alice@example.com", "city": "São Paulo"}
{"name":"Alice Dev","email":"alice@example.com","city":"S\u00e3o Paulo"}

Argument	Purpose	Typical value
`indent`	Pretty-print with this many spaces per level	`None` (compact) or `2`
`sort_keys`	Sort dict keys alphabetically	`True` for diffable output
`separators`	Override `", "` and `": "`	`(",", ":")` for smallest
`ensure_ascii`	Escape non-ASCII as `\uXXXX`	`False` to keep UTF-8 literal
`allow_nan`	Allow `NaN`/`Infinity` (non-standard)	`False` for strict JSON
`default`	Callable for unsupported types	see next section
`cls`	Custom `JSONEncoder` subclass	see "Custom encoders"
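The allow_nan row is the one option not exercised above. A minimal sketch of its behaviour follows. The sanitise helper is a hypothetical illustration, not part of the json module.

```python
import json
import math

readings = {"ok": 1.5, "bad": float("nan"), "huge": float("inf")}

# Default (allow_nan=True): emits NaN / Infinity tokens, which are not valid JSON.
lenient = json.dumps(readings)
print(lenient)

# Strict mode refuses to emit them and raises ValueError instead.
try:
    json.dumps(readings, allow_nan=False)
except ValueError as exc:
    strict_error = str(exc)
    print("rejected:", strict_error)

def sanitise(value):
    """Hypothetical helper: replace non-finite floats with None (JSON null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: sanitise(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitise(v) for v in value]
    return value

clean = json.dumps(sanitise(readings), allow_nan=False)
print(clean)
```

Mapping non-finite values to null is only one possible policy. Dropping the key or failing loudly may suit other consumers better.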

Custom encoders — `default=` for unsupported types

Pass a default callable to dumps to handle types JSON doesn't know about (datetime, Path, Decimal, set, dataclasses, …). The callable receives the unencodable object and returns something JSON-serialisable; raise TypeError for anything you don't handle.

python

import json
from datetime import datetime, UTC
from pathlib import Path
from decimal import Decimal
from dataclasses import dataclass, asdict, is_dataclass

@dataclass
class User:
    name: str
    joined: datetime

def encode(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Path):
        return str(obj)
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, set):
        return sorted(obj)
    if is_dataclass(obj):
        return asdict(obj)
    raise TypeError(f"cannot encode {type(obj).__name__}")

data = {
    "user": User("Alice Dev", datetime(2026, 5, 25, tzinfo=UTC)),
    "config": Path("/home/alice/.config"),
    "balance": Decimal("19.99"),
    "tags": {"admin", "user"},
}
print(json.dumps(data, default=encode, indent=2))

Output:

text

{
  "user": {
    "name": "Alice Dev",
    "joined": "2026-05-25T00:00:00+00:00"
  },
  "config": "/home/alice/.config",
  "balance": "19.99",
  "tags": [
    "admin",
    "user"
  ]
}

Class-based encoders — `cls=JSONEncoder`

For project-wide reuse, subclass json.JSONEncoder and override default. The class form is interchangeable with the default= callable but composes better when you want to combine multiple type handlers.

python

import json
from datetime import date, datetime
from pathlib import Path
from decimal import Decimal
from uuid import UUID

class RichJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, Path):
            return str(obj)
        if isinstance(obj, Decimal):
            return str(obj)   # exact; float() would round the value
        if isinstance(obj, UUID):
            return str(obj)
        if isinstance(obj, set):
            return sorted(obj)
        return super().default(obj)   # raises TypeError

print(json.dumps({"d": date(2026, 5, 25), "p": Path("/tmp")},
                 cls=RichJSONEncoder))

Output:

text

{"d": "2026-05-25", "p": "/tmp"}
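One way to keep type handlers composable is to pass a functools.singledispatch function as default=. This is a sketch of an alternative and not something the json module prescribes. The name to_jsonable is hypothetical.

```python
import json
from datetime import date
from functools import singledispatch
from uuid import UUID

@singledispatch
def to_jsonable(obj):
    # Fallback for unregistered types: same contract as a default= callable.
    raise TypeError(f"cannot encode {type(obj).__name__}")

@to_jsonable.register
def _(obj: date):
    # datetime is a subclass of date, so it is dispatched here as well.
    return obj.isoformat()

@to_jsonable.register
def _(obj: UUID):
    return str(obj)

@to_jsonable.register
def _(obj: set):
    return sorted(obj)

encoded = json.dumps({"d": date(2026, 5, 25), "tags": {"b", "a"}},
                     default=to_jsonable)
print(encoded)
```

Each module can register its own types on to_jsonable without editing a central if/elif chain.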

Decoding — `object_hook` and `parse_*` callbacks

object_hook is called with every decoded JSON object (Python dict), letting you transform {"__type__": "datetime", "value": "..."} markers back into real datetime instances. Pair it with a custom encoder to round-trip non-JSON types.

python

import json
from datetime import datetime

def encode(obj):
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    raise TypeError(type(obj).__name__)

def decode_hook(d):
    if d.get("__type__") == "datetime":
        return datetime.fromisoformat(d["value"])
    return d

s = json.dumps({"created": datetime(2026, 5, 25, 14, 30)}, default=encode)
print(s)
parsed = json.loads(s, object_hook=decode_hook)
print(parsed, type(parsed["created"]).__name__)

Output:

text

{"created": {"__type__": "datetime", "value": "2026-05-25T14:30:00"}}
{'created': datetime.datetime(2026, 5, 25, 14, 30)} datetime

loads also exposes parse_float, parse_int, and parse_constant for fine-grained control over numbers and NaN/Infinity. The common use is parse_float=Decimal to keep financial values exact:

python

import json
from decimal import Decimal

balance = json.loads('{"amount": 0.1}', parse_float=Decimal)
print(balance, type(balance["amount"]).__name__)

Output:

text

{'amount': Decimal('0.1')} Decimal
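parse_constant can be sketched the same way. loads calls it with the literal token ("NaN", "Infinity" or "-Infinity"), so raising from it turns the lenient default into strict parsing. The function name reject_constant is an illustrative choice.

```python
import json

def reject_constant(name):
    # Receives the token text: "NaN", "Infinity" or "-Infinity".
    raise ValueError(f"non-standard JSON constant: {name}")

# Default behaviour: the token is accepted and becomes float("nan").
lenient = json.loads('{"score": NaN}')
print(lenient)

# With the hook, the same input is rejected.
try:
    json.loads('{"score": NaN}', parse_constant=reject_constant)
except ValueError as exc:
    rejected = str(exc)
    print(rejected)
```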

File I/O — `dump` and `load`

dump/load work on any file-like object. They are not faster than dumps/loads + reading the file — they exist for ergonomics. Use them when you have a single document; for streaming JSON Lines see the next section.

python

import json
from pathlib import Path

p = Path("user.json")

with p.open("w") as f:
    json.dump({"name": "Alice Dev", "active": True}, f, indent=2)

with p.open() as f:
    print(json.load(f))

Output:

text

{'name': 'Alice Dev', 'active': True}

JSON Lines (JSONL) — streaming records

JSON Lines (.jsonl, .ndjson) is a format where each line is a self-contained JSON object. It's the standard for log files, ML datasets, and append-only event streams because it can be read/written one record at a time without holding the whole file in memory.

python

import json
from pathlib import Path

records = [
    {"id": 1, "name": "Alice Dev"},
    {"id": 2, "name": "Bob Dev"},
    {"id": 3, "name": "Carol Dev"},
]

p = Path("users.jsonl")
with p.open("w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

# Read back, one record at a time
with p.open() as f:
    for line in f:
        record = json.loads(line)
        print(record["id"], record["name"])

Output:

text

1 Alice Dev
2 Bob Dev
3 Carol Dev

A reusable helper makes this even tidier:

python

import json

def write_jsonl(path, records, *, default=None):
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(r, default=default) + "\n")

def read_jsonl(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
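For the append-only event streams mentioned at the top of this section, opening the file in mode "a" adds one record without rewriting earlier ones. This is a minimal sketch. The helper name append_jsonl and the temporary directory are assumptions made for the demo.

```python
import json
import tempfile
from pathlib import Path

def append_jsonl(path, record):
    # Hypothetical helper: mode "a" appends one line, leaving earlier records untouched.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "events.jsonl"
    append_jsonl(log, {"id": 1, "type": "click"})
    append_jsonl(log, {"id": 2, "type": "view"})
    # Read everything back to confirm both records are present.
    with log.open(encoding="utf-8") as f:
        events = [json.loads(line) for line in f if line.strip()]

print(events)
```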

`json.tool` — pretty-print from the CLI

The stdlib ships a CLI wrapper that pretty-prints JSON. It's a one-line replacement for jq . when you don't have jq installed.

bash

echo '{"name":"Alice","tags":["a","b"]}' | python -m json.tool
python -m json.tool --indent 4 raw.json formatted.json
python -m json.tool --sort-keys raw.json

Output:

text

{
    "name": "Alice",
    "tags": [
        "a",
        "b"
    ]
}

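Use --no-ensure-ascii to keep UTF-8 literal and --compact to strip whitespace (both, like --indent, need Python 3.9+). Pass an input and an output filename, as in the second command, to write the result to a file instead of stdout.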

Round-tripping a dataclass through JSON

Dataclasses don't serialise natively, but dataclasses.asdict + a custom default handles the common case. For nested dataclasses with datetime fields, the decoder needs to know what class to rebuild.

python

import json
from dataclasses import dataclass, asdict, field, fields, is_dataclass
from datetime import datetime, UTC
from typing import get_type_hints

@dataclass
class Post:
    id: int
    title: str
    published: datetime
    tags: list[str] = field(default_factory=list)

def encode(obj):
    if is_dataclass(obj):
        return asdict(obj)
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(type(obj).__name__)

def to_dataclass(cls, data: dict):
    """Rebuild a dataclass instance, parsing datetime fields by annotation."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        v = data[f.name]
        if hints[f.name] is datetime and isinstance(v, str):
            v = datetime.fromisoformat(v)
        kwargs[f.name] = v
    return cls(**kwargs)

post = Post(1, "Hello", datetime(2026, 5, 25, tzinfo=UTC), tags=["intro"])
s = json.dumps(post, default=encode, indent=2)
print(s)

restored = to_dataclass(Post, json.loads(s))
print(restored)
print(restored == post)

Output:

text

{
  "id": 1,
  "title": "Hello",
  "published": "2026-05-25T00:00:00+00:00",
  "tags": [
    "intro"
  ]
}
Post(id=1, title='Hello', published=datetime.datetime(2026, 5, 25, 0, 0, tzinfo=datetime.timezone.utc), tags=['intro'])
True

For complex nested structures, prefer Pydantic: model.model_dump_json() and Model.model_validate_json() handle nested models and common types such as datetime, UUID, and Decimal without a hand-written encoder.

Comparison with `orjson`, `ujson`, `msgspec`

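CPython's json module uses a C accelerator for its hot paths, but it is still usually the slowest of the group. It is also the only one in the standard library, so it needs no install, and it offers the fullest hook API (default=, cls, object_hook). The others trade some of that API for speed: ujson stays closest to the stdlib interface, while orjson and msgspec return bytes and are not strict drop-in replacements.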

Library	Speed	API surface	Notable
`json` (stdlib)	1×	full `default=`, `cls`, `object_hook`	always available
`orjson`	~10–20×	`dumps`/`loads` only, returns `bytes`	best speed, handles `datetime`/`UUID` natively
`ujson`	~3–5×	matches stdlib closely	older, less feature-rich than `orjson`
`msgspec`	~10–30×	schema-aware (`Struct`, type-validated)	doubles as a Pydantic alternative

python

# orjson — bytes in/out, no default needed for datetime/UUID
import orjson
from datetime import datetime, UTC

raw = orjson.dumps({"now": datetime(2026, 5, 25, 14, 30, tzinfo=UTC)})    # → bytes
print(raw)
print(orjson.loads(raw))

Output:

text

b'{"now":"2026-05-25T14:30:00+00:00"}'
{'now': '2026-05-25T14:30:00+00:00'}

orjson.dumps returns bytes, not str. Use .decode() if you need a string, or pass the bytes directly to Path.write_bytes / socket.send.

Rule of thumb: use stdlib json for almost everything. Switch to orjson if you serialise multi-MB payloads in a hot path. Switch to msgspec if you also want validation and want to avoid Pydantic's overhead.
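Since the advice hinges on a measured need, a quick timing run on your own payload is worth more than the rough multipliers in the table. The sketch below times json.dumps with timeit and adds orjson only if it happens to be installed. The payload shape and the bench helper are illustrative assumptions.

```python
import json
import timeit

# Illustrative payload; substitute a sample of your real data.
payload = {"users": [{"id": i, "name": f"user{i}", "tags": ["a", "b"]}
                     for i in range(10_000)]}

def bench(label, fn, number=20):
    """Return the mean time per call in milliseconds."""
    seconds = timeit.timeit(fn, number=number)
    per_call_ms = seconds / number * 1000
    print(f"{label}: {per_call_ms:.2f} ms per call")
    return per_call_ms

results = {"json": bench("json.dumps", lambda: json.dumps(payload))}

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None

if orjson is not None:
    results["orjson"] = bench("orjson.dumps", lambda: orjson.dumps(payload))
```

Run it a few times and on realistic data before deciding. Small payloads rarely justify the extra dependency.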

Common pitfalls

Non-string keys are coerced — json.dumps({1: "a"}) produces {"1": "a"} silently. The reverse trip gives you str keys. Convert explicitly if it matters.
NaN and Infinity are not valid JSON — but stdlib accepts them by default (allow_nan=True). Set allow_nan=False to enforce strict RFC 8259, or sanitise with math.isfinite().
TypeError: Object of type X is not JSON serializable — means default= did not cover that type. Add a branch to your encoder.
Loss of int precision — JavaScript clients lose precision above 2^53; if your consumer is JS, serialise large integers as strings.
json.dumps(d, sort_keys=True) fails on mixed-type keys: keys are sorted before they are coerced to strings, so {1: "a", "b": 2} raises TypeError. With keys of a single type the output is deterministic for dict and OrderedDict alike.
ensure_ascii=True is the default — non-ASCII text becomes \uXXXX. Set ensure_ascii=False for human-readable output (and remember to write the file as UTF-8).
load/dump are not faster than loads/dumps — they just save you the file-read step. Don't switch for performance reasons.
object_hook runs on every nested dict — including dicts you don't want to transform. Use a __type__ marker or check structure inside the hook.
Forgetting the trailing newline in JSONL: json.dumps(record) does not append \n, so add it yourself.
json.loads(bytes) works (3.6+): the input may be UTF-8, UTF-16, or UTF-32 and the encoding is detected automatically. For any other encoding, decode to str first, e.g. raw.decode("latin-1").
Decimal round-trip via float is lossy — use parse_float=Decimal on load and str(d) in your encoder.
pathlib.Path is not serialisable — add it to your encoder; this trips up scripts that pass config dicts around.
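The integer-precision pitfall cannot be handled through default=, because int is a supported type and the callable is never consulted for it. One possible workaround is to pre-process the payload before encoding. The helper name stringify_big_ints is hypothetical. The threshold is 2^53 - 1, JavaScript's Number.MAX_SAFE_INTEGER.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a JavaScript Number holds exactly

def stringify_big_ints(value):
    """Hypothetical helper: turn integers beyond the JS-safe range into strings."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value) if abs(value) > MAX_SAFE_INTEGER else value
    if isinstance(value, dict):
        return {k: stringify_big_ints(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify_big_ints(v) for v in value]
    return value

payload = {"id": 2**63, "count": 42, "ids": [1, 2**53 + 1], "active": True}
safe = json.dumps(stringify_big_ints(payload))
print(safe)
```

The consumer then has to know which fields arrive as strings, so this belongs in the API contract rather than in an ad-hoc script.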

Real-world recipes

Atomic write to disk

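Writing JSON to a file directly risks leaving a corrupt half-written file if the process is killed. Write to a temp file in the same directory and rename it over the target: the rename is atomic on POSIX, and on Windows Path.replace overwrites the target in one call, though atomicity is not guaranteed there.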

python

import json
import os
from pathlib import Path

def write_json_atomic(path: Path, data, **kwargs):
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w") as f:
        json.dump(data, f, **kwargs)
        f.flush()
        os.fsync(f.fileno())
    tmp.replace(path)   # atomic rename

write_json_atomic(Path("config.json"),
                  {"host": "myhost", "port": 9000},
                  indent=2, sort_keys=True)
print(Path("config.json").read_text())

Output:

text

{
  "host": "myhost",
  "port": 9000
}

Streaming a large JSONL file with progress

Process a million-record JSONL file without holding it in memory. Use enumerate for progress and yield so callers can chain transformations.

python

import json
from pathlib import Path

def stream_jsonl(path: Path):
    with path.open() as f:
        for line_no, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield line_no, json.loads(line)
            except json.JSONDecodeError as e:
                print(f"line {line_no}: bad JSON — {e}")

# Build a sample file
Path("events.jsonl").write_text("\n".join([
    '{"id": 1, "type": "click"}',
    '{"id": 2, "type": "view"}',
    'not-json',
    '{"id": 3, "type": "click"}',
]))

clicks = 0
for n, record in stream_jsonl(Path("events.jsonl")):
    if record.get("type") == "click":
        clicks += 1
print(f"clicks: {clicks}")

Output:

text

line 3: bad JSON — Expecting value: line 1 column 1 (char 0)
clicks: 2

Filter a large JSON file with `jq`-style queries

For a few keys, manual indexing is fine; for ad-hoc exploration of a huge file, shell out to jq. When jq is not installed, a short stdlib script does the same filtering.

bash

# With jq (best)
jq '.users[] | select(.active) | .name' users.json

# Stdlib equivalent
python - <<'PY'
import json, pathlib
data = json.loads(pathlib.Path("users.json").read_text())
for u in data["users"]:
    if u.get("active"):
        print(u["name"])
PY

Output (from jq; the Python script prints the same names without quotes):

text

"Alice Dev"
"Carol Dev"

Merge multiple JSON config files with overrides

A pattern that mirrors argparse + env-var fallbacks: load a base config, layer environment-specific overrides on top. dict.update() handles flat merges; for nested merges, recurse.

python

import json
from pathlib import Path

def deep_merge(base: dict, override: dict) -> dict:
    out = dict(base)
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = deep_merge(out[k], v)
        else:
            out[k] = v
    return out

Path("base.json").write_text(json.dumps({
    "host": "localhost", "port": 8080,
    "logging": {"level": "INFO", "file": "/var/log/app.log"},
}))
Path("prod.json").write_text(json.dumps({
    "host": "myhost",
    "logging": {"level": "WARNING"},
}))

base = json.loads(Path("base.json").read_text())
override = json.loads(Path("prod.json").read_text())
print(json.dumps(deep_merge(base, override), indent=2))

Output:

text

{
  "host": "myhost",
  "port": 8080,
  "logging": {
    "level": "WARNING",
    "file": "/var/log/app.log"
  }
}

REST API client that auto-encodes datetimes and UUIDs

A small helper that adds custom encoding to every requests call, so you never serialise raw datetime again.

python

import json
from datetime import datetime, UTC
from uuid import UUID, uuid4

class APIEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, UUID):
            return str(obj)
        return super().default(obj)

def api_post(url: str, payload: dict) -> str:
    body = json.dumps(payload, cls=APIEncoder)
    # In real code:
    # return requests.post(url, data=body,
    #                      headers={"Content-Type": "application/json"}).text
    return body

print(api_post("https://api.example.com/events", {
    "id": UUID("12345678-1234-5678-1234-567812345678"),
    "user": "alicedev",
    "at": datetime(2026, 5, 25, 14, 30, tzinfo=UTC),
}))

Output:

text

{"id": "12345678-1234-5678-1234-567812345678", "user": "alicedev", "at": "2026-05-25T14:30:00+00:00"}

Validate a JSON payload's shape

Quick structural validation without pulling in Pydantic or jsonschema — useful for one-off scripts.

python

import json

def require(obj, *keys):
    missing = [k for k in keys if k not in obj]
    if missing:
        raise ValueError(f"missing keys: {missing}")

payload = json.loads('{"name": "Alice Dev", "email": "alice@example.com"}')
try:
    require(payload, "name", "email", "age")
except ValueError as e:
    print(e)

Output:

text

missing keys: ['age']

For anything more complex than shape checks (types, ranges, regex), use Pydantic: User.model_validate_json(raw_bytes) does parse + validation in a single call and gives detailed errors.
