Encode and decode JSON in Python with the stdlib json module. Covers dumps/loads, indent/sort_keys/separators, custom default= and JSONEncoder, object_hook decoding, JSONL streaming, and orjson/ujson/msgspec comparison.

json — Stdlib JSON Encoder/Decoder

What it is

json is Python's standard-library module for encoding Python objects to JSON strings and decoding JSON strings back to Python objects. It ships with every Python install and is the right tool for almost all JSON work. Reach for orjson or msgspec only when you have a measured performance need — they are 5–20× faster than stdlib json for large payloads but add a dependency.

Install

json is part of the standard library — no install step is needed. The faster third-party alternatives are optional.

bash
# Standard library — no install
python -c "import json; print(json.__version__)"

# Faster alternatives (optional)
pip install orjson      # Rust-backed, 5–20× faster
pip install ujson       # C-backed, ~3× faster
pip install msgspec     # validation + speed, schema-aware

Output:

text
2.0.9

Syntax

The entry points come in two pairs: dumps/loads (string ↔ object) and dump/load (file ↔ object). Use the string forms for in-memory data and the file forms when reading or writing files.

python
import json

json.dumps(obj)        # object → str
json.loads(s)          # str    → object
json.dump(obj, fp)     # object → file
json.load(fp)          # file   → object

Output: (none — declarative signatures)

Type mapping

Python types are mapped to JSON types as follows. Anything outside this table needs a custom encoder.

| Python | JSON |
|---|---|
| dict (str keys) | object |
| list, tuple | array |
| str | string |
| int, float | number |
| True, False | true, false |
| None | null |

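Dict keys that are int, float, bool, or None are coerced to strings ({1: "a"} becomes {"1": "a"}); any other key type raises TypeError unless you pass skipkeys=True, which drops those entries. Value types not in this table (datetime, Path, Decimal, set, bytes, dataclasses) raise TypeError unless you supply a default= callable.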
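The key handling described above can be checked with a short sketch. The sample dictionaries are made up for illustration; skipkeys is a standard dumps parameter.

```python
import json

# int, float, bool and None keys are converted to their JSON text form
encoded = json.dumps({1: "a", 2.5: "b", False: "c", None: "d"})
print(encoded)

# The conversion is one-way: every key comes back as str
decoded = json.loads(encoded)
print(decoded)

# Any other key type (a tuple here) raises TypeError...
try:
    json.dumps({(1, 2): "pair"})
    key_error = ""
except TypeError as exc:
    key_error = str(exc)
print(key_error)

# ...unless skipkeys=True, which silently drops those entries
skipped = json.dumps({(1, 2): "pair", "ok": 1}, skipkeys=True)
print(skipped)
```

Because skipkeys=True drops data without warning, it is usually safer to convert such keys to strings yourself before encoding.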

dumps/loads — basics

dumps encodes a Python object to a JSON string; loads parses a string back. str, int, float, bool, None, lists, and str-keyed dicts round-trip unchanged; tuples come back as lists.

python
import json

obj = {"name": "Alice Dev", "age": 30, "tags": ["admin", "user"], "active": True}
s = json.dumps(obj)
print(s)
print(json.loads(s))
print(type(json.loads("3.14")))   # numbers come back as int/float

Output:

text
{"name": "Alice Dev", "age": 30, "tags": ["admin", "user"], "active": true}
{'name': 'Alice Dev', 'age': 30, 'tags': ['admin', 'user'], 'active': True}
<class 'float'>

Pretty printing — indent, sort_keys, separators, ensure_ascii

The formatting options on dumps control whitespace and ordering. The defaults produce compact ASCII-safe output; pass indent=2 for readable diffs, sort_keys=True for stable output, and ensure_ascii=False to keep non-ASCII characters literal.

python
import json

obj = {"name": "Alice Dev", "email": "alice@example.com", "city": "São Paulo"}

print(json.dumps(obj))                                    # default
print(json.dumps(obj, indent=2))                          # pretty
print(json.dumps(obj, indent=2, sort_keys=True))          # stable order
print(json.dumps(obj, ensure_ascii=False))                # keep é, ã, etc.
print(json.dumps(obj, separators=(",", ":")))             # most compact

Output:

text
{"name": "Alice Dev", "email": "alice@example.com", "city": "S\u00e3o Paulo"}
{
  "name": "Alice Dev",
  "email": "alice@example.com",
  "city": "S\u00e3o Paulo"
}
{
  "city": "S\u00e3o Paulo",
  "email": "alice@example.com",
  "name": "Alice Dev"
}
{"name": "Alice Dev", "email": "alice@example.com", "city": "São Paulo"}
{"name":"Alice Dev","email":"alice@example.com","city":"S\u00e3o Paulo"}

| Argument | Purpose | Typical value |
|---|---|---|
| indent | Pretty-print with this many spaces per level | None (compact) or 2 |
| sort_keys | Sort dict keys alphabetically | True for diffable output |
| separators | Override ", " and ": " | (",", ":") for smallest |
| ensure_ascii | Escape non-ASCII as \uXXXX | False to keep UTF-8 literal |
| allow_nan | Allow NaN/Infinity (non-standard) | False for strict JSON |
| default | Callable for unsupported types | see next section |
| cls | Custom JSONEncoder subclass | see "Custom encoders" |
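allow_nan is the one option in the table without an example, so here is a minimal sketch of its effect. finite_or_none is a hypothetical helper written for this illustration, not part of the json module; replacing non-finite values with null is one possible policy among several.

```python
import json
import math

values = {"ratio": float("nan"), "limit": float("inf"), "ok": 1.5}

# Default (allow_nan=True): emits NaN/Infinity tokens that strict parsers reject
lenient = json.dumps(values)
print(lenient)

# allow_nan=False: raises ValueError instead of emitting non-standard tokens
try:
    json.dumps(values, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc
print(type(strict_error).__name__)

def finite_or_none(obj):
    """Hypothetical helper: replace non-finite floats with None, recursively."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {k: finite_or_none(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [finite_or_none(v) for v in obj]
    return obj

# After sanitising, strict encoding succeeds
strict = json.dumps(finite_or_none(values), allow_nan=False)
print(strict)
```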

Custom encoders — default= for unsupported types

Pass a default callable to dumps to handle types JSON doesn't know about (datetime, Path, Decimal, set, dataclasses, …). The callable receives the unencodable object and returns something JSON-serialisable; raise TypeError for anything you don't handle.

python
import json
from datetime import datetime, UTC  # datetime.UTC needs Python 3.11+
from pathlib import Path
from decimal import Decimal
from dataclasses import dataclass, asdict, is_dataclass

@dataclass
class User:
    name: str
    joined: datetime

def encode(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Path):
        return str(obj)
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, set):
        return sorted(obj)
    if is_dataclass(obj):
        return asdict(obj)
    raise TypeError(f"cannot encode {type(obj).__name__}")

data = {
    "user": User("Alice Dev", datetime(2026, 5, 25, tzinfo=UTC)),
    "config": Path("/home/alice/.config"),
    "balance": Decimal("19.99"),
    "tags": {"admin", "user"},
}
print(json.dumps(data, default=encode, indent=2))

Output:

text
{
  "user": {
    "name": "Alice Dev",
    "joined": "2026-05-25T00:00:00+00:00"
  },
  "config": "/home/alice/.config",
  "balance": "19.99",
  "tags": [
    "admin",
    "user"
  ]
}

Class-based encoders — cls=JSONEncoder

For project-wide reuse, subclass json.JSONEncoder and override default. The class form is interchangeable with the default= callable but composes better when you want to combine multiple type handlers.

python
import json
from datetime import date, datetime
from pathlib import Path
from decimal import Decimal
from uuid import UUID

class RichJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, Path):
            return str(obj)
        if isinstance(obj, Decimal):
            # float() is lossy; return str(obj) instead when exact values matter
            return float(obj) if obj % 1 else int(obj)
        if isinstance(obj, UUID):
            return str(obj)
        if isinstance(obj, set):
            return sorted(obj)
        return super().default(obj)   # raises TypeError

print(json.dumps({"d": date(2026, 5, 25), "p": Path("/tmp")},
                 cls=RichJSONEncoder))

Output:

text
{"d": "2026-05-25", "p": "/tmp"}
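One way to read "composes better" is that handlers can be split into cooperative mixins and stacked. The sketch below assumes that reading; DateMixin, SetMixin, and AppEncoder are hypothetical names, and each default falls through to the next class with super().

```python
import json
from datetime import date, datetime

class DateMixin:
    """Handles date and datetime, defers everything else."""
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        return super().default(obj)

class SetMixin:
    """Handles set and frozenset, defers everything else."""
    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        return super().default(obj)

# Method resolution order: SetMixin -> DateMixin -> JSONEncoder (raises TypeError)
class AppEncoder(SetMixin, DateMixin, json.JSONEncoder):
    pass

combined = json.dumps({"d": date(2026, 5, 25), "s": {2, 1}}, cls=AppEncoder)
print(combined)

# Types no mixin handles still end in the base class's TypeError
try:
    json.dumps({"x": object()}, cls=AppEncoder)
    unhandled = None
except TypeError as exc:
    unhandled = exc
print(type(unhandled).__name__)
```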

Decoding — object_hook and parse_* callbacks

object_hook is called with every decoded JSON object (Python dict), letting you transform {"__type__": "datetime", "value": "..."} markers back into real datetime instances. Pair it with a custom encoder to round-trip non-JSON types.

python
import json
from datetime import datetime

def encode(obj):
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    raise TypeError(type(obj).__name__)

def decode_hook(d):
    if d.get("__type__") == "datetime":
        return datetime.fromisoformat(d["value"])
    return d

s = json.dumps({"created": datetime(2026, 5, 25, 14, 30)}, default=encode)
print(s)
parsed = json.loads(s, object_hook=decode_hook)
print(parsed, type(parsed["created"]).__name__)

Output:

text
{"created": {"__type__": "datetime", "value": "2026-05-25T14:30:00"}}
{'created': datetime.datetime(2026, 5, 25, 14, 30)} datetime

loads also exposes parse_float, parse_int, and parse_constant for fine-grained control over numbers and NaN/Infinity. The common use is parse_float=Decimal to keep financial values exact:

python
import json
from decimal import Decimal

balance = json.loads('{"amount": 0.1}', parse_float=Decimal)
print(balance, type(balance["amount"]).__name__)

Output:

text
{'amount': Decimal('0.1')} Decimal
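The other two callbacks work the same way. The sketch below uses parse_int=str to keep a large integer as text and a hypothetical reject_constant function to refuse NaN/Infinity on load; both policies are illustrative choices, not recommendations from the json documentation.

```python
import json

# parse_int receives the digits as a string; returning str keeps them verbatim
big_ids = json.loads('{"id": 9007199254740993}', parse_int=str)
print(big_ids)

def reject_constant(name):
    """Hypothetical policy: refuse NaN, Infinity and -Infinity in input."""
    raise ValueError(f"non-standard JSON constant: {name}")

try:
    json.loads('{"ratio": NaN}', parse_constant=reject_constant)
    rejected = None
except ValueError as exc:
    rejected = str(exc)
print(rejected)
```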

File I/O — dump and load

dump/load work on any file-like object. They are not faster than dumps/loads + reading the file — they exist for ergonomics. Use them when you have a single document; for streaming JSON Lines see the next section.

python
import json
from pathlib import Path

p = Path("user.json")

# Write: open in text mode with an explicit encoding
with p.open("w", encoding="utf-8") as f:
    json.dump({"name": "Alice Dev", "active": True}, f, indent=2)

# Read the document back
with p.open(encoding="utf-8") as f:
    print(json.load(f))

Output:

text
{'name': 'Alice Dev', 'active': True}
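When the data contains non-ASCII text, the file encoding starts to matter. A minimal sketch, assuming you want literal UTF-8 on disk; the temporary directory and the file name city.json are only there to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "city.json"

    # ensure_ascii=False writes "ã" literally, so name the encoding explicitly
    with p.open("w", encoding="utf-8") as f:
        json.dump({"city": "São Paulo"}, f, ensure_ascii=False)

    raw = p.read_bytes()  # bytes as stored on disk

    with p.open(encoding="utf-8") as f:
        loaded = json.load(f)

print(raw)
print(loaded)
```

Without encoding="utf-8", open() falls back to the platform default, which may not be UTF-8 on every system.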

JSON Lines (JSONL) — streaming records

JSON Lines (.jsonl, .ndjson) is a format where each line is a self-contained JSON object. It's the standard for log files, ML datasets, and append-only event streams because it can be read/written one record at a time without holding the whole file in memory.

python
import json
from pathlib import Path

records = [
    {"id": 1, "name": "Alice Dev"},
    {"id": 2, "name": "Bob Dev"},
    {"id": 3, "name": "Carol Dev"},
]

p = Path("users.jsonl")
with p.open("w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

# Read back, one record at a time
with p.open() as f:
    for line in f:
        record = json.loads(line)
        print(record["id"], record["name"])

Output:

text
1 Alice Dev
2 Bob Dev
3 Carol Dev

A reusable helper makes this even tidier:

python
import json

def write_jsonl(path, records, *, default=None):
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(r, default=default) + "\n")

def read_jsonl(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
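A usage sketch for the two helpers, repeated here (with an explicit encoding added) so the block runs on its own. The records, the file name, and the choice of default=str for dates are assumptions made for illustration.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

def write_jsonl(path, records, *, default=None):
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r, default=default) + "\n")

def read_jsonl(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "users.jsonl"
    records = [
        {"id": 1, "joined": date(2026, 5, 25)},
        {"id": 2, "joined": date(2026, 5, 26)},
    ]
    # default=str turns each date into its ISO string
    write_jsonl(path, records, default=str)

    # A stray blank line at the end is tolerated by read_jsonl
    with path.open("a", encoding="utf-8") as f:
        f.write("\n")

    # Consume the generator while the file still exists
    loaded = list(read_jsonl(path))

print(loaded)
```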

json.tool — pretty-print from the CLI

The stdlib ships a CLI wrapper that pretty-prints JSON. It's a one-line replacement for jq . when you don't have jq installed.

bash
echo '{"name":"Alice","tags":["a","b"]}' | python -m json.tool
python -m json.tool --indent 4 raw.json formatted.json
python -m json.tool --sort-keys raw.json

Output:

text
{
    "name": "Alice",
    "tags": [
        "a",
        "b"
    ]
}

Use --no-ensure-ascii to keep non-ASCII characters literal and --compact to strip whitespace. Passing two filenames reads the first and writes the formatted result to the second.

Round-tripping a dataclass through JSON

Dataclasses don't serialise natively, but dataclasses.asdict + a custom default handles the common case. For nested dataclasses with datetime fields, the decoder needs to know what class to rebuild.

python
import json
from dataclasses import dataclass, asdict, field, fields, is_dataclass
from datetime import datetime, UTC
from typing import get_type_hints

@dataclass
class Post:
    id: int
    title: str
    published: datetime
    tags: list[str] = field(default_factory=list)

def encode(obj):
    if is_dataclass(obj):
        return asdict(obj)
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(type(obj).__name__)

def to_dataclass(cls, data: dict):
    """Rebuild a dataclass instance, parsing datetime fields by annotation."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        v = data[f.name]
        if hints[f.name] is datetime and isinstance(v, str):
            v = datetime.fromisoformat(v)
        kwargs[f.name] = v
    return cls(**kwargs)

post = Post(1, "Hello", datetime(2026, 5, 25, tzinfo=UTC), tags=["intro"])
s = json.dumps(post, default=encode, indent=2)
print(s)

restored = to_dataclass(Post, json.loads(s))
print(restored)
print(restored == post)

Output:

text
{
  "id": 1,
  "title": "Hello",
  "published": "2026-05-25T00:00:00+00:00",
  "tags": [
    "intro"
  ]
}
Post(id=1, title='Hello', published=datetime.datetime(2026, 5, 25, 0, 0, tzinfo=datetime.timezone.utc), tags=['intro'])
True

For complex nested structures, prefer Pydantic: model.model_dump_json() and Model.model_validate_json() handle most common types automatically.

Comparison with orjson, ujson, msgspec

In CPython the stdlib json module uses a C accelerator for its hot paths, yet it is still usually the slowest of the family. It is also the only one that ships with Python, so it needs no install. The others are close but not exact drop-in replacements, each with different trade-offs (orjson, for example, returns bytes).

| Library | Speed vs stdlib | API surface | Notable |
|---|---|---|---|
| json (stdlib) | 1× (baseline) | full default=, cls, object_hook | always available |
| orjson | ~10–20× | dumps/loads only, returns bytes | best speed, handles datetime/UUID natively |
| ujson | ~3–5× | matches stdlib closely | older, less feature-rich than orjson |
| msgspec | ~10–30× | schema-aware (Struct, type-validated) | doubles as a Pydantic alternative |

python
# orjson — bytes in/out, no default needed for datetime/UUID
import orjson
from datetime import datetime, UTC

raw = orjson.dumps({"now": datetime.now(UTC)})    # → bytes
print(raw)
print(orjson.loads(raw))

Output (illustrative; the timestamp varies per run and normally includes microseconds):

text
b'{"now":"2026-05-25T14:30:00+00:00"}'
{'now': '2026-05-25T14:30:00+00:00'}

orjson.dumps returns bytes, not str. Use .decode() if you need a string, or pass the bytes directly to Path.write_bytes / socket.send.

Rule of thumb: use stdlib json for almost everything. Switch to orjson if you serialise multi-MB payloads in a hot path. Switch to msgspec if you also want validation and want to avoid Pydantic's overhead.
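Since the rule of thumb depends on a measured need, a rough timing sketch may help before switching. The payload shape, the repeat count, and the bench helper are arbitrary assumptions; orjson is timed only if it happens to be installed, and results on your data may differ a lot.

```python
import json
import timeit

# Hypothetical payload: 1,000 small records
payload = {
    "items": [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(1000)]
}

def bench(fn, number=20):
    """Average seconds per call over `number` runs."""
    return timeit.timeit(fn, number=number) / number

results = {"json": bench(lambda: json.dumps(payload))}

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None

if orjson is not None:
    results["orjson"] = bench(lambda: orjson.dumps(payload))

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per dumps call")
```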

Common pitfalls

  1. Non-string keys are coerced: json.dumps({1: "a"}) silently produces {"1": "a"}, and the reverse trip gives you str keys. Convert explicitly if it matters.
  2. NaN and Infinity are not valid JSON, but stdlib emits and accepts them by default (allow_nan=True). Set allow_nan=False to enforce strict RFC 8259 output, or sanitise with math.isfinite().
  3. TypeError: Object of type X is not JSON serializable means nothing handled that type. Add a branch to your default= callable or encoder class.
  4. Loss of int precision: JavaScript clients lose precision above 2^53; if your consumer is JS, serialise large integers as strings.
  5. sort_keys=True needs mutually comparable keys: it sorts every dict (OrderedDict included), so the output is deterministic, but a dict that mixes key types such as int and str raises TypeError during sorting.
  6. ensure_ascii=True is the default: non-ASCII text becomes \uXXXX. Set ensure_ascii=False for human-readable output (and remember to write the file as UTF-8).
  7. load/dump are not faster than loads/dumps: they just save you the file-read step. Don't switch for performance reasons.
  8. object_hook runs on every nested dict, including dicts you don't want to transform. Use a __type__ marker or check structure inside the hook.
  9. JSONL: forgetting the trailing newline. json.dumps(record) does not append \n. Add it yourself.
  10. json.loads accepts bytes and bytearray (3.6+) encoded as UTF-8, UTF-16, or UTF-32, and detects which one is in use. Decode to str yourself for any other encoding.
  11. Decimal round-trip via float is lossy: use parse_float=Decimal on load and emit str(d) from your encoder. The value is then a JSON string, so convert it back with Decimal(...) after loading.
  12. pathlib.Path is not serialisable: add it to your encoder; this trips up scripts that pass config dicts around.
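Pitfalls 4 and 11 are easy to reproduce. The sketch below uses made-up values: a float stands in for a JavaScript number (both are IEEE 754 doubles), and the Decimal carries a trailing zero that a float cannot preserve.

```python
import json
from decimal import Decimal

# Pitfall 4: a double cannot represent 2**53 + 1, so it collides with 2**53
big = 2**53 + 1
collides = float(big) == float(2**53)
safe_payload = json.dumps({"id": str(big)})  # send the digits as a string
print(collides, safe_payload)

# Pitfall 11: going through float drops information the Decimal carried
price = Decimal("19.990")
via_float = json.loads(json.dumps(float(price)))       # plain float
via_str = Decimal(json.loads(json.dumps(str(price))))  # exact digits kept
print(via_float, via_str)
```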

Real-world recipes

Atomic write to disk

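Writing JSON to a file directly risks leaving a corrupt half-written file if the process is killed. Write to a temp file in the same directory and rename it over the target. The rename is atomic on POSIX as long as both paths are on the same filesystem; on Windows, Path.replace overwrites the target too, but atomicity is not formally guaranteed.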

python
import json
import os
from pathlib import Path

def write_json_atomic(path: Path, data, **kwargs):
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w") as f:
        json.dump(data, f, **kwargs)
        f.flush()
        os.fsync(f.fileno())
    tmp.replace(path)   # atomic rename

write_json_atomic(Path("config.json"),
                  {"host": "myhost", "port": 9000},
                  indent=2, sort_keys=True)
print(Path("config.json").read_text())

Output:

text
{
  "host": "myhost",
  "port": 9000
}
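If encoding fails partway, the helper above leaves the .tmp file behind. One possible variant, sketched here under the hypothetical name write_json_atomic_clean, removes the temp file in a finally block; the demo shows the existing target surviving a failed write.

```python
import json
import os
import tempfile
from pathlib import Path

def write_json_atomic_clean(path: Path, data, **kwargs):
    """Hypothetical variant: atomic write that also cleans up on failure."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    try:
        with tmp.open("w", encoding="utf-8") as f:
            json.dump(data, f, **kwargs)
            f.flush()
            os.fsync(f.fileno())
        tmp.replace(path)  # rename over the target
    finally:
        # No-op after a successful rename; removes the partial file on error
        tmp.unlink(missing_ok=True)

with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "config.json"
    write_json_atomic_clean(target, {"port": 9000})

    try:
        # A set is not JSON-serialisable, so this write fails midway
        write_json_atomic_clean(target, {"port": {1, 2}})
    except TypeError:
        pass

    survived = json.loads(target.read_text(encoding="utf-8"))
    leftover = list(Path(d).glob("*.tmp"))

print(survived, leftover)
```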

Streaming a large JSONL file with progress

Process a million-record JSONL file without holding it in memory. Use enumerate for progress and yield so callers can chain transformations.

python
import json
from pathlib import Path

def stream_jsonl(path: Path):
    with path.open() as f:
        for line_no, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield line_no, json.loads(line)
            except json.JSONDecodeError as e:
                print(f"line {line_no}: bad JSON — {e}")

# Build a sample file
Path("events.jsonl").write_text("\n".join([
    '{"id": 1, "type": "click"}',
    '{"id": 2, "type": "view"}',
    'not-json',
    '{"id": 3, "type": "click"}',
]))

clicks = 0
for n, record in stream_jsonl(Path("events.jsonl")):
    if record.get("type") == "click":
        clicks += 1
print(f"clicks: {clicks}")

Output:

text
line 3: bad JSON — Expecting value: line 1 column 1 (char 0)
clicks: 2

Filter a large JSON file with jq-style queries

For a few keys, manual indexing is fine; for ad-hoc exploration of a huge file, shell out to jq. Use a short Python script as a fallback when jq is not installed.

bash
# With jq (best)
jq '.users[] | select(.active) | .name' users.json

# Stdlib equivalent
python - <<'PY'
import json, pathlib
data = json.loads(pathlib.Path("users.json").read_text())
for u in data["users"]:
    if u.get("active"):
        print(u["name"])
PY

Output (from jq; the Python version prints the names without quotes, as does jq -r):

text
"Alice Dev"
"Carol Dev"

Merge multiple JSON config files with overrides

A pattern that mirrors argparse + env-var fallbacks: load a base config, layer environment-specific overrides on top. dict.update() handles flat merges; for nested merges, recurse.

python
import json
from pathlib import Path

def deep_merge(base: dict, override: dict) -> dict:
    out = dict(base)
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = deep_merge(out[k], v)
        else:
            out[k] = v
    return out

Path("base.json").write_text(json.dumps({
    "host": "localhost", "port": 8080,
    "logging": {"level": "INFO", "file": "/var/log/app.log"},
}))
Path("prod.json").write_text(json.dumps({
    "host": "myhost",
    "logging": {"level": "WARNING"},
}))

base = json.loads(Path("base.json").read_text())
override = json.loads(Path("prod.json").read_text())
print(json.dumps(deep_merge(base, override), indent=2))

Output:

text
{
  "host": "myhost",
  "port": 8080,
  "logging": {
    "level": "WARNING",
    "file": "/var/log/app.log"
  }
}

REST API client that auto-encodes datetimes and UUIDs

A small helper that adds custom encoding to every requests call, so you never serialise raw datetime again.

python
import json
from datetime import datetime, UTC
from uuid import UUID, uuid4

class APIEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, UUID):
            return str(obj)
        return super().default(obj)

def api_post(url: str, payload: dict) -> str:
    body = json.dumps(payload, cls=APIEncoder)
    # In real code:
    # return requests.post(url, data=body,
    #                      headers={"Content-Type": "application/json"}).text
    return body

print(api_post("https://api.example.com/events", {
    "id": UUID("12345678-1234-5678-1234-567812345678"),
    "user": "alicedev",
    "at": datetime(2026, 5, 25, 14, 30, tzinfo=UTC),
}))

Output:

text
{"id": "12345678-1234-5678-1234-567812345678", "user": "alicedev", "at": "2026-05-25T14:30:00+00:00"}

Validate a JSON payload's shape

Quick structural validation without pulling in Pydantic or jsonschema — useful for one-off scripts.

python
import json

def require(obj, *keys):
    missing = [k for k in keys if k not in obj]
    if missing:
        raise ValueError(f"missing keys: {missing}")

payload = json.loads('{"name": "Alice Dev", "email": "alice@example.com"}')
try:
    require(payload, "name", "email", "age")
except ValueError as e:
    print(e)

Output:

text
missing keys: ['age']

For anything more complex than shape checks (types, ranges, regex), use Pydantic: User.model_validate_json(raw_bytes) does parse + validation in a single call and gives detailed errors.