Encode and decode JSON in Python with the stdlib json module. Covers dumps/loads, indent/sort_keys/separators, custom default= and JSONEncoder, object_hook decoding, JSONL streaming, and orjson/ujson/msgspec comparison.
json — Stdlib JSON Encoder/Decoder
What it is
json is Python's standard-library module for encoding Python objects to JSON strings and decoding JSON strings back to Python objects. It ships with every Python install and is the right tool for almost all JSON work. Reach for orjson or msgspec only when you have a measured performance need — they are 5–20× faster than stdlib json for large payloads but add a dependency.
Install
json is part of the standard library — no install step is needed. The faster third-party alternatives are optional.
```bash
# Standard library: no install needed
python -c "import json; print(json.__version__)"
# Faster alternatives (optional)
pip install orjson   # Rust-backed, 5–20× faster
pip install ujson    # C-backed, ~3× faster
pip install msgspec  # validation + speed, schema-aware
```
Output:
```
2.0.9
```
Syntax
The two entry points are dumps/loads (string ↔ object) and dump/load (file ↔ object). Use the string forms in memory, the file forms when reading or writing files.
```python
import json
json.dumps(obj)     # object → str
json.loads(s)       # str → object
json.dump(obj, fp)  # object → file
json.load(fp)       # file → object
```
Output: (none, these are declarative signatures)
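As a small illustration of the file forms, the sketch below uses an in-memory io.StringIO buffer in place of a real file. It assumes only that the file object offers write() for dump and read() for load; the buffer and variable names are illustrative.

```python
import io
import json

buf = io.StringIO()                      # stands in for an open text file
json.dump({"name": "Alice Dev", "active": True}, buf)
text = buf.getvalue()                    # what would have been written to disk
print(text)

buf.seek(0)                              # rewind before reading back
restored = json.load(buf)
print(restored)
```

The same calls work unchanged with a file opened via open() or Path.open().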
Type mapping
Python types are mapped to JSON types as follows. Anything outside this table needs a custom encoder.
| Python | JSON |
|---|---|
| dict (str keys) | object |
| list, tuple | array |
| str | string |
| int, float | number |
| True, False | true, false |
| None | null |
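Dict keys that are int, float, bool, or None are coerced to strings ({1: "a"} becomes {"1": "a"}); any other key type, such as a tuple, raises TypeError unless you pass skipkeys=True, which drops those entries. Value types not on this list (datetime, Path, Decimal, set, bytes, dataclasses) raise TypeError unless you supply a default= callable.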
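The key and tuple behaviour described above can be checked with a short sketch. The sample dictionaries are made up for illustration.

```python
import json

# int, float, bool and None keys are written in their JSON text form.
encoded = json.dumps({1: "a", 2.5: "b", False: "c", None: "d"})
print(encoded)

# The round trip does not restore the original key types: every key is a str.
print(json.loads(encoded))

# Tuples encode as arrays and come back as lists.
print(json.loads(json.dumps({"point": (1, 2)})))

# Other key types (here a tuple) raise TypeError...
try:
    json.dumps({(1, 2): "pair"})
except TypeError as exc:
    print("TypeError:", exc)

# ...unless skipkeys=True, which silently drops the offending entries.
print(json.dumps({(1, 2): "pair", "ok": 1}, skipkeys=True))
```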
dumps/loads — basics
dumps encodes a Python object to a JSON string; loads parses a string back. Both round-trip primitive types without surprises.
```python
import json
obj = {"name": "Alice Dev", "age": 30, "tags": ["admin", "user"], "active": True}
s = json.dumps(obj)
print(s)
print(json.loads(s))
print(type(json.loads("3.14")))  # numbers come back as int/float
```
Output:
```
{"name": "Alice Dev", "age": 30, "tags": ["admin", "user"], "active": true}
{'name': 'Alice Dev', 'age': 30, 'tags': ['admin', 'user'], 'active': True}
<class 'float'>
```
Pretty printing — indent, sort_keys, separators, ensure_ascii
The formatting options on dumps control whitespace and ordering. The defaults produce compact ASCII-safe output; pass indent=2 for readable diffs, sort_keys=True for stable output, and ensure_ascii=False to keep non-ASCII characters literal.
```python
import json
obj = {"name": "Alice Dev", "email": "alice@example.com", "city": "São Paulo"}
print(json.dumps(obj))                                # default (ASCII-escaped)
print(json.dumps(obj, indent=2))                      # pretty
print(json.dumps(obj, indent=2, sort_keys=True))      # stable order
print(json.dumps(obj, ensure_ascii=False))            # keep é, ã, etc.
print(json.dumps(obj, separators=(",", ":")))         # most compact
```
Output:
```
{"name": "Alice Dev", "email": "alice@example.com", "city": "S\u00e3o Paulo"}
{
  "name": "Alice Dev",
  "email": "alice@example.com",
  "city": "S\u00e3o Paulo"
}
{
  "city": "S\u00e3o Paulo",
  "email": "alice@example.com",
  "name": "Alice Dev"
}
{"name": "Alice Dev", "email": "alice@example.com", "city": "São Paulo"}
{"name":"Alice Dev","email":"alice@example.com","city":"S\u00e3o Paulo"}
```
| Argument | Purpose | Typical value |
|---|---|---|
| indent | Pretty-print with this many spaces per level | None (compact) or 2 |
| sort_keys | Sort dict keys alphabetically | True for diffable output |
| separators | Override ", " and ": " | (",", ":") for smallest |
| ensure_ascii | Escape non-ASCII as \uXXXX | False to keep UTF-8 literal |
| allow_nan | Allow NaN/Infinity (non-standard) | False for strict JSON |
| default | Callable for unsupported types | see next section |
| cls | Custom JSONEncoder subclass | see "Class-based encoders" |
Custom encoders — default= for unsupported types
Pass a default callable to dumps to handle types JSON doesn't know about (datetime, Path, Decimal, set, dataclasses, …). The callable receives the unencodable object and returns something JSON-serialisable; raise TypeError for anything you don't handle.
```python
import json
from datetime import datetime, UTC  # datetime.UTC needs Python 3.11+
from pathlib import Path
from decimal import Decimal
from dataclasses import dataclass, asdict, is_dataclass

@dataclass
class User:
    name: str
    joined: datetime

def encode(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Path):
        return str(obj)
    if isinstance(obj, Decimal):
        return str(obj)          # string keeps the exact decimal value
    if isinstance(obj, set):
        return sorted(obj)       # sets have no order; sort for stable output
    if is_dataclass(obj):
        return asdict(obj)       # nested fields pass through encode() again
    raise TypeError(f"cannot encode {type(obj).__name__}")

data = {
    "user": User("Alice Dev", datetime(2026, 5, 25, tzinfo=UTC)),
    "config": Path("/home/alice/.config"),
    "balance": Decimal("19.99"),
    "tags": {"admin", "user"},
}
print(json.dumps(data, default=encode, indent=2))
```
Output:
```
{
  "user": {
    "name": "Alice Dev",
    "joined": "2026-05-25T00:00:00+00:00"
  },
  "config": "/home/alice/.config",
  "balance": "19.99",
  "tags": [
    "admin",
    "user"
  ]
}
```
Class-based encoders — cls=JSONEncoder
For project-wide reuse, subclass json.JSONEncoder and override default. The class form is interchangeable with the default= callable but composes better when you want to combine multiple type handlers.
```python
import json
from datetime import date, datetime
from pathlib import Path
from decimal import Decimal
from uuid import UUID

class RichJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, Path):
            return str(obj)
        if isinstance(obj, Decimal):
            # Lossy for values a float cannot represent exactly
            return float(obj) if obj % 1 else int(obj)
        if isinstance(obj, UUID):
            return str(obj)
        if isinstance(obj, set):
            return sorted(obj)
        return super().default(obj)  # raises TypeError

print(json.dumps({"d": date(2026, 5, 25), "p": Path("/tmp")},
                 cls=RichJSONEncoder))
```
Output:
```
{"d": "2026-05-25", "p": "/tmp"}
```
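One possible way to combine multiple type handlers is to route default through functools.singledispatch, so each type registers its own small function. This is a sketch rather than an established pattern from this page; to_jsonable and DispatchEncoder are hypothetical names.

```python
import json
from datetime import date
from decimal import Decimal
from functools import singledispatch
from uuid import UUID

@singledispatch
def to_jsonable(obj):
    # Fallback for unregistered types, mirroring JSONEncoder.default.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

@to_jsonable.register
def _(obj: date):
    return obj.isoformat()  # also covers datetime, which subclasses date

@to_jsonable.register
def _(obj: Decimal):
    return str(obj)

@to_jsonable.register
def _(obj: UUID):
    return str(obj)

class DispatchEncoder(json.JSONEncoder):
    def default(self, obj):
        # Delegate to whichever handler is registered for type(obj).
        return to_jsonable(obj)

out = json.dumps({"d": date(2026, 5, 25), "price": Decimal("19.99")},
                 cls=DispatchEncoder)
print(out)
```

New types can then be supported by registering another handler, without editing the encoder class.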
Decoding — object_hook and parse_* callbacks
object_hook is called with every decoded JSON object (Python dict), letting you transform {"__type__": "datetime", "value": "..."} markers back into real datetime instances. Pair it with a custom encoder to round-trip non-JSON types.
```python
import json
from datetime import datetime

def encode(obj):
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    raise TypeError(type(obj).__name__)

def decode_hook(d):
    # Called for every decoded JSON object, innermost first
    if d.get("__type__") == "datetime":
        return datetime.fromisoformat(d["value"])
    return d

s = json.dumps({"created": datetime(2026, 5, 25, 14, 30)}, default=encode)
print(s)
parsed = json.loads(s, object_hook=decode_hook)
print(parsed, type(parsed["created"]).__name__)
```
Output:
```
{"created": {"__type__": "datetime", "value": "2026-05-25T14:30:00"}}
{'created': datetime.datetime(2026, 5, 25, 14, 30)} datetime
```
loads also exposes parse_float, parse_int, and parse_constant for fine-grained control over numbers and NaN/Infinity. The common use is parse_float=Decimal to keep financial values exact:
```python
import json
from decimal import Decimal
balance = json.loads('{"amount": 0.1}', parse_float=Decimal)
print(balance, type(balance["amount"]).__name__)
```
Output:
```
{'amount': Decimal('0.1')} Decimal
```
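The other two callbacks can be sketched the same way. Here parse_constant is used to reject NaN/Infinity on input and parse_int=str keeps a large integer as text; reject_constant is a hypothetical helper name.

```python
import json

def reject_constant(name):
    # Called with the literal token: "NaN", "Infinity" or "-Infinity".
    raise ValueError(f"non-standard JSON constant: {name}")

ok = json.loads('{"ok": 1.5}', parse_constant=reject_constant)
print(ok)

try:
    json.loads('{"bad": NaN}', parse_constant=reject_constant)
except ValueError as exc:
    print(exc)

# parse_int receives the digits as a string; returning str avoids any conversion.
big = json.loads("12345678901234567890", parse_int=str)
print(big, type(big).__name__)
```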
File I/O — dump and load
dump/load work on any file-like object. They are not faster than dumps/loads + reading the file — they exist for ergonomics. Use them when you have a single document; for streaming JSON Lines see the next section.
```python
import json
from pathlib import Path

p = Path("user.json")
with p.open("w") as f:  # "w" creates or truncates the file
    json.dump({"name": "Alice Dev", "active": True}, f, indent=2)
with p.open() as f:
    print(json.load(f))
```
Output:
```
{'name': 'Alice Dev', 'active': True}
```
JSON Lines (JSONL) — streaming records
JSON Lines (.jsonl, .ndjson) is a format where each line is a self-contained JSON object. It's the standard for log files, ML datasets, and append-only event streams because it can be read/written one record at a time without holding the whole file in memory.
```python
import json
from pathlib import Path

records = [
    {"id": 1, "name": "Alice Dev"},
    {"id": 2, "name": "Bob Dev"},
    {"id": 3, "name": "Carol Dev"},
]
p = Path("users.jsonl")
with p.open("w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

# Read back, one record at a time
with p.open() as f:
    for line in f:
        record = json.loads(line)
        print(record["id"], record["name"])
```
Output:
```
1 Alice Dev
2 Bob Dev
3 Carol Dev
```
A reusable helper makes this even tidier:
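```python
import json

def write_jsonl(path, records, *, default=None):
    """Write an iterable of records, one JSON document per line."""
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(r, default=default) + "\n")

def read_jsonl(path):
    """Yield one decoded record per non-blank line."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```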
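For the append-only use case mentioned earlier, a minimal sketch might open the file in "a" mode so each call adds one line without rewriting earlier records. append_jsonl is a hypothetical helper, and the temporary directory is only there to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

def append_jsonl(path, record):
    # "a" mode adds one line per call and leaves existing lines untouched.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "events.jsonl"
    append_jsonl(log, {"id": 1, "type": "click"})
    append_jsonl(log, {"id": 2, "type": "view"})
    with log.open(encoding="utf-8") as f:
        events = [json.loads(line) for line in f if line.strip()]

print(events)
```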
json.tool — pretty-print from the CLI
The stdlib ships a CLI wrapper that pretty-prints JSON. It's a one-line replacement for jq . when you don't have jq installed.
```bash
echo '{"name":"Alice","tags":["a","b"]}' | python -m json.tool
python -m json.tool --indent 4 raw.json formatted.json
python -m json.tool --sort-keys raw.json
```
Output (first command):
```
{
    "name": "Alice",
    "tags": [
        "a",
        "b"
    ]
}
```
Use --no-ensure-ascii to keep UTF-8 literal and --compact to strip whitespace. Pass an input and an output filename, as in the second command, to write the result to a file instead of stdout.
Round-tripping a dataclass through JSON
Dataclasses don't serialise natively, but dataclasses.asdict + a custom default handles the common case. For nested dataclasses with datetime fields, the decoder needs to know what class to rebuild.
```python
import json
from dataclasses import dataclass, asdict, field, fields, is_dataclass
from datetime import datetime, UTC
from typing import get_type_hints

@dataclass
class Post:
    id: int
    title: str
    published: datetime
    tags: list[str] = field(default_factory=list)

def encode(obj):
    if is_dataclass(obj):
        return asdict(obj)
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(type(obj).__name__)

def to_dataclass(cls, data: dict):
    """Rebuild a dataclass instance, parsing datetime fields by annotation."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        v = data[f.name]
        if hints[f.name] is datetime and isinstance(v, str):
            v = datetime.fromisoformat(v)
        kwargs[f.name] = v
    return cls(**kwargs)

post = Post(1, "Hello", datetime(2026, 5, 25, tzinfo=UTC), tags=["intro"])
s = json.dumps(post, default=encode, indent=2)
print(s)
restored = to_dataclass(Post, json.loads(s))
print(restored)
print(restored == post)
```
Output:
```
{
  "id": 1,
  "title": "Hello",
  "published": "2026-05-25T00:00:00+00:00",
  "tags": [
    "intro"
  ]
}
Post(id=1, title='Hello', published=datetime.datetime(2026, 5, 25, 0, 0, tzinfo=datetime.timezone.utc), tags=['intro'])
True
```
For complex nested structures, prefer Pydantic: model.model_dump_json() and Model.model_validate_json() handle most common types (datetime, UUID, Decimal, nested models) automatically.
Comparison with orjson, ujson, msgspec
The stdlib json module is usually the slowest of the family, even though CPython speeds it up with a C accelerator. It is also the only one that ships with Python, and it supports default=, cls, and object_hook with zero install. The others are not exact drop-in replacements; each comes with different trade-offs.
| Library | Speed | API surface | Notable |
|---|---|---|---|
| json (stdlib) | 1× | full default=, cls, object_hook | always available |
| orjson | ~10–20× | dumps/loads only, returns bytes | best speed, handles datetime/UUID natively |
| ujson | ~3–5× | matches stdlib closely | older, less feature-rich than orjson |
| msgspec | ~10–30× | schema-aware (Struct, type-validated) | doubles as a Pydantic alternative |
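```python
# orjson: bytes in/out, no default needed for datetime/UUID
import orjson
from datetime import datetime, UTC

# A fixed timestamp keeps the output reproducible; datetime.now(UTC) would
# also work but would include microseconds.
raw = orjson.dumps({"at": datetime(2026, 5, 25, 14, 30, tzinfo=UTC)})  # → bytes
print(raw)
print(orjson.loads(raw))
```
Output:
```
b'{"at":"2026-05-25T14:30:00+00:00"}'
{'at': '2026-05-25T14:30:00+00:00'}
```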
orjson.dumps returns bytes, not str. Use .decode() if you need a string, or pass the bytes directly to Path.write_bytes / socket.send.
Rule of thumb: use stdlib json for almost everything. Switch to orjson if you serialise multi-MB payloads in a hot path. Switch to msgspec if you also want validation and want to avoid Pydantic's overhead.
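One way to follow this rule of thumb without hard-wiring the dependency is an optional import with a stdlib fallback. The sketch below is an assumption about how such a wrapper might look; dumps_bytes and loads_any are hypothetical names, and it covers only plain JSON types.

```python
import json

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None

def dumps_bytes(obj) -> bytes:
    """Serialise to compact UTF-8 bytes, preferring orjson when installed."""
    if orjson is not None:
        return orjson.dumps(obj)
    # Mirror orjson's compact, non-escaped output as closely as stdlib allows.
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def loads_any(data):
    """Parse str or bytes with whichever backend is available."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)

payload = {"id": 1, "city": "São Paulo", "tags": ["a", "b"]}
raw = dumps_bytes(payload)
print(raw)
print(loads_any(raw) == payload)
```

Types that only one backend handles natively (for example datetime in orjson) would still need a default= callable on the stdlib path.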
Common pitfalls
- Non-string keys are coerced: json.dumps({1: "a"}) produces {"1": "a"} silently, and the reverse trip gives you str keys. Convert explicitly if it matters.
- NaN and Infinity are not valid JSON, but stdlib accepts them by default (allow_nan=True). Set allow_nan=False to enforce strict RFC 8259, or sanitise with math.isfinite().
- TypeError: Object of type X is not JSON serializable means default= did not cover that type. Add a branch to your encoder.
- Loss of int precision: JavaScript clients lose precision above 2^53; if your consumer is JS, serialise large integers as strings.
- sort_keys=True needs mutually comparable keys: json.dumps({1: "a", "b": 2}, sort_keys=True) raises TypeError because int and str keys cannot be sorted together. OrderedDict is sorted like any other dict.
- ensure_ascii=True is the default: non-ASCII text becomes \uXXXX. Set ensure_ascii=False for human-readable output (and remember to write the file as UTF-8).
- load/dump are not faster than loads/dumps: they just save you the file-read step. Don't switch for performance reasons.
- object_hook runs on every nested dict, including dicts you don't want to transform. Use a __type__ marker or check structure inside the hook.
- JSONL: forgetting the trailing newline. json.dumps(record) does not append \n. Add it yourself.
- json.loads accepts bytes and bytearray (3.6+) encoded as UTF-8, UTF-16, or UTF-32. Decode any other encoding yourself first, for example raw.decode("latin-1").
- Decimal round-trip via float is lossy: use parse_float=Decimal on load and str(d) in your encoder.
- pathlib.Path is not serialisable: add it to your encoder; this trips up scripts that pass config dicts around.
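A few of these pitfalls are easy to reproduce. The sketch below covers the NaN and large-integer cases; the sample values are made up for illustration.

```python
import json
import math

# By default a NaN is written as the non-standard token NaN.
lenient = json.dumps({"x": float("nan")})
print(lenient)

# allow_nan=False turns that into an error instead.
try:
    json.dumps({"x": float("nan")}, allow_nan=False)
except ValueError as exc:
    print("ValueError:", exc)

# Sanitise first: replace non-finite floats with None (JSON null).
values = [1.0, float("inf"), 2.5]
clean = json.dumps([v if math.isfinite(v) else None for v in values])
print(clean)

# Integers above 2**53 survive a JavaScript consumer only as strings.
big = 2**53 + 1
as_text = json.dumps({"id": str(big)})
print(as_text)
```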
Real-world recipes
Atomic write to disk
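Writing JSON to a file directly risks leaving a corrupt half-written file if the process is killed. Write to a temp file in the same directory and rename it over the target. The rename is atomic on POSIX; on Windows, Path.replace still overwrites the target, but Python's documentation only promises atomicity as a POSIX requirement.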
```python
import json
import os
from pathlib import Path

def write_json_atomic(path: Path, data, **kwargs):
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w") as f:
        json.dump(data, f, **kwargs)
        f.flush()
        os.fsync(f.fileno())  # force the bytes to disk before renaming
    tmp.replace(path)  # atomic rename

write_json_atomic(Path("config.json"),
                  {"host": "myhost", "port": 9000},
                  indent=2, sort_keys=True)
print(Path("config.json").read_text())
```
Output:
```
{
  "host": "myhost",
  "port": 9000
}
```
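A possible hardening of this recipe, sketched under the assumption that a failed encode should leave no stray temp file behind: create a uniquely named temp file with tempfile.mkstemp and remove it on any error. write_json_atomic_safe is a hypothetical name, not part of the recipe above.

```python
import json
import os
import tempfile
from pathlib import Path

def write_json_atomic_safe(path: Path, data, **kwargs):
    # A uniquely named temp file in the target's directory keeps the final
    # rename on one filesystem and avoids clashes between concurrent writers.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, **kwargs)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Encoding or I/O failed: drop the partial temp file, keep the old target.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise

with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "config.json"
    write_json_atomic_safe(target, {"port": 9000})
    try:
        write_json_atomic_safe(target, {"port": {1, 2}})  # a set is not encodable
    except TypeError:
        pass
    survivors = sorted(p.name for p in Path(d).iterdir())
    content = json.loads(target.read_text(encoding="utf-8"))

print(survivors, content)
```

After the failed second write, the directory still holds only the original config.json with its earlier content.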
Streaming a large JSONL file with progress
Process a million-record JSONL file without holding it in memory. Use enumerate for progress and yield so callers can chain transformations.
```python
import json
from pathlib import Path

def stream_jsonl(path: Path):
    with path.open() as f:
        for line_no, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield line_no, json.loads(line)
            except json.JSONDecodeError as e:
                # Report the bad line and keep going
                print(f"line {line_no}: bad JSON: {e}")

# Build a sample file
Path("events.jsonl").write_text("\n".join([
    '{"id": 1, "type": "click"}',
    '{"id": 2, "type": "view"}',
    'not-json',
    '{"id": 3, "type": "click"}',
]))

clicks = 0
for n, record in stream_jsonl(Path("events.jsonl")):
    if record.get("type") == "click":
        clicks += 1
print(f"clicks: {clicks}")
```
Output:
```
line 3: bad JSON: Expecting value: line 1 column 1 (char 0)
clicks: 2
```
Filter a large JSON file with jq-style queries
For a few keys, manual indexing is fine; for ad-hoc exploration of a huge file, shell out to jq. When jq is not installed, json.tool can pretty-print the file and a short stdlib script can do the filtering.
```bash
# With jq (best)
jq '.users[] | select(.active) | .name' users.json
# Stdlib equivalent
python - <<'PY'
import json, pathlib
data = json.loads(pathlib.Path("users.json").read_text())
for u in data["users"]:
    if u.get("active"):
        print(u["name"])
PY
```
Output (jq; the Python version prints the same names without the quotes):
```
"Alice Dev"
"Carol Dev"
```
Merge multiple JSON config files with overrides
A pattern that mirrors argparse + env-var fallbacks: load a base config, layer environment-specific overrides on top. dict.update() handles flat merges; for nested merges, recurse.
```python
import json
from pathlib import Path

def deep_merge(base: dict, override: dict) -> dict:
    out = dict(base)  # shallow copy so base is not mutated
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = deep_merge(out[k], v)  # recurse into nested dicts
        else:
            out[k] = v
    return out

Path("base.json").write_text(json.dumps({
    "host": "localhost", "port": 8080,
    "logging": {"level": "INFO", "file": "/var/log/app.log"},
}))
Path("prod.json").write_text(json.dumps({
    "host": "myhost",
    "logging": {"level": "WARNING"},
}))
base = json.loads(Path("base.json").read_text())
override = json.loads(Path("prod.json").read_text())
print(json.dumps(deep_merge(base, override), indent=2))
```
Output:
```
{
  "host": "myhost",
  "port": 8080,
  "logging": {
    "level": "WARNING",
    "file": "/var/log/app.log"
  }
}
```
REST API client that auto-encodes datetimes and UUIDs
A small helper that adds custom encoding to every requests call, so you never serialise raw datetime again.
```python
import json
from datetime import datetime, UTC
from uuid import UUID, uuid4

class APIEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, UUID):
            return str(obj)
        return super().default(obj)

def api_post(url: str, payload: dict) -> str:
    body = json.dumps(payload, cls=APIEncoder)
    # In real code:
    # return requests.post(url, data=body,
    #                      headers={"Content-Type": "application/json"}).text
    return body

print(api_post("https://api.example.com/events", {
    "id": UUID("12345678-1234-5678-1234-567812345678"),
    "user": "alicedev",
    "at": datetime(2026, 5, 25, 14, 30, tzinfo=UTC),
}))
```
Output:
```
{"id": "12345678-1234-5678-1234-567812345678", "user": "alicedev", "at": "2026-05-25T14:30:00+00:00"}
```
Validate a JSON payload's shape
Quick structural validation without pulling in Pydantic or jsonschema — useful for one-off scripts.
```python
import json

def require(obj, *keys):
    missing = [k for k in keys if k not in obj]
    if missing:
        raise ValueError(f"missing keys: {missing}")

payload = json.loads('{"name": "Alice Dev", "email": "alice@example.com"}')
try:
    require(payload, "name", "email", "age")
except ValueError as e:
    print(e)
```
Output:
```
missing keys: ['age']
```
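If a one-off script also needs basic type checks, the same idea stretches a little further. The sketch below is a hypothetical extension, not part of the recipe above; require_types and its spec format are made-up names.

```python
import json

def require_types(obj, spec):
    """Check that each key in spec is present and holds the expected type."""
    errors = []
    for key, expected in spec.items():
        if key not in obj:
            errors.append(f"{key}: missing")
            continue
        value = obj[key]
        # bool is a subclass of int, so reject it explicitly for non-bool fields.
        is_ok = isinstance(value, expected) and not (
            isinstance(value, bool) and expected is not bool
        )
        if not is_ok:
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    if errors:
        raise ValueError("; ".join(errors))

record = json.loads('{"name": "Alice Dev", "age": "30", "active": true}')
try:
    require_types(record, {"name": str, "age": int, "email": str})
    message = ""
except ValueError as e:
    message = str(e)
print(message)
```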
For anything more complex than shape checks (types, ranges, regex), use Pydantic: User.model_validate_json(raw_bytes) does parse + validation in a single call and gives detailed errors.