Define typed data containers with @dataclass — frozen, slots, kw_only, default_factory, __post_init__, asdict, replace, and how it compares to attrs, pydantic, NamedTuple, TypedDict.
dataclasses — Boilerplate-Free Classes
What it is
dataclasses is the standard library module added in Python 3.7 (PEP 557) that auto-generates __init__, __repr__, __eq__, and optionally __hash__ / __lt__ from class-level type annotations. Reach for it when you want a typed record (a config, a row, a DTO, a message) without writing constructor boilerplate, and you don't need runtime validation. For runtime validation against types use pydantic; for finer control over the generated methods use attrs. dataclasses lives in between — zero dependencies, pure Python, no coercion.
Install
dataclasses is part of the Python standard library (3.7+) and requires no installation. Verify it loads:
python -c "from dataclasses import dataclass; print(dataclass)"
Output:
<function dataclass at 0x7f9c1a2b3e60>
Quick example
The @dataclass decorator inspects the class body's annotated attributes and synthesizes the dunder methods you'd otherwise write by hand. The annotations are not enforced at runtime — they're hints for tools (mypy, IDEs) and metadata for dataclasses.fields().
from dataclasses import dataclass

@dataclass
class User:
    name: str
    email: str
    age: int = 0
    active: bool = True

u = User(name="Alice Dev", email="alice@example.com", age=30)
print(u)
print(u == User("Alice Dev", "alice@example.com", 30))
Output:
User(name='Alice Dev', email='alice@example.com', age=30, active=True)
True
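To make the "annotations are not enforced" point concrete, here is a minimal sketch that reuses a trimmed copy of the User class above. The wrong-typed values are deliberate; nothing in the generated __init__ checks them.

```python
from dataclasses import dataclass, fields

@dataclass
class User:
    name: str
    email: str
    age: int = 0

# No error: the annotations are recorded but never checked at runtime.
u = User(name=42, email=None, age="thirty")
print(u)  # User(name=42, email=None, age='thirty')

# The declared types stay available as metadata for tools.
print({f.name: f.type.__name__ for f in fields(User)})
# {'name': 'str', 'email': 'str', 'age': 'int'}
```

A type checker such as mypy would flag the call; the interpreter does not.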
What the decorator generates
By default @dataclass synthesizes __init__, __repr__, and __eq__. Each can be turned off via decorator arguments; additional methods are opt-in. The full set of toggles is:
| Argument | Default | Effect |
|---|---|---|
| init=True | on | Generate __init__ |
| repr=True | on | Generate __repr__ |
| eq=True | on | Generate __eq__ |
| order=False | off | Generate __lt__, __le__, __gt__, __ge__ |
| unsafe_hash=False | off | Force __hash__ even when not safe |
| frozen=False | off | Make instances immutable (no attribute assignment) |
| match_args=True | on | Generate __match_args__ for structural pattern matching (3.10+) |
| kw_only=False | off | All fields are keyword-only (3.10+) |
| slots=False | off | Generate __slots__ (3.10+) |
| weakref_slot=False | off | Add a __weakref__ slot when slots=True (3.11+) |
from dataclasses import dataclass

@dataclass(order=True, frozen=True, slots=True)
class Point:
    x: int
    y: int

a, b = Point(1, 2), Point(3, 4)
print(sorted([b, a]))
Output:
[Point(x=1, y=2), Point(x=3, y=4)]
field() — per-field configuration
When the class default isn't enough — mutable defaults, init exclusion, hash exclusion, metadata — wrap the default in field(...). It's the per-attribute equivalent of the decorator arguments.
| field() arg | Effect |
|---|---|
| default | Default value |
| default_factory | Zero-arg callable for the default (lists, dicts, sets, dataclasses) |
| init | If False, exclude from __init__ |
| repr | If False, exclude from __repr__ |
| compare | If False, exclude from __eq__ and __lt__ |
| hash | If False, exclude from __hash__ |
| metadata | Arbitrary mapping kept on the Field object (use for docs, validators, serializers) |
| kw_only | Per-field keyword-only flag (3.10+) |
from dataclasses import dataclass, field

@dataclass
class Cart:
    user_id: int
    items: list[str] = field(default_factory=list)
    discount: float = 0.0
    _cache: dict = field(default_factory=dict, repr=False, compare=False)

c = Cart(user_id=42)
c.items.append("book")
print(c)
Output:
Cart(user_id=42, items=['book'], discount=0.0)
Mutable defaults raise. Writing items: list = [] raises ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory. The error catches the classic shared-state bug at class-creation time.
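The rejection can be seen in a short sketch. The class names Broken and Fixed are made up for illustration; the first definition fails while the class is being created, and the second shows that default_factory gives each instance its own list.

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class Broken:
        items: list = []   # rejected while the class is being created
except ValueError as e:
    print("rejected:", e)

@dataclass
class Fixed:
    items: list = field(default_factory=list)   # fresh list per instance

a, b = Fixed(), Fixed()
a.items.append("x")
print(a.items, b.items)   # ['x'] []
```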
__post_init__ — the validation hook
__post_init__ runs automatically at the end of the generated __init__, after all fields are assigned. Use it for validation, normalization, or computing derived fields. Combine with field(init=False) to populate an attribute the user never passes.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Order:
    id: int
    items: list[str]
    total: float
    created_at: datetime = field(init=False)

    def __post_init__(self) -> None:
        if self.total < 0:
            raise ValueError("total cannot be negative")
        self.created_at = datetime.now()

o = Order(id=1, items=["a", "b"], total=19.99)
print(o.created_at.year, o.total)
Output:
2026 19.99
InitVar — pseudo-fields visible only to __post_init__
InitVar[T] is an annotation that creates a constructor parameter without storing it as an attribute. The value is forwarded to __post_init__ and then discarded. Useful for derivation inputs (e.g. raw password -> hashed) that should never persist on the instance.
from dataclasses import dataclass, field, InitVar
import hashlib

@dataclass
class User:
    username: str
    password_hash: str = field(init=False)
    # No default here: a default would linger as a class attribute named `password`
    password: InitVar[str]

    def __post_init__(self, password: str) -> None:
        self.password_hash = hashlib.sha256(password.encode()).hexdigest()

u = User(username="alicedev", password="hunter2")
print(u)                       # no `password` attr
print(hasattr(u, "password"))  # False
Output:
User(username='alicedev', password_hash='f52fbd32b2b3b86ff88ef6c490628285f482af15ddcb29541f94bcf526a3f6c7')
False
frozen=True — immutable instances
frozen=True makes attribute assignment raise FrozenInstanceError. With the default eq=True, frozen dataclasses also get a generated __hash__, so they can live in sets and serve as dict keys as long as every field value is itself hashable. That is exactly what you want for config objects, message envelopes, and value types.
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class AppConfig:
    debug: bool
    host: str = "127.0.0.1"
    port: int = 8000

cfg = AppConfig(debug=False)
try:
    cfg.port = 9000
except FrozenInstanceError as e:
    print("immutable:", e)

# Hashable -> usable as dict key
configs = {cfg: "default"}
print(configs)
Output:
immutable: cannot assign to field 'port'
{AppConfig(debug=False, host='127.0.0.1', port=8000): 'default'}
frozen=True is shallow: the references are frozen, the objects they point to are not. cfg.items.append(...) still works if items is a list. Combine with tuple/frozenset for full immutability.
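A small sketch of the shallow-freeze behavior, using two hypothetical classes (Playlist and FrozenPlaylist) that are not part of the examples above. The list field can still be mutated in place; the tuple variant cannot, and it stays hashable.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Playlist:
    name: str
    tracks: list[str]

p = Playlist("focus", ["intro"])
p.tracks.append("outro")   # allowed: the list itself is not frozen
print(p.tracks)            # ['intro', 'outro']

try:
    p.tracks = []          # rebinding the field is what frozen blocks
except FrozenInstanceError as e:
    print("blocked:", e)

@dataclass(frozen=True)
class FrozenPlaylist:
    name: str
    tracks: tuple[str, ...] = ()

fp = FrozenPlaylist("focus", ("intro",))
print(hash(fp) == hash(FrozenPlaylist("focus", ("intro",))))   # True
```

Calling hash(p) on the list-backed version would raise TypeError, since the generated __hash__ hashes the field values.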
slots=True — memory + speed win (3.10+)
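slots=True generates __slots__, removing the per-instance __dict__. The benefits: less memory per instance, somewhat faster attribute access, and a guard against typo-attribute-creation (u.naem = "..." raises AttributeError). The often-quoted figures of ~50% less memory and ~20% faster access are rough; actual savings depend on the Python version and the number of fields.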
from dataclasses import dataclass
import sys

@dataclass
class WithoutSlots:
    x: int
    y: int

@dataclass(slots=True)
class WithSlots:
    x: int
    y: int

a, b = WithoutSlots(1, 2), WithSlots(1, 2)
print("__dict__ ->", sys.getsizeof(a.__dict__))

try:
    print(b.__dict__)
except AttributeError as e:
    print("slotted has no __dict__:", e)

try:
    b.z = 3
except AttributeError as e:
    print("typo guard:", e)
Output:
__dict__ -> 296
slotted has no __dict__: 'WithSlots' object has no attribute '__dict__'
typo guard: 'WithSlots' object has no attribute 'z'
slots=True creates a brand-new class, not the one decorated. Decorators applied above @dataclass(slots=True) see the slotted class; anything that captured the original class first (a decorator placed below it, a registry, a weakref) still points at the unslotted original.
kw_only=True — force keyword-only arguments (3.10+)
kw_only=True (decorator-level) makes every field keyword-only in __init__. At the field level, kw_only=True marks individual fields as keyword-only, which is handy for adding a required field in a subclass whose parent already declares default-bearing fields.
from dataclasses import dataclass, field

@dataclass(kw_only=True)
class Request:
    method: str
    path: str
    body: bytes = b""
    timeout: float = 30.0

# Positional args now raise
try:
    Request("GET", "/")
except TypeError as e:
    print(e)

r = Request(method="GET", path="/")
print(r)
Output:
Request.__init__() takes 1 positional argument but 3 were given
Request(method='GET', path='/', body=b'', timeout=30.0)
The per-field form solves a common inheritance headache: a subclass that needs a required field when its parent already has a default-bearing one.
from dataclasses import dataclass, field

@dataclass
class Base:
    name: str = "unnamed"

@dataclass
class Child(Base):
    note: str = field(kw_only=True)  # required, allowed even though Base.name has a default
    enabled: bool = True

Output: (none; the class definition succeeds)
order=True — comparison operators
order=True synthesizes <, <=, >, >= based on tuple comparison of the fields in declaration order; comparing instances of two different classes raises TypeError. Combined with frozen=True, dataclasses become hashable, orderable value types: a named-field alternative to tuple, though without indexing or unpacking.
from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int
    patch: int

v1 = Version(1, 9, 0)
v2 = Version(1, 10, 0)
print(v1 < v2)
print(sorted([v2, v1, Version(2, 0, 0)]))
Output:
True
[Version(major=1, minor=9, patch=0), Version(major=1, minor=10, patch=0), Version(major=2, minor=0, patch=0)]
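Ordering can also be limited to a subset of fields with field(compare=False). The sketch below uses a hypothetical Task class in a heapq priority queue, so that only priority takes part in the comparisons.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int
    name: str = field(compare=False)   # ignored by ==, <, <=, >, >=

queue = []
heapq.heappush(queue, Task(2, "write docs"))
heapq.heappush(queue, Task(1, "fix bug"))
heapq.heappush(queue, Task(3, "refactor"))

order = [heapq.heappop(queue).name for _ in range(3)]
print(order)                          # ['fix bug', 'write docs', 'refactor']
print(Task(1, "a") == Task(1, "b"))   # True: name does not take part
```

Note the side effect on equality: two tasks with the same priority compare equal even when their names differ.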
asdict / astuple — convert to plain containers
dataclasses.asdict(obj) deeply converts a dataclass (and any nested dataclasses, lists, dicts, tuples it contains) into plain dicts. astuple does the same to tuples. Use asdict for JSON serialization, passing default=str (or a custom default function) to json.dumps for values that are not JSON-native, such as datetime.
from dataclasses import dataclass, field, asdict, astuple
import json

@dataclass
class Address:
    street: str
    city: str

@dataclass
class User:
    name: str
    address: Address
    tags: list[str] = field(default_factory=list)

u = User("Alice Dev", Address("1 Main St", "Springfield"), ["admin"])
print(asdict(u))
print(astuple(u))
print(json.dumps(asdict(u)))
Output:
{'name': 'Alice Dev', 'address': {'street': '1 Main St', 'city': 'Springfield'}, 'tags': ['admin']}
('Alice Dev', ('1 Main St', 'Springfield'), ['admin'])
{"name": "Alice Dev", "address": {"street": "1 Main St", "city": "Springfield"}, "tags": ["admin"]}
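Because asdict copies as it recurses, the result is independent of the source object. A minimal sketch with a hypothetical Profile class shows that, next to a shallow alternative built from fields() that shares the values instead.

```python
from dataclasses import asdict, dataclass, field, fields

@dataclass
class Profile:
    name: str
    tags: list[str] = field(default_factory=list)

p = Profile("Alice Dev", ["admin"])

snapshot = asdict(p)
snapshot["tags"].append("temp")   # mutates the copy only
print(p.tags)                     # ['admin']
print(snapshot["tags"])           # ['admin', 'temp']

# Shallow alternative: one level deep, values are shared rather than copied
shallow = {f.name: getattr(p, f.name) for f in fields(p)}
print(shallow["tags"] is p.tags)  # True
```

The shallow form is cheaper but leaves nested dataclasses unconverted, so it only suits flat records or cases where sharing is acceptable.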
replace — copy-with-overrides
dataclasses.replace(obj, **changes) creates a new instance with the listed fields overridden — the equivalent of namedtuple._replace. Indispensable for frozen dataclasses where you can't mutate in place.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Window:
    title: str
    width: int = 800
    height: int = 600

w = Window("editor")
w2 = replace(w, title="editor (modified)", width=1024)
print(w)
print(w2)
Output:
Window(title='editor', width=800, height=600)
Window(title='editor (modified)', width=1024, height=600)
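replace builds the copy by calling the class constructor, so any __post_init__ validation runs again on the new values. A sketch with a hypothetical Percent class:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Percent:
    value: int

    def __post_init__(self) -> None:
        if not 0 <= self.value <= 100:
            raise ValueError(f"value out of range: {self.value}")

half = Percent(50)
print(replace(half, value=75))   # Percent(value=75)

try:
    replace(half, value=150)     # goes through __init__ and __post_init__ again
except ValueError as e:
    print("rejected:", e)        # rejected: value out of range: 150

print(half)                      # Percent(value=50), the original is untouched
```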
fields() — introspection
dataclasses.fields(cls) returns a tuple of Field objects describing every declared field — name, type, default, metadata. Useful for writing serializers, form builders, or CLI generators that walk a dataclass at runtime.
from dataclasses import dataclass, field, fields

@dataclass
class Setting:
    key: str
    value: str = field(metadata={"help": "the value to set"})
    sensitive: bool = field(default=False, metadata={"help": "redact in logs"})

for f in fields(Setting):
    # f.metadata is a read-only mappingproxy; dict() gives it a plain repr
    print(f"{f.name:<10} {f.type!s:<10} default={f.default!r:<10} {dict(f.metadata)}")
Output:
key        <class 'str'> default=<dataclasses._MISSING_TYPE object at 0x7f9c...> {}
value      <class 'str'> default=<dataclasses._MISSING_TYPE object at 0x7f9c...> {'help': 'the value to set'}
sensitive  <class 'bool'> default=False      {'help': 'redact in logs'}
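As an illustration of the "CLI generator" use case, the sketch below walks fields() to build an argparse parser. Options and build_parser are hypothetical names, and the helper assumes every field has a simple callable type (str, int, float) and either no default or a plain default; default_factory is not handled.

```python
import argparse
from dataclasses import MISSING, dataclass, field, fields

@dataclass
class Options:
    path: str = field(metadata={"help": "file to read"})
    retries: int = field(default=3, metadata={"help": "retry count"})

def build_parser(cls) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    for f in fields(cls):
        required = f.default is MISSING   # sketch: default_factory is ignored
        parser.add_argument(
            f"--{f.name}",
            type=f.type,                  # works because the annotations are real classes
            required=required,
            default=None if required else f.default,
            help=f.metadata.get("help"),
        )
    return parser

args = build_parser(Options).parse_args(["--path", "data.csv", "--retries", "5"])
opts = Options(**vars(args))
print(opts)   # Options(path='data.csv', retries=5)
```

With from __future__ import annotations in effect, f.type would be a string rather than a class, and the type= argument would need resolving first (for example through typing.get_type_hints).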
Inheritance
A subclass @dataclass can add fields and override defaults, but all fields with defaults must come after fields without defaults across the merged MRO; otherwise class creation fails with a TypeError reporting that a non-default argument follows a default argument. Use kw_only=True to escape this constraint cleanly.
from dataclasses import dataclass, field

@dataclass
class Animal:
    name: str
    species: str

@dataclass(kw_only=True)
class Pet(Animal):
    owner: str
    nickname: str = ""

p = Pet(name="Whiskers", species="cat", owner="alicedev", nickname="Mr. W")
print(p)
Output:
Pet(name='Whiskers', species='cat', owner='alicedev', nickname='Mr. W')
Structural pattern matching (3.10+)
@dataclass generates __match_args__ from the positional __init__ parameters, enabling match/case patterns by class and field. This is the cleanest way to dispatch on a sum-type style hierarchy.
from dataclasses import dataclass

@dataclass
class Move:
    dx: int
    dy: int

@dataclass
class Quit:
    reason: str

def step(event):
    match event:
        case Move(dx=0, dy=dy):
            return f"vertical {dy}"
        case Move(dx, dy):
            return f"step {dx},{dy}"
        case Quit(reason):
            return f"quit: {reason}"

print(step(Move(0, 5)))
print(step(Move(3, -2)))
print(step(Quit("user-asked")))
Output:
vertical 5
step 3,-2
quit: user-asked
dataclasses vs alternatives
The Python ecosystem now offers half a dozen ways to define a record type. The right choice depends on whether you want runtime validation, immutability, performance, or zero dependencies.
| Tool | Validates at runtime | Frozen by default | Mutable defaults | Subclass-able | Notes |
|---|---|---|---|---|---|
| dataclasses (stdlib) | No | No (opt-in) | default_factory | Yes | Zero deps, minimal magic |
| pydantic v2 | Yes | No (opt-in) | default_factory | Yes | Coerces; great for I/O |
| attrs | Optional via validator= | Opt-in | factory= | Yes | Predecessor; richer config; built-in slots |
| msgspec.Struct | Yes, on decode (fast) | Opt-in | Yes | Yes | Fastest serializer |
| typing.NamedTuple | No | Yes (tuple) | No | Limited | Indexable; immutable tuple |
| typing.TypedDict | No (mypy only) | No (it's a dict) | n/a | Yes (multiple inheritance) | Annotation for dict shape |
| collections.namedtuple | No | Yes | No | Limited | Pre-PEP-526; avoid for new code |
from dataclasses import dataclass
from typing import NamedTuple, TypedDict

@dataclass
class UserDC:
    name: str
    age: int = 0

class UserNT(NamedTuple):
    name: str
    age: int = 0

class UserTD(TypedDict):
    name: str
    age: int

dc = UserDC("Alice", 30)
nt = UserNT("Alice", 30)
td: UserTD = {"name": "Alice", "age": 30}
print(dc.name, nt[0], td["name"])
print(type(dc).__name__, type(nt).__name__, type(td).__name__)
Output:
Alice Alice Alice
UserDC UserNT dict
When to pick which
- dataclasses: internal records, configs, CLI option objects, message envelopes. No external data crossing the boundary, no need to validate.
- pydantic: anything parsed from JSON/YAML/env vars/HTTP. Use it when bad input is a runtime concern.
- msgspec.Struct: high-volume serialization (event streams, MQ payloads). Beats both above on throughput.
- NamedTuple: when you also want tuple unpacking and indexing semantics.
- TypedDict: when the wire format is already a dict and you want type hints without converting.
Common pitfalls
- Bare mutable defaults are an error: items: list = [] raises at class creation. Use field(default_factory=list).
- Field ordering with defaults + inheritance: a default-less field in a subclass after a default-bearing field in the parent raises TypeError. Use kw_only=True.
- frozen=True is shallow: mutable objects held in fields can still be mutated in place. Combine with immutable types (tuple, frozenset, frozen dataclass) for transitively-immutable values.
- __eq__ with frozen=True still requires same class: two instances of two different frozen dataclasses with identical fields are not equal. Compare asdict() or use a single class.
- A hand-written __init__ silently replaces the generated one: if the class body defines __init__, @dataclass keeps yours and skips its own, so defaults, default_factory, and __post_init__ are never applied. Use __post_init__ instead.
- asdict is recursive and deep-copies: for large nested structures this is expensive. Use dataclasses.fields() + manual conversion for hot paths.
- Type annotations are not enforced: User(name=42, email=None, age="thirty") succeeds. Use pydantic or write a __post_init__ validator if you need runtime checks.
- ClassVar and InitVar are special: ClassVar[T] skips field generation entirely; InitVar[T] creates an __init__ arg but no attribute. Forgetting either makes things appear or disappear unexpectedly.
- slots=True returns a new class: the class you get back is not the object the class statement first created. Code that captured the original earlier (a registering decorator below @dataclass, a cache keyed by class object) ends up holding a different class.
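Two of these pitfalls fit in one short sketch. Bag is a hypothetical class with a hand-written __init__ and a ClassVar; the comments state what each line prints.

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar

@dataclass
class Bag:
    items: list[str] = field(default_factory=list)
    limit: ClassVar[int] = 10     # class attribute, not a field

    def __init__(self) -> None:   # hand-written: the generated __init__ is skipped
        pass

b = Bag()
print(hasattr(b, "items"))             # False: default_factory never ran
print([f.name for f in fields(Bag)])   # ['items']: ClassVar is not a field
print(Bag.limit)                       # 10
```

Calling print(b) here would raise AttributeError, because the generated __repr__ expects items to exist.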
Real-world recipes
Config object loaded from env vars (frozen + slotted)
A common deployment pattern: load every setting from environment variables once at startup, freeze the result so no code can accidentally mutate it, and slot it for compactness.
from dataclasses import dataclass, field
import os

def _bool_env(key: str, default: bool = False) -> bool:
    return os.environ.get(key, str(default)).lower() in {"1", "true", "yes", "on"}

@dataclass(frozen=True, slots=True, kw_only=True)
class AppConfig:
    database_url: str
    redis_url: str = "redis://localhost:6379/0"
    debug: bool = False
    log_level: str = "INFO"
    workers: int = 4
    allowed_hosts: tuple[str, ...] = ()

    @classmethod
    def from_env(cls) -> "AppConfig":
        hosts = os.environ.get("ALLOWED_HOSTS", "")
        return cls(
            database_url=os.environ["DATABASE_URL"],
            redis_url=os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
            debug=_bool_env("DEBUG"),
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
            workers=int(os.environ.get("WORKERS", "4")),
            allowed_hosts=tuple(h.strip() for h in hosts.split(",") if h.strip()),
        )

os.environ["DATABASE_URL"] = "postgres://alicedev@myhost.local/app"
cfg = AppConfig.from_env()
print(cfg)
Output:
AppConfig(database_url='postgres://alicedev@myhost.local/app', redis_url='redis://localhost:6379/0', debug=False, log_level='INFO', workers=4, allowed_hosts=())
Round-trip through JSON
A dataclass round-trips cleanly through JSON if every field is a JSON-native type or has a known converter. Build from_dict and to_json helpers and you have a zero-dependency serializer.
from dataclasses import dataclass, field, asdict
from datetime import datetime
import json

@dataclass
class Event:
    id: int
    name: str
    created_at: datetime = field(default_factory=datetime.now)
    tags: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), default=str)

    @classmethod
    def from_dict(cls, d: dict) -> "Event":
        return cls(
            id=d["id"],
            name=d["name"],
            created_at=datetime.fromisoformat(d["created_at"]),
            tags=list(d.get("tags", [])),
        )

e = Event(id=1, name="signup", tags=["alpha"])
blob = e.to_json()
print(blob)
print(Event.from_dict(json.loads(blob)))
Output:
{"id": 1, "name": "signup", "created_at": "2026-05-25 15:01:11.842371", "tags": ["alpha"]}
Event(id=1, name='signup', created_at=datetime.datetime(2026, 5, 25, 15, 1, 11, 842371), tags=['alpha'])
Pattern-matched message dispatch
A frozen dataclass per message variant + match is the idiomatic way to write a state machine, parser, or reducer. The code reads top-down like a spec.
from dataclasses import dataclass

@dataclass(frozen=True)
class Connect:
    host: str
    port: int

@dataclass(frozen=True)
class Send:
    payload: bytes

@dataclass(frozen=True)
class Disconnect:
    reason: str

def handle(msg):
    match msg:
        case Connect(host=h, port=p):
            return f"opening connection to {h}:{p}"
        case Send(payload=b) if len(b) > 1024:
            return f"chunking {len(b)} bytes"
        case Send(payload=b):
            return f"sending {len(b)} bytes"
        case Disconnect(reason=r):
            return f"closed: {r}"

print(handle(Connect("myhost.local", 5432)))
print(handle(Send(b"x" * 2048)))
print(handle(Disconnect("timeout")))
Output:
opening connection to myhost.local:5432
chunking 2048 bytes
closed: timeout
Diff two configs
Walking fields() of a dataclass produces a compact, reusable diff for any pair of same-typed instances. Useful for "what changed?" log lines on reload.
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Config:
    debug: bool = False
    workers: int = 4
    host: str = "127.0.0.1"

def diff(a, b) -> dict:
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }

old = Config()
new = Config(debug=True, workers=8)
print(diff(old, new))
Output:
{'debug': (False, True), 'workers': (4, 8)}