cheat sheet

dataclasses

Define typed data containers with @dataclass — frozen, slots, kw_only, default_factory, __post_init__, asdict, replace, and how it compares to attrs, pydantic, NamedTuple, TypedDict.

#python #stdlib #typing | updated 05-25-2026

dataclasses — Boilerplate-Free Classes

What it is

dataclasses is the standard library module added in Python 3.7 (PEP 557) that auto-generates __init__, __repr__, __eq__, and optionally __hash__ / __lt__ from class-level type annotations. Reach for it when you want a typed record (a config, a row, a DTO, a message) without writing constructor boilerplate, and you don't need runtime validation. For runtime validation against types use pydantic; for finer control over the generated methods use attrs. dataclasses lives in between — zero dependencies, pure Python, no coercion.

Install

dataclasses is part of the Python standard library (3.7+) and requires no installation. Verify it loads:

bash
python -c "from dataclasses import dataclass; print(dataclass)"

Output:

text
<function dataclass at 0x7f9c1a2b3e60>

Quick example

The @dataclass decorator inspects the class body's annotated attributes and synthesizes the dunder methods you'd otherwise write by hand. The annotations are not enforced at runtime — they're hints for tools (mypy, IDEs) and metadata for dataclasses.fields().

python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    email: str
    age: int = 0
    active: bool = True

u = User(name="Alice Dev", email="alice@example.com", age=30)
print(u)
print(u == User("Alice Dev", "alice@example.com", 30))

Output:

text
User(name='Alice Dev', email='alice@example.com', age=30, active=True)
True
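
Because the annotations are only hints, nothing stops a caller from passing values of the wrong type. The following is a minimal sketch, using a hypothetical `Reading` class that is not part of the examples above, showing that construction succeeds and that the declared types remain available through `fields()`:

```python
from dataclasses import dataclass, fields

@dataclass
class Reading:  # hypothetical example class
    sensor: str
    value: float = 0.0

# Wrong types are accepted: the generated __init__ performs no checks.
r = Reading(sensor=42, value="high")
print(r)

# The declared types survive only as metadata on the Field objects.
declared = {f.name: f.type for f in fields(Reading)}
print(declared)
```

If wrong types must be rejected, the check has to be written by hand (for example in `__post_init__`) or delegated to a validating library.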

What the decorator generates

By default @dataclass synthesizes __init__, __repr__, and __eq__. Each can be turned off via decorator arguments; additional methods are opt-in. The full set of toggles is:

Argument           | Default | Effect
init=True          | on      | Generate __init__
repr=True          | on      | Generate __repr__
eq=True            | on      | Generate __eq__
order=False        | off     | Generate __lt__, __le__, __gt__, __ge__
unsafe_hash=False  | off     | Force __hash__ even when not safe
frozen=False       | off     | Make instances immutable (no attribute assignment)
match_args=True    | on      | Generate __match_args__ for structural pattern matching (3.10+)
kw_only=False      | off     | All fields are keyword-only (3.10+)
slots=False        | off     | Generate __slots__ (3.10+)
weakref_slot=False | off     | Add a __weakref__ slot when slots=True (3.11+)

python
from dataclasses import dataclass

@dataclass(order=True, frozen=True, slots=True)
class Point:
    x: int
    y: int

a, b = Point(1, 2), Point(3, 4)
print(sorted([b, a]))

Output:

text
[Point(x=1, y=2), Point(x=3, y=4)]

field() — per-field configuration

When the class default isn't enough — mutable defaults, init exclusion, hash exclusion, metadata — wrap the default in field(...). It's the per-attribute equivalent of the decorator arguments.

field() arg     | Effect
default         | Default value
default_factory | Zero-arg callable for the default (lists, dicts, sets, dataclasses)
init            | If False, exclude from __init__
repr            | If False, exclude from __repr__
compare         | If False, exclude from __eq__ and __lt__
hash            | If False, exclude from __hash__
metadata        | Arbitrary mapping kept on the Field object (use for docs, validators, serializers)
kw_only         | Per-field keyword-only flag (3.10+)

python
from dataclasses import dataclass, field

@dataclass
class Cart:
    user_id: int
    items: list[str] = field(default_factory=list)
    discount: float = 0.0
    _cache: dict = field(default_factory=dict, repr=False, compare=False)

c = Cart(user_id=42)
c.items.append("book")
print(c)

Output:

text
Cart(user_id=42, items=['book'], discount=0.0)

Mutable defaults raise. Writing items: list = [] raises ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory. The error catches the classic shared-state bug at class-creation time.
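
The check described above can be observed directly. This is a small sketch with hypothetical `BadCart` and `GoodCart` classes: the first definition is rejected while the class is being created, the second gives each instance its own list.

```python
from dataclasses import dataclass, field

# A bare list default is rejected while the class is being created.
try:
    @dataclass
    class BadCart:  # hypothetical example class
        items: list = []
except ValueError as exc:
    error = str(exc)
    print("rejected:", error)

# default_factory builds a fresh list for every instance.
@dataclass
class GoodCart:  # hypothetical example class
    items: list = field(default_factory=list)

first, second = GoodCart(), GoodCart()
first.items.append("book")
print(first.items, second.items)
```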

__post_init__ — the validation hook

__post_init__ runs automatically at the end of the generated __init__, after all fields are assigned. Use it for validation, normalization, or computing derived fields. Combine with field(init=False) to populate an attribute the user never passes.

python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Order:
    id: int
    items: list[str]
    total: float
    created_at: datetime = field(init=False)

    def __post_init__(self) -> None:
        if self.total < 0:
            raise ValueError("total cannot be negative")
        self.created_at = datetime.now()

o = Order(id=1, items=["a", "b"], total=19.99)
print(o.created_at.year, o.total)

Output:

text
2026 19.99

InitVar — pseudo-fields visible only to __post_init__

InitVar[T] is an annotation that creates a constructor parameter without storing it as an attribute. The value is forwarded to __post_init__ and then discarded. Useful for derivation inputs (e.g. raw password -> hashed) that should never persist on the instance.

python
from dataclasses import dataclass, field, InitVar
import hashlib

@dataclass
class User:
    username: str
    password_hash: str = field(init=False)
    password: InitVar[str]   # no default: a default would stay behind as a class attribute

    def __post_init__(self, password: str) -> None:
        self.password_hash = hashlib.sha256(password.encode()).hexdigest()

u = User(username="alicedev", password="hunter2")
print(u)                                # no `password` attr
print(hasattr(u, "password"))           # False

Output:

text
User(username='alicedev', password_hash='f52fbd32b2b3b86ff88ef6c490628285f482af15ddcb29541f94bcf526a3f6c7')
False

frozen=True — immutable instances

frozen=True makes attribute assignment and deletion raise FrozenInstanceError. With the default eq=True, a frozen dataclass also gets a generated __hash__, so instances can live in sets and serve as dict keys as long as every field value is itself hashable. That is exactly what you want for config objects, message envelopes, and value types.

python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class AppConfig:
    debug: bool
    host: str = "127.0.0.1"
    port: int = 8000

cfg = AppConfig(debug=False)
try:
    cfg.port = 9000
except FrozenInstanceError as e:
    print("immutable:", e)

# Hashable -> usable as dict key
configs = {cfg: "default"}
print(configs)

Output:

text
immutable: cannot assign to field 'port'
{AppConfig(debug=False, host='127.0.0.1', port=8000): 'default'}

frozen=True is shallow: the references are frozen, the objects they point to are not. If a field holds a list, obj.items.append(...) still works. Use tuple / frozenset fields for full immutability.
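
A short sketch of the shallow-freeze behaviour, using two hypothetical classes (`Shallow` with a list field, `Deep` with a tuple field). It also shows a side effect that is easy to miss: the generated `__hash__` hashes the field values, so a frozen instance holding a list cannot actually be hashed.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Shallow:  # hypothetical example class
    tags: list

@dataclass(frozen=True)
class Deep:  # hypothetical example class
    tags: tuple

s = Shallow(tags=["a"])
s.tags.append("b")            # allowed: the list itself is not frozen
print(s.tags)

rebind_blocked = False
try:
    s.tags = []               # rebinding the field is what frozen blocks
except FrozenInstanceError:
    rebind_blocked = True
print(rebind_blocked)

d = Deep(tags=("a",))
print(hash(d) == hash(Deep(tags=("a",))))   # hashable: every field is hashable

list_unhashable = False
try:
    hash(s)                   # the list field makes hashing fail
except TypeError:
    list_unhashable = True
print(list_unhashable)
```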

slots=True — memory + speed win (3.10+)

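slots=True generates __slots__, removing the per-instance __dict__. The benefits: a smaller memory footprint per instance, somewhat faster attribute access (the exact gains depend on the Python version and the number of fields, so measure before relying on them), and a guard against typo-attribute-creation (u.naem = "..." raises AttributeError).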

python
from dataclasses import dataclass
import sys

@dataclass
class WithoutSlots:
    x: int
    y: int

@dataclass(slots=True)
class WithSlots:
    x: int
    y: int

a, b = WithoutSlots(1, 2), WithSlots(1, 2)
print("__dict__ ->", sys.getsizeof(a.__dict__))
try:
    print(b.__dict__)
except AttributeError as e:
    print("slotted has no __dict__:", e)

try:
    b.z = 3
except AttributeError as e:
    print("typo guard:", e)

Output:

text
__dict__ -> 296
slotted has no __dict__: 'WithSlots' object has no attribute '__dict__'
typo guard: 'WithSlots' object has no attribute 'z'

slots=True creates a brand-new class, not the one decorated. Decorators applied above @dataclass(slots=True) receive the slotted class; anything that captured the original class earlier (a decorator placed below it, for example) keeps a reference to the pre-slots class.

kw_only=True — force keyword-only arguments (3.10+)

kw_only=True (decorator-level) makes every field keyword-only in __init__. At the field level, field(kw_only=True) marks individual fields as keyword-only, which is handy for adding a required field in a subclass whose parent already has fields with defaults.

python
from dataclasses import dataclass, field

@dataclass(kw_only=True)
class Request:
    method: str
    path: str
    body: bytes = b""
    timeout: float = 30.0

# Positional args now raise
try:
    Request("GET", "/")
except TypeError as e:
    print(e)

r = Request(method="GET", path="/")
print(r)

Output:

text
Request.__init__() takes 1 positional argument but 3 were given
Request(method='GET', path='/', body=b'', timeout=30.0)

The per-field form solves a common inheritance headache:

python
from dataclasses import dataclass, field

@dataclass
class Base:
    name: str = "base"

@dataclass
class Child(Base):
    note: str = field(kw_only=True)   # required, allowed even though Base.name has a default
    enabled: bool = True

Output: (none; the class definition succeeds)

order=True — comparison operators

order=True synthesizes <, <=, >, >= based on tuple comparison of the fields in declaration order; comparing instances of two different classes raises TypeError. Combined with frozen=True, dataclasses become hashable, orderable value types: a named-field alternative to plain tuples (though not indexable like one).

python
from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int
    patch: int

v1 = Version(1, 9, 0)
v2 = Version(1, 10, 0)
print(v1 < v2)
print(sorted([v2, v1, Version(2, 0, 0)]))

Output:

text
True
[Version(major=1, minor=9, patch=0), Version(major=1, minor=10, patch=0), Version(major=2, minor=0, patch=0)]
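
Ordering can be restricted to a subset of fields with field(compare=False). The sketch below uses a hypothetical `Task` class in a heap-based priority queue, where only `priority` takes part in comparisons and `name` is carried along as payload.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Task:  # hypothetical example class
    priority: int
    name: str = field(compare=False)   # ignored by ==, <, <=, >, >=

queue = [Task(2, "write docs"), Task(1, "fix bug"), Task(3, "refactor")]
heapq.heapify(queue)
order = [heapq.heappop(queue).name for _ in range(3)]
print(order)

# Equality ignores the excluded field too.
print(Task(1, "a") == Task(1, "b"))
```

Note the trade-off: because `name` is excluded from `__eq__` as well, two tasks with the same priority compare equal even when their names differ.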

asdict / astuple — convert to plain containers

dataclasses.asdict(obj) recursively converts a dataclass (and any nested dataclasses, lists, dicts, tuples it contains) into plain dicts. astuple does the same, producing tuples. Use asdict for JSON serialization, passing default=str (or a custom encoder) to json.dumps for non-JSON-native types such as datetime.

python
from dataclasses import dataclass, field, asdict, astuple
import json

@dataclass
class Address:
    street: str
    city: str

@dataclass
class User:
    name: str
    address: Address
    tags: list[str] = field(default_factory=list)

u = User("Alice Dev", Address("1 Main St", "Springfield"), ["admin"])
print(asdict(u))
print(astuple(u))
print(json.dumps(asdict(u)))

Output:

text
{'name': 'Alice Dev', 'address': {'street': '1 Main St', 'city': 'Springfield'}, 'tags': ['admin']}
('Alice Dev', ('1 Main St', 'Springfield'), ['admin'])
{"name": "Alice Dev", "address": {"street": "1 Main St", "city": "Springfield"}, "tags": ["admin"]}
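
Because asdict rebuilds containers rather than sharing them, the result is independent of the source object. The sketch below uses a hypothetical `Playlist` class and contrasts asdict with a shallow conversion built from fields(), which is cheaper but shares the underlying list.

```python
from dataclasses import dataclass, field, fields, asdict

@dataclass
class Playlist:  # hypothetical example class
    name: str
    tracks: list = field(default_factory=list)

p = Playlist("focus", ["intro"])

# asdict rebuilds the list, so mutating the snapshot leaves p untouched.
snapshot = asdict(p)
snapshot["tracks"].append("outro")
print(p.tracks, snapshot["tracks"])

# A shallow conversion avoids the copy but shares the same list object.
shallow = {f.name: getattr(p, f.name) for f in fields(p)}
print(shallow["tracks"] is p.tracks)
```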

replace — copy-with-overrides

dataclasses.replace(obj, **changes) creates a new instance with the listed fields overridden — the equivalent of namedtuple._replace. Indispensable for frozen dataclasses where you can't mutate in place.

python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Window:
    title: str
    width: int = 800
    height: int = 600

w = Window("editor")
w2 = replace(w, title="editor (modified)", width=1024)
print(w)
print(w2)

Output:

text
Window(title='editor', width=800, height=600)
Window(title='editor (modified)', width=1024, height=600)
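
replace() builds the new instance by calling the class constructor, so __post_init__ runs again on the copy. The sketch below, with a hypothetical `Rect` class, shows both consequences: derived fields are recomputed, and validation is re-applied. It also shows the usual way to set a derived field on a frozen instance.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Rect:  # hypothetical example class
    width: int
    height: int
    area: int = field(init=False)

    def __post_init__(self) -> None:
        if self.width <= 0 or self.height <= 0:
            raise ValueError("sides must be positive")
        # Frozen instances block normal assignment, so the derived
        # field is set through object.__setattr__.
        object.__setattr__(self, "area", self.width * self.height)

r = Rect(2, 3)
bigger = replace(r, width=10)
print(bigger)                  # area is recomputed, not copied

try:
    replace(r, height=-1)      # validation runs again on the copy
except ValueError as exc:
    print("rejected:", exc)
```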

fields() — introspection

dataclasses.fields(cls) returns a tuple of Field objects describing every declared field — name, type, default, metadata. Useful for writing serializers, form builders, or CLI generators that walk a dataclass at runtime.

python
from dataclasses import MISSING, dataclass, field, fields

@dataclass
class Setting:
    key: str
    value: str = field(metadata={"help": "the value to set"})
    sensitive: bool = field(default=False, metadata={"help": "redact in logs"})

for f in fields(Setting):
    # f.default is the MISSING sentinel when the field has no default.
    default = "<required>" if f.default is MISSING else repr(f.default)
    # f.metadata is a read-only mappingproxy; dict() gives a plain repr.
    print(f"{f.name:<10} {f.type.__name__:<5} default={default:<11} {dict(f.metadata)}")

Output:

text
key        str   default=<required>  {}
value      str   default=<required>  {'help': 'the value to set'}
sensitive  bool  default=False       {'help': 'redact in logs'}
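
Field metadata becomes useful once some code reads it. As one possible use, the sketch below masks flagged fields before logging. The `Credentials` class, the `safe_dict` helper, and the `"redact"` metadata key are all hypothetical names chosen for illustration; dataclasses itself attaches no meaning to metadata keys.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Credentials:  # hypothetical example class
    user: str
    token: str = field(metadata={"redact": True})   # "redact" is our own key

def safe_dict(obj) -> dict:
    """Shallow dict of field values, masking fields flagged in metadata."""
    return {
        f.name: "***" if f.metadata.get("redact") else getattr(obj, f.name)
        for f in fields(obj)
    }

masked = safe_dict(Credentials("alicedev", "s3cr3t"))
print(masked)
```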

Inheritance

A subclass @dataclass can add fields and override defaults, but a field without a default cannot follow a field with a default across the merged MRO; otherwise class creation fails with a TypeError of the form non-default argument 'x' follows default argument. Keyword-only fields are exempt from this ordering rule, so kw_only=True escapes the constraint cleanly.

python
from dataclasses import dataclass, field

@dataclass
class Animal:
    name: str
    species: str = "unknown"   # this default is what makes kw_only necessary below

@dataclass(kw_only=True)
class Pet(Animal):
    owner: str
    nickname: str = ""

p = Pet(name="Whiskers", species="cat", owner="alicedev", nickname="Mr. W")
print(p)

Output:

text
Pet(name='Whiskers', species='cat', owner='alicedev', nickname='Mr. W')

Structural pattern matching (3.10+)

@dataclass generates __match_args__ from the positional __init__ parameters, enabling match/case patterns by class and field. This is the cleanest way to dispatch on a sum-type style hierarchy.

python
from dataclasses import dataclass

@dataclass
class Move:
    dx: int
    dy: int

@dataclass
class Quit:
    reason: str

def step(event):
    match event:
        case Move(dx=0, dy=dy):
            return f"vertical {dy}"
        case Move(dx, dy):
            return f"step {dx},{dy}"
        case Quit(reason):
            return f"quit: {reason}"

print(step(Move(0, 5)))
print(step(Move(3, -2)))
print(step(Quit("user-asked")))

Output:

text
vertical 5
step 3,-2
quit: user-asked

dataclasses vs alternatives

The Python ecosystem now offers half a dozen ways to define a record type. The right choice depends on whether you want runtime validation, immutability, performance, or zero dependencies.

Tool                   | Validates at runtime   | Frozen by default | Mutable defaults | Subclass-able              | Notes
dataclasses (stdlib)   | No                     | No (opt-in)       | default_factory  | Yes                        | Zero deps, minimal magic
pydantic v2            | Yes                    | No (opt-in)       | Validators       | Yes                        | Coerces; great for I/O
attrs                  | Optional via validator= | Opt-in           | Built-in slots   | Yes                        | Predecessor; richer config
msgspec.Struct         | Yes (fast)             | Opt-in            | Yes              | Yes                        | Fastest serializer
typing.NamedTuple      | No                     | Yes (tuple)       | No               | Limited                    | Indexable; immutable tuple
typing.TypedDict       | No (mypy only)         | No (it's a dict)  | n/a              | Yes (multiple inheritance) | Annotation for dict shape
collections.namedtuple | No                     | Yes               | No               | Limited                    | Pre-PEP-526; avoid for new code

python
from dataclasses import dataclass
from typing import NamedTuple, TypedDict

@dataclass
class UserDC:
    name: str
    age: int = 0

class UserNT(NamedTuple):
    name: str
    age: int = 0

class UserTD(TypedDict):
    name: str
    age: int

dc = UserDC("Alice", 30)
nt = UserNT("Alice", 30)
td: UserTD = {"name": "Alice", "age": 30}

print(dc.name, nt[0], td["name"])
print(type(dc).__name__, type(nt).__name__, type(td).__name__)

Output:

text
Alice Alice Alice
UserDC UserNT dict

When to pick which

  • dataclasses — internal records, configs, CLI option objects, message envelopes. No external data crossing the boundary, no need to validate.
  • pydantic — anything parsed from JSON/YAML/env vars/HTTP. Use it when bad input is a runtime concern.
  • msgspec.Struct — high-volume serialization (event streams, MQ payloads). Beats both above on throughput.
  • NamedTuple — when you also want tuple unpacking and indexing semantics.
  • TypedDict — when the wire format is already a dict and you want type hints without converting.

Common pitfalls

  1. Bare mutable defaults are an error: items: list = [] raises ValueError at class creation. Use field(default_factory=list).
  2. Field ordering with defaults + inheritance: a default-less field in a subclass after a default-bearing field in the parent raises TypeError. Use kw_only=True.
  3. frozen=True is shallow: mutable objects held by a frozen instance can still be mutated. Combine with immutable types (tuple, frozenset, frozen dataclass) for transitively immutable values.
  4. __eq__ requires the same class, frozen or not: two instances of two different dataclasses with identical fields are not equal. Compare asdict() results or use a single class.
  5. A hand-written __init__ suppresses the generated one: @dataclass leaves an __init__ defined in the class body in place, so defaults and default_factory are never applied unless you assign them yourself. Use __post_init__ instead.
  6. asdict is recursive and deep-copies: for large nested structures this is expensive. Use dataclasses.fields() + manual conversion for hot paths.
  7. Type annotations are not enforced: User(name=42, age="thirty") succeeds. Use pydantic or write a __post_init__ validator if you need runtime checks.
  8. ClassVar and InitVar are special: ClassVar[T] skips field generation entirely; InitVar[T] creates an __init__ arg but no attribute. Forgetting either makes things appear or disappear unexpectedly.
  9. slots=True returns a new class: the class name is rebound to a rebuilt class, so any reference captured before decoration points at a different object. Code that relies on class identity (caches keyed by class object) breaks.
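
Pitfalls 4, 5, and 8 are easy to reproduce in a few lines. The classes in this sketch (`Meters`, `Feet`, `Broken`, `Counter`) are hypothetical and exist only to trigger each behaviour.

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar

# Pitfall 4: equality requires the same class, not just the same values.
@dataclass(frozen=True)
class Meters:
    value: float

@dataclass(frozen=True)
class Feet:
    value: float

print(Meters(1.0) == Feet(1.0))

# Pitfall 5: a hand-written __init__ is kept, so default_factory never runs.
@dataclass
class Broken:
    items: list = field(default_factory=list)

    def __init__(self) -> None:
        pass   # forgot to set self.items

print(hasattr(Broken(), "items"))

# Pitfall 8: ClassVar attributes are shared class state, not fields.
@dataclass
class Counter:
    count: int = 0
    registry: ClassVar[dict] = {}

print([f.name for f in fields(Counter)])
```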

Real-world recipes

Config object loaded from env vars (frozen + slotted)

A common deployment pattern: load every setting from environment variables once at startup, freeze the result so no code can accidentally mutate it, and slot it for compactness.

python
from dataclasses import dataclass, field
import os

def _bool_env(key: str, default: bool = False) -> bool:
    return os.environ.get(key, str(default)).lower() in {"1", "true", "yes", "on"}

@dataclass(frozen=True, slots=True, kw_only=True)
class AppConfig:
    database_url: str
    redis_url: str = "redis://localhost:6379/0"
    debug: bool = False
    log_level: str = "INFO"
    workers: int = 4
    allowed_hosts: tuple[str, ...] = ()

    @classmethod
    def from_env(cls) -> "AppConfig":
        hosts = os.environ.get("ALLOWED_HOSTS", "")
        return cls(
            database_url=os.environ["DATABASE_URL"],
            redis_url=os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
            debug=_bool_env("DEBUG"),
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
            workers=int(os.environ.get("WORKERS", "4")),
            allowed_hosts=tuple(h.strip() for h in hosts.split(",") if h.strip()),
        )

os.environ["DATABASE_URL"] = "postgres://alicedev@myhost.local/app"
cfg = AppConfig.from_env()
print(cfg)

Output:

text
AppConfig(database_url='postgres://alicedev@myhost.local/app', redis_url='redis://localhost:6379/0', debug=False, log_level='INFO', workers=4, allowed_hosts=())

Round-trip through JSON

A dataclass round-trips cleanly through JSON if every field is a JSON-native type or has a known converter. Build from_dict and to_json helpers and you have a zero-dependency serializer.

python
from dataclasses import dataclass, field, asdict
from datetime import datetime
import json

@dataclass
class Event:
    id: int
    name: str
    created_at: datetime = field(default_factory=datetime.now)
    tags: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), default=str)

    @classmethod
    def from_dict(cls, d: dict) -> "Event":
        return cls(
            id=d["id"],
            name=d["name"],
            created_at=datetime.fromisoformat(d["created_at"]),
            tags=list(d.get("tags", [])),
        )

e = Event(id=1, name="signup", tags=["alpha"])
blob = e.to_json()
print(blob)
print(Event.from_dict(json.loads(blob)))

Output:

text
{"id": 1, "name": "signup", "created_at": "2026-05-25 15:01:11.842371", "tags": ["alpha"]}
Event(id=1, name='signup', created_at=datetime.datetime(2026, 5, 25, 15, 1, 11, 842371), tags=['alpha'])

Pattern-matched message dispatch

A frozen dataclass per message variant + match is the idiomatic way to write a state machine, parser, or reducer. The code reads top-down like a spec.

python
from dataclasses import dataclass

@dataclass(frozen=True)
class Connect:
    host: str
    port: int

@dataclass(frozen=True)
class Send:
    payload: bytes

@dataclass(frozen=True)
class Disconnect:
    reason: str

def handle(msg):
    match msg:
        case Connect(host=h, port=p):
            return f"opening connection to {h}:{p}"
        case Send(payload=b) if len(b) > 1024:
            return f"chunking {len(b)} bytes"
        case Send(payload=b):
            return f"sending {len(b)} bytes"
        case Disconnect(reason=r):
            return f"closed: {r}"

print(handle(Connect("myhost.local", 5432)))
print(handle(Send(b"x" * 2048)))
print(handle(Disconnect("timeout")))

Output:

text
opening connection to myhost.local:5432
chunking 2048 bytes
closed: timeout

Diff two configs

Walking fields() of a dataclass produces a compact, reusable diff for any pair of same-typed instances. Useful for "what changed?" log lines on reload.

python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Config:
    debug: bool = False
    workers: int = 4
    host: str = "127.0.0.1"

def diff(a, b) -> dict:
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }

old = Config()
new = Config(debug=True, workers=8)
print(diff(old, new))

Output:

text
{'debug': (False, True), 'workers': (4, 8)}