Validate and parse data at runtime using Python type hints with Pydantic v2. Covers BaseModel, field validators, nested models, and JSON serialization.

pydantic — Data Validation

What it is

Pydantic validates data against Python type annotations at runtime. You define a BaseModel with type-annotated fields; Pydantic enforces types, coerces compatible values, and raises detailed errors for invalid input. It is the foundation of FastAPI's request/response handling and is widely used for configuration, API clients, and data parsing.

This page covers Pydantic v2 (released 2023, current as of 2026). The API changed significantly from v1. Check your installed version: python -c "import pydantic; print(pydantic.__version__)".

Install

bash
pip install pydantic
# Optional: EmailStr and other validators
pip install "pydantic[email]"

Quick example

python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int
    active: bool = True

u = User(name="Alice", age=30)
print(u.model_dump())

try:
    User(name="Alice", age="not-a-number")
except ValidationError as e:
    print(f"{e.error_count()} validation error(s)")
    print(e.errors()[0]["msg"])

Output:

text
{'name': 'Alice', 'age': 30, 'active': True}
1 validation error(s)
Input should be a valid integer, unable to parse string as an integer

When / why to use it

  • Parsing and validating API request bodies or responses.
  • Typed configuration objects loaded from env vars or YAML.
  • Any time you want Python type hints to be enforced at runtime, not just as documentation.

Common pitfalls

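v1 → v2 migration — Pydantic v2 renamed many methods: dict() → model_dump(), json() → model_dump_json(), parse_obj() → model_validate(). The v1 names still exist in v2 but are deprecated and emit a deprecation warning, so switch to the new names rather than relying on them.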
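
A minimal sketch of the renamed calls side by side. The Point model is a hypothetical example, not something from this page; the v1 names appear only in comments.

```python
from pydantic import BaseModel

class Point(BaseModel):  # hypothetical model for illustration
    x: int
    y: int

p = Point.model_validate({"x": 1, "y": 2})   # v1: Point.parse_obj(...)
as_dict = p.model_dump()                     # v1: p.dict()
as_json = p.model_dump_json()                # v1: p.json()

print(as_dict)
print(as_json)
```

Searching a codebase for .dict(, .json( and parse_obj( is usually a reasonable first pass when migrating.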

Mutable defaults — prefer Field(default_factory=list) for mutable defaults like lists and dicts. Pydantic copies a bare = [] default for each instance, so it is not shared between instances, but default_factory makes the intent explicit and matches the pattern that dataclasses require.

Use model_config = ConfigDict(strict=True) to disable Pydantic's coercion. By default "30" is silently coerced to 30 for an int field, which can hide bugs.
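
The difference between the default (lax) mode and strict mode can be sketched as follows. LaxUser and StrictUser are hypothetical names chosen for this illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class LaxUser(BaseModel):        # hypothetical: default lax mode
    age: int

class StrictUser(BaseModel):     # hypothetical: strict mode
    model_config = ConfigDict(strict=True)
    age: int

lax_age = LaxUser(age="30").age  # the numeric string is parsed into an int

try:
    StrictUser(age="30")
    strict_rejected = False
except ValidationError:
    strict_rejected = True       # strict mode refuses a str for an int field

print(lax_age, strict_rejected)
```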

Richer example — nested models and validators

python
from pydantic import BaseModel, field_validator, Field
from typing import Optional

class Address(BaseModel):
    street: str
    city: str
    zip_code: str

class User(BaseModel):
    name: str
    age: int = Field(ge=0, le=130, description="Must be 0–130")
    email: str
    tags: list[str] = Field(default_factory=list)
    address: Optional[Address] = None

    @field_validator("email")
    @classmethod
    def email_must_contain_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("not a valid email address")
        return v.lower()

u = User(
    name="Alice",
    age=30,
    email="Alice@Example.COM",
    tags=["admin", "user"],
    address={"street": "123 Main St", "city": "Anytown", "zip_code": "12345"},
)
print(u.model_dump_json(indent=2))

Output:

text
{
  "name": "Alice",
  "age": 30,
  "email": "alice@example.com",
  "tags": [
    "admin",
    "user"
  ],
  "address": {
    "street": "123 Main St",
    "city": "Anytown",
    "zip_code": "12345"
  }
}

JSON and dict conversion

model_validate() builds a model from a dict (or from another object when from_attributes=True is set), and model_validate_json() builds one from a JSON string; both apply full validation. model_dump() serializes back to a plain dict; model_dump_json() returns a JSON string. Both dump methods support include/exclude parameters to control which fields are emitted.

python
# From dict
user = User.model_validate({"name": "Alice", "age": 25, "email": "alice@example.com"})

# From JSON string
user = User.model_validate_json('{"name":"Alice","age":25,"email":"alice@example.com"}')

# To dict
d = user.model_dump()
d = user.model_dump(exclude={"email"})        # exclude fields
d = user.model_dump(include={"name", "age"})  # include only

# To JSON string
j = user.model_dump_json()
j = user.model_dump_json(indent=2)

Settings from environment variables

pydantic-settings extends Pydantic with a BaseSettings class that reads field values from environment variables (case-insensitive) and optionally from a .env file. It is the standard pattern for twelve-factor app configuration — define your settings schema once, get validation and IDE autocomplete for free.

python
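from pydantic_settings import BaseSettings, SettingsConfigDict  # pip install pydantic-settings

class Settings(BaseSettings):
    # v2 style: model_config replaces the v1 inner `class Config`
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    api_key: str
    debug: bool = False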

settings = Settings()  # reads DATABASE_URL, API_KEY, DEBUG from env / .env
print(settings.debug)

Output (with DEBUG unset):

text
False

Model configuration

model_config (a ConfigDict on the class) controls model-wide behavior: strict mode, extra-field policy, alias handling, frozen instances, JSON serialization options, and how the model handles arbitrary types. Defining it once on a shared base class keeps every model consistent, which is especially helpful when you migrate from v1.

python
from pydantic import BaseModel, ConfigDict, Field
from datetime import datetime

class APIModel(BaseModel):
    """Base for all API DTOs — strict, JSON-friendly, immutable."""
    model_config = ConfigDict(
        strict=True,                # no "3" → 3 coercion for ints
        extra="forbid",             # reject unknown fields on input
        frozen=True,                # instances are hashable and immutable
        populate_by_name=True,      # accept field names or aliases on input
        str_strip_whitespace=True,  # trim incoming strings
        validate_assignment=True,   # re-validate on attribute set
        ser_json_timedelta="iso8601",
        ser_json_bytes="base64",
        json_schema_extra={"x-owner": "platform"},
    )

class User(APIModel):
    id: int
    email: str = Field(alias="emailAddress")  # accept "emailAddress" too
    created_at: datetime

extra value | Behavior on unknown field
"ignore" (default) | Silently drop unknown fields
"allow" | Attach unknown fields to the instance
"forbid" | Raise ValidationError

Use extra="forbid" for inbound request models — typos in client payloads should fail loudly. Use extra="allow" (rare) when wrapping an evolving third-party API where you want to preserve unknown fields.
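
The three policies can be compared on one payload. This is a sketch; the model names and the payload with its deliberate typo are made up for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

payload = {"name": "Alice", "nmae": "typo"}   # second key is a deliberate typo

class Ignoring(BaseModel):                    # default: extra="ignore"
    name: str

class Allowing(BaseModel):
    model_config = ConfigDict(extra="allow")
    name: str

class Forbidding(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str

ignored = Ignoring.model_validate(payload).model_dump()   # unknown key dropped
allowed = Allowing.model_validate(payload).model_dump()   # unknown key kept

try:
    Forbidding.model_validate(payload)
    forbid_error_type = None
except ValidationError as exc:
    forbid_error_type = exc.errors()[0]["type"]

print(ignored, allowed, forbid_error_type)
```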

Field — defaults, constraints, and metadata

Field(...) attaches validation rules, default values, examples, and JSON Schema metadata to a single field. Use default_factory for mutable defaults, numeric constraints (gt, ge, lt, le, multiple_of), string constraints (min_length, max_length, pattern), and list constraints (min_length, max_length). Most constraints also work via Annotated[…] from typing.

python
from pydantic import BaseModel, Field, HttpUrl, EmailStr
from typing import Annotated
from uuid import UUID, uuid4
from decimal import Decimal

class Product(BaseModel):
    id: UUID = Field(default_factory=uuid4)
    name: str = Field(min_length=1, max_length=120)
    slug: Annotated[str, Field(pattern=r"^[a-z0-9-]+$")]
    price: Decimal = Field(gt=0, decimal_places=2, max_digits=10)
    tags: list[str] = Field(default_factory=list, max_length=20)
    homepage: HttpUrl | None = None
    owner_email: EmailStr
    metadata: dict[str, str] = Field(default_factory=dict)

    # Description + examples flow into the OpenAPI schema
    sku: str = Field(
        description="Stock keeping unit",
        examples=["SKU-001", "SKU-XYZ-42"],
    )
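
To see the constraints reject input, here is a small sketch. Item is a hypothetical model, deliberately smaller than Product above, and the invalid values are invented.

```python
from decimal import Decimal
from pydantic import BaseModel, Field, ValidationError

class Item(BaseModel):   # hypothetical model for illustration
    name: str = Field(min_length=1, max_length=120)
    price: Decimal = Field(gt=0, decimal_places=2)
    tags: list[str] = Field(default_factory=list, max_length=3)

try:
    Item(name="", price="-1.00", tags=["a", "b", "c", "d"])
    failed_fields = []
except ValidationError as exc:
    # Each error dict carries a `loc` tuple naming the offending field
    failed_fields = sorted(err["loc"][0] for err in exc.errors())

valid = Item(name="Widget", price="9.99")

print(failed_fields)
print(valid.price)
```

All three problems are reported in a single ValidationError, so callers can show every failing field at once.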

Validators

Pydantic v2 has two validator decorators, @field_validator and @model_validator, most often used in three ways:

  • @field_validator(name, mode="after") — runs on a single field after type coercion. mode="before" runs before coercion, on the raw input.
  • @model_validator(mode="after") — runs once the whole model is built; perfect for cross-field checks.
  • @model_validator(mode="before") — runs on the raw input dict, before any field validation. Useful for normalizing aliases or restructuring legacy payloads.

python
from pydantic import BaseModel, field_validator, model_validator
from typing import Self  # Python 3.11+; on older versions import Self from typing_extensions

class Booking(BaseModel):
    start: int  # epoch seconds
    end: int
    seats: int
    seat_map: list[str]

    @field_validator("seats")
    @classmethod
    def seats_positive(cls, v: int) -> int:
        if v <= 0:
            raise ValueError("seats must be positive")
        return v

    @field_validator("seat_map", mode="before")
    @classmethod
    def split_comma_list(cls, v):
        # Accept "A1,A2,A3" *or* a real list, normalise to list
        if isinstance(v, str):
            return [s.strip() for s in v.split(",") if s.strip()]
        return v

    @model_validator(mode="after")
    def check_window_matches_seats(self) -> Self:
        if self.end <= self.start:
            raise ValueError("end must be after start")
        if len(self.seat_map) != self.seats:
            raise ValueError("seats count must match seat_map length")
        return self

# Computed fields — like @property, but serialised in model_dump()
from pydantic import computed_field

class Rectangle(BaseModel):
    width: float
    height: float

    @computed_field
    @property
    def area(self) -> float:
        return self.width * self.height

Use @field_validator(..., mode="before") to coerce loose inputs (CSV strings, ISO dates, legacy enum names) into the canonical type, then let the rest of the model assume the value is well-typed.
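
As one illustration of the legacy-enum case, the sketch below maps old spellings onto current enum values before validation. Status, Account, and the _LEGACY_NAMES mapping are all hypothetical.

```python
from enum import Enum
from pydantic import BaseModel, field_validator

class Status(str, Enum):             # hypothetical enum
    ACTIVE = "active"
    DISABLED = "disabled"

# Hypothetical mapping from old payload spellings to current values
_LEGACY_NAMES = {"enabled": "active", "off": "disabled"}

class Account(BaseModel):            # hypothetical model
    status: Status

    @field_validator("status", mode="before")
    @classmethod
    def normalise_legacy_status(cls, v):
        # Runs on the raw input; only rewrites strings, leaves other types alone
        if isinstance(v, str):
            v = v.strip().lower()
            return _LEGACY_NAMES.get(v, v)
        return v

legacy = Account(status="Enabled")
current = Account(status="disabled")
print(legacy.status is Status.ACTIVE, current.status is Status.DISABLED)
```

Values the mapping does not know are passed through unchanged, so the normal enum validation still rejects anything invalid.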

Serialization deep dive

model_dump() / model_dump_json() accept fine-grained options for controlling what comes out: include / exclude sets, alias usage (by_alias), and, for model_dump_json(), JSON formatting via indent. Computed fields are included in the output. Custom serializers — declared with @field_serializer and @model_serializer — let you reshape the output without touching the field types.

python
from pydantic import BaseModel, Field, field_serializer
from datetime import datetime, timezone

class Event(BaseModel):
    id: int
    title: str = Field(serialization_alias="name")    # rename on output
    occurred_at: datetime
    secret: str

    @field_serializer("occurred_at")
    def fmt_occurred(self, v: datetime) -> str:
        return v.astimezone(timezone.utc).isoformat()

e = Event(id=1, title="Launch", occurred_at=datetime.now(), secret="shh")

# Various dump variants
e.model_dump()                                # {'id':1,'title':'Launch',...,'secret':'shh'}
e.model_dump(by_alias=True)                   # {'id':1,'name':'Launch',...}
e.model_dump(exclude={"secret"})              # drop secret
e.model_dump(include={"id", "title"})         # only these
e.model_dump(mode="json")                     # datetime → str, UUID → str, etc.
e.model_dump_json(indent=2, by_alias=True)

python
# Full custom serialization with @model_serializer
from pydantic import model_serializer

class Money(BaseModel):
    amount: int     # in minor units (cents)
    currency: str

    @model_serializer
    def to_string(self) -> str:
        return f"{self.amount/100:.2f} {self.currency}"

print(Money(amount=1099, currency="USD").model_dump())  # "10.99 USD"

TypeAdapter — validating non-model types

TypeAdapter lets you validate any type (a list[Item], a dict[str, list[int]], a TypedDict, a discriminated union) without wrapping it in a BaseModel. It exposes validate_python, validate_json, dump_python, and dump_json, which mirror model_validate, model_validate_json, model_dump, and model_dump_json on models. It is the right tool for one-off validation, request bodies that aren't a single model, and reusable validation of standard library types.

python
from pydantic import BaseModel, TypeAdapter

class User(BaseModel):  # minimal model so this snippet runs on its own
    id: int
    email: str

# Validate a list of dicts as a list of User models
users_adapter = TypeAdapter(list[User])
users = users_adapter.validate_python([
    {"id": 1, "email": "alice@example.com"},
    {"id": 2, "email": "carol@example.com"},
])

# Validate / dump arbitrary types
IntList = TypeAdapter(list[int])
IntList.validate_python(["1", "2", "3"])      # → [1, 2, 3]
IntList.validate_json("[1, 2, 3]")
IntList.dump_python([1, 2, 3])
IntList.json_schema()                          # JSON Schema for list[int]
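
The TypedDict and nested-container cases mentioned above might look like the sketch below. Point is a hypothetical type, and TypedDict is imported from typing_extensions on the assumption that the code may run on Python older than 3.12.

```python
from typing_extensions import TypedDict  # Pydantic needs this variant before Python 3.12
from pydantic import TypeAdapter, ValidationError

class Point(TypedDict):   # hypothetical example type
    x: int
    y: int

points_adapter = TypeAdapter(dict[str, list[Point]])

# Lax mode still applies: the string "2" is parsed into an int
parsed = points_adapter.validate_json('{"path": [{"x": 1, "y": "2"}]}')

try:
    points_adapter.validate_python({"path": [{"x": 1}]})
    missing_loc = None
except ValidationError as exc:
    missing_loc = exc.errors()[0]["loc"]   # path to the missing key

print(parsed)
print(missing_loc)
```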

Custom types with Annotated

Annotated[T, …] is the standard library mechanism for attaching metadata to types. Pydantic uses it to read validation hooks (AfterValidator, BeforeValidator, WrapValidator) and serialisation hooks (PlainSerializer). It's the cleanest way to build reusable, typed primitives without subclassing BaseModel.

python
from typing import Annotated
from pydantic import AfterValidator, BeforeValidator, BaseModel
import re

def _slugify(v: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", v.lower()).strip("-")

def _check_slug(v: str) -> str:
    if not re.fullmatch(r"[a-z0-9-]+", v):
        raise ValueError(f"Not a valid slug: {v!r}")
    return v

# Reusable typed primitive
Slug = Annotated[str, BeforeValidator(_slugify), AfterValidator(_check_slug)]

class Post(BaseModel):
    title: str
    slug: Slug

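print(Post(title="Hello, World!", slug="Hello, World!").slug)  # "hello-world"

python
# Pydantic's own annotated helpers
from decimal import Decimal
from typing import Annotated
from pydantic import Field, StringConstraints, NonNegativeInt, PositiveInt

ShortString = Annotated[str, StringConstraints(min_length=1, max_length=64, strip_whitespace=True)]
# conint() / condecimal() already return constrained types, so they are not
# used as Annotated metadata; Field(...) carries the constraints instead.
Age = Annotated[int, Field(ge=0, le=130)]
Money = Annotated[Decimal, Field(gt=0, decimal_places=2)]
# NonNegativeInt (>= 0) and PositiveInt (> 0) are ready-made field types.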

Discriminated unions

When a field can be one of several models, discriminated unions make Pydantic pick the right model by inspecting a tag field, instead of trying each variant in order. They're faster, give better error messages, and produce a tidier OpenAPI schema (a oneOf with a discriminator).

python
from typing import Literal, Annotated
from pydantic import BaseModel, Field, TypeAdapter

class Email(BaseModel):
    kind: Literal["email"] = "email"
    address: str

class SMS(BaseModel):
    kind: Literal["sms"] = "sms"
    phone: str

class Webhook(BaseModel):
    kind: Literal["webhook"] = "webhook"
    url: str

Channel = Annotated[Email | SMS | Webhook, Field(discriminator="kind")]

adapter = TypeAdapter(Channel)
adapter.validate_python({"kind": "sms", "phone": "+1-555-0100"})
adapter.validate_python({"kind": "webhook", "url": "https://example.com/hook"})
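
A self-contained sketch of what the adapter returns and how an unknown tag is reported. It redefines two of the variants above so it can run on its own; the "fax" tag is an invented bad value.

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter, ValidationError

class Email(BaseModel):
    kind: Literal["email"] = "email"
    address: str

class SMS(BaseModel):
    kind: Literal["sms"] = "sms"
    phone: str

Channel = Annotated[Union[Email, SMS], Field(discriminator="kind")]
channel_adapter = TypeAdapter(Channel)

# The tag selects the variant directly; other variants are never tried
channel = channel_adapter.validate_python({"kind": "sms", "phone": "+1-555-0100"})

try:
    channel_adapter.validate_python({"kind": "fax", "number": "555-0100"})
    error_type = None
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]

print(type(channel).__name__, error_type)
```

An unknown tag produces a single error about the tag itself rather than one error per variant.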

JSON Schema generation

Model.model_json_schema() produces a JSON Schema (Draft 2020-12) that frameworks such as FastAPI and Litestar consume directly. Tune the output with json_schema_extra, examples, and the ref_template argument; serialization_alias is reflected when you request the schema with mode="serialization".

python
import json
from pydantic import BaseModel, Field

class User(BaseModel):
    id: int = Field(description="DB primary key", examples=[1])
    email: str = Field(pattern=r".+@.+", examples=["alice@example.com"])

    model_config = {
        "json_schema_extra": {
            "examples": [{"id": 1, "email": "alice@example.com"}],
        }
    }

print(json.dumps(User.model_json_schema(), indent=2))
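
The schema can differ between the input and output directions. The sketch below compares the two modes on a hypothetical Ticket model; the field names are invented for illustration.

```python
from typing import Optional
from pydantic import BaseModel, Field

class Ticket(BaseModel):   # hypothetical model for illustration
    id: int = Field(description="DB primary key", examples=[1])
    title: str = Field(min_length=1, serialization_alias="name")
    note: Optional[str] = None

validation_schema = Ticket.model_json_schema()                         # input shape
serialization_schema = Ticket.model_json_schema(mode="serialization")  # output shape

print(validation_schema["required"])
print(sorted(validation_schema["properties"]))
print(sorted(serialization_schema["properties"]))
```

Field constraints such as min_length appear as standard JSON Schema keywords (minLength here), and the output-only alias appears only in the serialization-mode schema.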

Settings — pydantic-settings deep dive

pydantic-settings reads field values from environment variables, dotenv files, secret files, and (with extras) AWS Secrets Manager / Azure Key Vault. Configure precedence and sources with model_config = SettingsConfigDict(...); everything else is just a normal BaseModel.

python
from pydantic import BaseModel, Field, SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict
from functools import lru_cache

class DatabaseSettings(BaseModel):        # nested group: a plain BaseModel
    url: str
    pool_size: int = 10
    ssl: bool = False

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        env_nested_delimiter="__",        # DB__URL, DB__POOL_SIZE
        case_sensitive=False,
        extra="ignore",
        secrets_dir="/run/secrets",       # Docker / k8s secrets
    )

    app_name: str = "myapp"
    debug: bool = False
    secret_key: SecretStr                 # value is masked in repr() / dump()
    db: DatabaseSettings                  # filled from DB__URL, DB__POOL_SIZE, ...
    allowed_hosts: list[str] = Field(default_factory=lambda: ["localhost"])
    allowed_hosts: list[str] = Field(default_factory=lambda: ["localhost"])

@lru_cache
def get_settings() -> Settings:
    return Settings()

# In a FastAPI dep:
# def settings(s: Annotated[Settings, Depends(get_settings)]): ...

bash
# .env
DEBUG=true
SECRET_KEY=super-secret
DB__URL=postgresql://localhost/myapp
DB__POOL_SIZE=20
ALLOWED_HOSTS=["api.example.com","app.example.com"]   # JSON for complex types

Pydantic vs dataclasses vs msgspec vs attrs

The four most common "typed class" libraries in Python all share the same surface — annotate fields, get __init__ and __repr__ for free — but they differ sharply on what happens at runtime.

Feature | pydantic v2 | dataclasses (stdlib) | attrs | msgspec
Runtime type validation | Yes (strict + coerce modes) | No | Optional via converters / validators | Yes
Speed (validate + dump) | Fast (Rust core) | N/A | Fast | Fastest
JSON Schema | Yes | No | No | Partial
Settings / env loading | pydantic-settings | No | No | No
Field aliasing | Yes | No | Yes | Yes
Discriminated unions | Yes | No | No | Yes
Built-in coercion | Yes ("1" → 1) | No | Custom converters | No
Best for | API request/response models, config, FastAPI | Plain typed records, value objects | Internal domain models with custom hooks | Ultra-fast message parsing (gRPC-like)

See dataclasses for the stdlib comparison and typing for the Annotated[...] patterns Pydantic exposes through BeforeValidator / AfterValidator.

python
# Same shape, three libraries
from dataclasses import dataclass
@dataclass
class UserDC:
    name: str
    age: int

import attrs
@attrs.define
class UserAttrs:
    name: str
    age: int = attrs.field(validator=attrs.validators.ge(0))

from pydantic import BaseModel
class UserPyd(BaseModel):
    name: str
    age: int

# Runtime behaviour with bad input
UserDC(name="A", age="30")     # accepted — age is "30" (str), no validation
UserAttrs(name="A", age=-1)    # ValueError: age must be >= 0
UserPyd(name="A", age="30")    # accepted — coerced to int 30 (lax mode)
UserPyd.model_validate({"name": "A", "age": "x"})  # ValidationError

Common pitfalls (extended)

Mixing v1 and v2 syntax: Config (inner class) is v1; model_config = ConfigDict(...) is v2. @validator is v1; @field_validator is v2. The two coexist in some codebases via the pydantic.v1 shim, so read the module path before copying examples.

JSON Schema vs dump shapes can diverge: serialization_alias only changes output, and only when you dump with by_alias=True. To rename on input too, use Field(alias="...") together with populate_by_name=True. Check round-trips for any model you both emit and ingest: M.model_validate(m.model_dump()) should succeed.
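
A round-trip check along these lines can be sketched as below. OutOnly and BothWays are hypothetical models that contrast an output-only alias with an alias used in both directions.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class OutOnly(BaseModel):            # hypothetical: alias applied on output only
    title: str = Field(serialization_alias="name")

class BothWays(BaseModel):           # hypothetical: alias on input and output
    model_config = ConfigDict(populate_by_name=True)
    title: str = Field(alias="name")

out = OutOnly(title="Launch")
plain = out.model_dump()                   # keyed by field name
aliased = out.model_dump(by_alias=True)    # keyed by serialization alias

try:
    OutOnly.model_validate(aliased)        # "name" is not accepted on input
    aliased_round_trips = True
except ValidationError:
    aliased_round_trips = False

both = BothWays(name="Launch")
both_round_trips = (
    BothWays.model_validate(both.model_dump(by_alias=True)) == both
    and BothWays.model_validate(both.model_dump()) == both
)

print(plain, aliased, aliased_round_trips, both_round_trips)
```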

model_validator(mode="after") returns Self — must return self at the end. Forgetting it returns None, and the next field access raises AttributeError.

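When porting from dataclasses, replace dataclasses.field(default_factory=...) with Pydantic's Field(default_factory=...). Pydantic v2 does not warn if you write tags: list[str] = []; it copies the default for each instance, but default_factory keeps the intent explicit.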

Real-world recipes

API request, DB, and response model trio

python
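# Three models for one resource — input, internal/DB, output
# (hash_password and db.insert_user below are app-specific placeholders, not Pydantic APIs)
from datetime import datetime
from pydantic import BaseModel, ConfigDict, EmailStr, SecretStr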
class UserCreate(BaseModel):
    email: EmailStr
    password: SecretStr        # masked in logs / dumps

class UserDB(BaseModel):
    model_config = ConfigDict(from_attributes=True)  # build from an ORM row
    id: int
    email: EmailStr
    password_hash: str
    created_at: datetime

class UserOut(BaseModel):
    id: int
    email: EmailStr
    created_at: datetime

def create_user(payload: UserCreate, db) -> UserOut:
    row = db.insert_user(
        email=payload.email,
        password_hash=hash_password(payload.password.get_secret_value()),
    )
    db_user = UserDB.model_validate(row)
    return UserOut.model_validate(db_user.model_dump())
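
Because the recipe above depends on application helpers, here is a runnable sketch of just the from_attributes step. UserRow is a hypothetical stand-in for an ORM row, and plain str is used for the email so the example does not need the email extra.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pydantic import BaseModel, ConfigDict

@dataclass
class UserRow:               # hypothetical stand-in for an ORM row object
    id: int
    email: str
    password_hash: str
    created_at: datetime

class UserRecord(BaseModel):
    model_config = ConfigDict(from_attributes=True)   # read attributes, not dict keys
    id: int
    email: str
    password_hash: str
    created_at: datetime

class UserPublic(BaseModel):
    id: int
    email: str
    created_at: datetime

row = UserRow(1, "alice@example.com", "hashed", datetime(2024, 1, 1, tzinfo=timezone.utc))
record = UserRecord.model_validate(row)
# password_hash is dropped because UserPublic ignores unknown fields by default
public = UserPublic.model_validate(record.model_dump())
print(public.model_dump(mode="json"))
```

Keeping the output model separate means the password hash cannot leak into a response by accident.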

Loading typed config from YAML

python
import yaml
from pathlib import Path
from pydantic import BaseModel, Field

class FeatureFlags(BaseModel):
    new_dashboard: bool = False
    rollout_percentage: int = Field(0, ge=0, le=100)

class AppConfig(BaseModel):
    name: str
    version: str
    features: FeatureFlags

config = AppConfig.model_validate(yaml.safe_load(Path("config.yaml").read_text()))
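
When the file is wrong, the ValidationError locations can be turned into a readable report. In the sketch below the loaded dict stands in for the output of yaml.safe_load, and its values are invented: an unquoted version such as 1.2 is parsed by YAML as a float, which Pydantic does not coerce to str by default.

```python
from pydantic import BaseModel, Field, ValidationError

class FeatureFlags(BaseModel):
    new_dashboard: bool = False
    rollout_percentage: int = Field(0, ge=0, le=100)

class AppConfig(BaseModel):
    name: str
    version: str
    features: FeatureFlags

# Stand-in for yaml.safe_load(...) output; the values are made up
loaded = {"name": "myapp", "version": 1.2, "features": {"rollout_percentage": 150}}

try:
    AppConfig.model_validate(loaded)
    problems = []
except ValidationError as exc:
    # Join each error's location into a dotted path for a readable report
    problems = [".".join(str(part) for part in err["loc"]) for err in exc.errors()]

print(problems)
```

Quoting the version in the YAML file (version: "1.2") is the usual fix for the first problem.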

Validating responses from an external API

python
import httpx
from pydantic import BaseModel, TypeAdapter

class GithubRepo(BaseModel):
    id: int
    full_name: str
    stargazers_count: int

RepoList = TypeAdapter(list[GithubRepo])

with httpx.Client(timeout=10) as client:
    raw = client.get("https://api.github.com/users/python/repos").json()
    repos = RepoList.validate_python(raw)
    top = sorted(repos, key=lambda r: r.stargazers_count, reverse=True)[:5]
    for r in top:
        print(r.full_name, r.stargazers_count)

Quick reference

Task | Code
Define model | class M(BaseModel): name: str
Validate dict | M.model_validate({"name": "A"})
Validate JSON | M.model_validate_json('{"name":"A"}')
To dict | m.model_dump()
To JSON | m.model_dump_json(indent=2)
Strict mode | model_config = ConfigDict(strict=True)
Forbid unknown | model_config = ConfigDict(extra="forbid")
Immutable | model_config = ConfigDict(frozen=True)
Default factory | tags: list[str] = Field(default_factory=list)
Field constraints | age: int = Field(ge=0, le=130)
String pattern | slug: str = Field(pattern=r"^[a-z-]+$")
Alias | email: str = Field(alias="emailAddress")
Field validator | @field_validator("x") @classmethod def fn(cls, v): ...
Model validator | @model_validator(mode="after") def fn(self) -> Self: ...
Computed field | @computed_field @property def area(self) -> float: ...
Custom serializer | @field_serializer("x") def fmt(self, v): ...
Discriminated union | Field(discriminator="kind")
Validate non-model | TypeAdapter(list[Item]).validate_python(...)
JSON Schema | M.model_json_schema()
Settings | class S(BaseSettings): ... (pip install pydantic-settings)
From ORM row | model_config = ConfigDict(from_attributes=True)
Secret value | password: SecretStr