Validate and parse data at runtime using Python type hints with Pydantic v2. Covers BaseModel, field validators, nested models, and JSON serialization.
pydantic — Data Validation
What it is
Pydantic validates data against Python type annotations at runtime. You define a BaseModel with type-annotated fields; Pydantic enforces types, coerces compatible values, and raises detailed errors for invalid input. It is the foundation of FastAPI's request/response handling and is widely used for configuration, API clients, and data parsing.
This page covers Pydantic v2 (released 2023, current as of 2026). The API changed significantly from v1. Check your installed version:
python -c "import pydantic; print(pydantic.__version__)"
Install
pip install pydantic
# Optional: EmailStr and other validators
pip install "pydantic[email]"
Quick example
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int
    active: bool = True

u = User(name="Alice", age=30)
print(u.model_dump())

try:
    User(name="Alice", age="not-a-number")
except ValidationError as e:
    print(f"{e.error_count()} validation error(s)")
    print(e.errors()[0]["msg"])
Output:
{'name': 'Alice', 'age': 30, 'active': True}
1 validation error(s)
Input should be a valid integer, unable to parse string as an integer
When / why to use it
- Parsing and validating API request bodies or responses.
- Typed configuration objects loaded from env vars or YAML.
- Any time you want Python type hints to be enforced at runtime, not just as documentation.
Common pitfalls
v1 → v2 migration: Pydantic v2 renamed many methods: dict() → model_dump(), json() → model_dump_json(), parse_obj() → model_validate(). The v1 names still work on v2 but emit a deprecation warning and are scheduled for removal in v3.
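Mutable defaults: Pydantic copies a mutable default for each instance, so a bare = [] is not shared between instances (unlike a plain class attribute, and unlike dataclasses, which reject it with ValueError). Field(default_factory=list) is still the more explicit form, and it is required when the default must be computed at instantiation time.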
Use model_config = ConfigDict(strict=True) to disable Pydantic's coercion. By default "30" is silently coerced to 30 for an int field, which can hide bugs.
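The coercion pitfall can be sketched with two models that differ only in strict=True. LaxOrder and StrictOrder are hypothetical names used for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxOrder(BaseModel):
    # Default (lax) mode: compatible values are coerced
    quantity: int


class StrictOrder(BaseModel):
    # Strict mode: the input must already be an int
    model_config = ConfigDict(strict=True)
    quantity: int


lax = LaxOrder(quantity="30")
print(type(lax.quantity).__name__, lax.quantity)

try:
    StrictOrder(quantity="30")
    strict_rejected = False
except ValidationError:
    strict_rejected = True

print("strict mode rejected the string:", strict_rejected)
```

If only a few fields need strictness, Field(strict=True) on those fields is a narrower alternative to the model-wide setting.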
Richer example — nested models and validators
from pydantic import BaseModel, field_validator, Field
from typing import Optional

class Address(BaseModel):
    street: str
    city: str
    zip_code: str

class User(BaseModel):
    name: str
    age: int = Field(ge=0, le=130, description="Must be 0–130")
    email: str
    tags: list[str] = Field(default_factory=list)
    address: Optional[Address] = None

    @field_validator("email")
    @classmethod
    def email_must_contain_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("not a valid email address")
        return v.lower()

u = User(
    name="Alice",
    age=30,
    email="Alice@Example.COM",
    tags=["admin", "user"],
    address={"street": "123 Main St", "city": "Anytown", "zip_code": "12345"},
)
print(u.model_dump_json(indent=2))
Output:
{
"name": "Alice",
"age": 30,
"email": "alice@example.com",
"tags": [
"admin",
"user"
],
"address": {
"street": "123 Main St",
"city": "Anytown",
"zip_code": "12345"
}
}
JSON and dict conversion
model_validate() constructs a model from a dict (or another model instance), applying full validation; model_validate_json() does the same from a JSON string. model_dump() serializes back to a plain dict; model_dump_json() returns a JSON string. Both dump methods support include/exclude parameters to control which fields are emitted.
# From dict
user = User.model_validate({"name": "Alice", "age": 25, "email": "alice@example.com"})
# From JSON string
user = User.model_validate_json('{"name":"Alice","age":25,"email":"alice@example.com"}')
# To dict
d = user.model_dump()
d = user.model_dump(exclude={"email"}) # exclude fields
d = user.model_dump(include={"name", "age"}) # include only
# To JSON string
j = user.model_dump_json()
j = user.model_dump_json(indent=2)
Settings from environment variables
pydantic-settings extends Pydantic with a BaseSettings class that reads field values from environment variables (case-insensitive) and optionally from a .env file. It is the standard pattern for twelve-factor app configuration — define your settings schema once, get validation and IDE autocomplete for free.
from pydantic_settings import BaseSettings, SettingsConfigDict  # pip install pydantic-settings

class Settings(BaseSettings):
    # v2 style: model_config replaces the v1 inner `class Config`
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    api_key: str
    debug: bool = False

settings = Settings()  # reads DATABASE_URL, API_KEY, DEBUG from env / .env
print(settings.debug)
Output:
False
Model configuration
model_config (a ConfigDict on the class) controls model-wide behavior: strict mode, extra-field policy, JSON serialization options, alias support, frozen instances, and how the model handles arbitrary types. Defining a sensible model_config once on a shared base class is one of the easiest quality wins when you migrate from v1.
from pydantic import BaseModel, ConfigDict, Field
from datetime import datetime

class APIModel(BaseModel):
    """Base for all API DTOs: strict, JSON-friendly, immutable."""
    model_config = ConfigDict(
        strict=True,                # no "3" → 3 coercion for ints
        extra="forbid",             # reject unknown fields on input
        frozen=True,                # instances are hashable and immutable
        populate_by_name=True,      # accept field names or aliases on input
        str_strip_whitespace=True,  # trim incoming strings
        validate_assignment=True,   # re-validate on attribute set (moot while frozen=True)
        ser_json_timedelta="iso8601",
        ser_json_bytes="base64",
        json_schema_extra={"x-owner": "platform"},
    )

class User(APIModel):
    id: int
    email: str = Field(alias="emailAddress")  # accept "emailAddress" too
    created_at: datetime
| extra value | Behavior on unknown field |
|---|---|
| "ignore" (default) | Silently drop unknown fields |
| "allow" | Attach unknown fields to the instance |
| "forbid" | Raise ValidationError |
Use extra="forbid" for inbound request models: typos in client payloads should fail loudly. Use extra="allow" (rare) when wrapping an evolving third-party API where you want to preserve unknown fields.
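As a small sketch of the three extra policies, the same payload can be fed to three otherwise identical models. The model names and the nickname key are made up for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Ignoring(BaseModel):
    name: str  # extra="ignore" is the default


class Allowing(BaseModel):
    model_config = ConfigDict(extra="allow")
    name: str


class Forbidding(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str


payload = {"name": "Alice", "nickname": "Al"}  # "nickname" is not a declared field

ignored = Ignoring.model_validate(payload).model_dump()
allowed = Allowing.model_validate(payload).model_dump()

try:
    Forbidding.model_validate(payload)
    forbid_error = None
except ValidationError as exc:
    forbid_error = exc.errors()[0]["type"]

print(ignored)
print(allowed)
print(forbid_error)
```

The unknown key is dropped, kept, or rejected depending only on the extra setting; the field declarations are identical.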
Field — defaults, constraints, and metadata
Field(...) attaches validation rules, default values, examples, and JSON Schema metadata to a single field. Use default_factory for mutable defaults, numeric constraints (gt, ge, lt, le, multiple_of), string constraints (min_length, max_length, pattern), and list constraints (min_length, max_length). Most constraints also work via Annotated[…] from typing.
from pydantic import BaseModel, Field, HttpUrl, EmailStr  # EmailStr needs pydantic[email]
from typing import Annotated
from uuid import UUID, uuid4
from decimal import Decimal

class Product(BaseModel):
    id: UUID = Field(default_factory=uuid4)
    name: str = Field(min_length=1, max_length=120)
    slug: Annotated[str, Field(pattern=r"^[a-z0-9-]+$")]
    price: Decimal = Field(gt=0, decimal_places=2, max_digits=10)
    tags: list[str] = Field(default_factory=list, max_length=20)
    homepage: HttpUrl | None = None
    owner_email: EmailStr
    metadata: dict[str, str] = Field(default_factory=dict)
    # Description + examples flow into the OpenAPI schema
    sku: str = Field(
        description="Stock keeping unit",
        examples=["SKU-001", "SKU-XYZ-42"],
    )
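To see what the constraints above look like when they fail, here is a minimal sketch with a hypothetical Item model. It collects the error type reported for each field, which is usually the most stable thing to branch on in calling code.

```python
from pydantic import BaseModel, Field, ValidationError


class Item(BaseModel):
    name: str = Field(min_length=1, max_length=120)
    quantity: int = Field(ge=0)
    slug: str = Field(pattern=r"^[a-z0-9-]+$")


try:
    Item(name="", quantity=-1, slug="Not A Slug")
    failures = {}
except ValidationError as exc:
    # Map each failing field name to the machine-readable error type
    failures = {err["loc"][0]: err["type"] for err in exc.errors()}

print(failures)
```

All three violations are reported in a single ValidationError rather than stopping at the first one.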
Validators
Pydantic v2 has two validator decorators, used in three common forms:
- @field_validator(name, mode="after"): runs on a single field after type coercion. mode="before" runs before coercion, on the raw input.
- @model_validator(mode="after"): runs once the whole model is built; perfect for cross-field checks.
- @model_validator(mode="before"): runs on the raw input dict, before any field validation. Useful for normalizing aliases or restructuring legacy payloads.
from pydantic import BaseModel, field_validator, model_validator
from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions

class Booking(BaseModel):
    start: int  # epoch seconds
    end: int
    seats: int
    seat_map: list[str]

    @field_validator("seats")
    @classmethod
    def seats_positive(cls, v: int) -> int:
        if v <= 0:
            raise ValueError("seats must be positive")
        return v

    @field_validator("seat_map", mode="before")
    @classmethod
    def split_comma_list(cls, v):
        # Accept "A1,A2,A3" *or* a real list, normalise to list
        if isinstance(v, str):
            return [s.strip() for s in v.split(",") if s.strip()]
        return v

    @model_validator(mode="after")
    def check_window_matches_seats(self) -> Self:
        if self.end <= self.start:
            raise ValueError("end must be after start")
        if len(self.seat_map) != self.seats:
            raise ValueError("seats count must match seat_map length")
        return self

# Computed fields: like @property, but serialised in model_dump()
from pydantic import computed_field

class Rectangle(BaseModel):
    width: float
    height: float

    @computed_field
    @property
    def area(self) -> float:
        return self.width * self.height
Use @field_validator(..., mode="before") to coerce loose inputs (CSV strings, ISO dates, legacy enum names) into the canonical type, then let the rest of the model assume the value is well-typed.
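The @model_validator(mode="before") form is described above but not shown in the example. A minimal sketch follows, assuming a hypothetical legacy payload that sends fullName instead of full_name; Contact and accept_legacy_keys are illustrative names.

```python
from typing import Any

from pydantic import BaseModel, model_validator


class Contact(BaseModel):
    full_name: str
    email: str

    @model_validator(mode="before")
    @classmethod
    def accept_legacy_keys(cls, data: Any) -> Any:
        # Runs on the raw input, before any field validation.
        # Only touch dict input; leave anything else for Pydantic to handle.
        if isinstance(data, dict) and "fullName" in data and "full_name" not in data:
            data = dict(data)  # copy so the caller's dict is not mutated
            data["full_name"] = data.pop("fullName")
        return data


contact = Contact.model_validate({"fullName": "Alice Doe", "email": "alice@example.com"})
print(contact.full_name)
```

For a simple one-to-one rename, Field(alias=...) is usually enough; a before-validator earns its place when the payload needs restructuring.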
Serialization deep dive
model_dump() / model_dump_json() accept fine-grained options for controlling what comes out: include / exclude sets, alias usage, computed fields, and JSON-specific options (indent, separators). Custom serializers — declared with @field_serializer and @model_serializer — let you reshape the output without touching the field types.
from pydantic import BaseModel, Field, field_serializer
from datetime import datetime, timezone

class Event(BaseModel):
    id: int
    title: str = Field(serialization_alias="name")  # rename on output
    occurred_at: datetime
    secret: str

    @field_serializer("occurred_at")
    def fmt_occurred(self, v: datetime) -> str:
        return v.astimezone(timezone.utc).isoformat()

e = Event(id=1, title="Launch", occurred_at=datetime.now(), secret="shh")

# Various dump variants
e.model_dump()                          # {'id':1,'title':'Launch',...,'secret':'shh'}
e.model_dump(by_alias=True)             # {'id':1,'name':'Launch',...}
e.model_dump(exclude={"secret"})        # drop secret
e.model_dump(include={"id", "title"})   # only these
e.model_dump(mode="json")               # datetime → str, UUID → str, etc.
e.model_dump_json(indent=2, by_alias=True)

# Full custom serialization with @model_serializer
from pydantic import model_serializer

class Money(BaseModel):
    amount: int  # in minor units (cents)
    currency: str

    @model_serializer
    def to_string(self) -> str:
        return f"{self.amount/100:.2f} {self.currency}"

print(Money(amount=1099, currency="USD").model_dump())  # 10.99 USD
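Besides include / exclude, model_dump() also accepts exclude_none, exclude_unset, and exclude_defaults. The sketch below uses a hypothetical Profile model to show how the three differ, which matters for PATCH-style payloads.

```python
from typing import Optional

from pydantic import BaseModel


class Profile(BaseModel):
    name: str
    nickname: Optional[str] = None
    active: bool = True


# "active" is passed explicitly even though it equals the default
p = Profile(name="Alice", active=True)

without_none = p.model_dump(exclude_none=True)          # drops fields whose value is None
only_set = p.model_dump(exclude_unset=True)             # keeps only fields the caller passed
without_defaults = p.model_dump(exclude_defaults=True)  # drops fields equal to their default

print(without_none)
print(only_set)
print(without_defaults)
```

exclude_unset tracks what the caller provided, while exclude_defaults compares values, so an explicitly passed default survives the former but not the latter.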
TypeAdapter — validating non-model types
TypeAdapter lets you validate any type (a list[Item], a dict[str, list[int]], a TypedDict, a discriminated union) without wrapping it in a BaseModel. It exposes validate_python, validate_json, dump_python, and dump_json, mirroring BaseModel's model_validate / model_dump family, and is the right tool for one-off validation, request bodies that aren't a single model, and reusable validation of standard library types.
from pydantic import BaseModel, TypeAdapter

class User(BaseModel):  # minimal model for this snippet
    id: int
    email: str

# Validate a list of dicts as a list of User models
users_adapter = TypeAdapter(list[User])
users = users_adapter.validate_python([
    {"id": 1, "email": "alice@example.com"},
    {"id": 2, "email": "carol@example.com"},
])

# Validate / dump arbitrary types
IntList = TypeAdapter(list[int])
IntList.validate_python(["1", "2", "3"])  # → [1, 2, 3]
IntList.validate_json("[1, 2, 3]")
IntList.dump_python([1, 2, 3])
IntList.json_schema()                     # JSON Schema for list[int]
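A nested standard-library type works the same way. This sketch assumes a JSON document mapping hypothetical player names to score lists, validates it with a TypeAdapter, and dumps it back.

```python
from pydantic import TypeAdapter, ValidationError

scores_adapter = TypeAdapter(dict[str, list[int]])

# validate_json parses and validates in one step; "3" is coerced in lax mode
parsed = scores_adapter.validate_json('{"alice": [1, 2], "bob": ["3"]}')
print(parsed)

# dump_json returns bytes, not str
encoded = scores_adapter.dump_json(parsed)
print(encoded)

try:
    scores_adapter.validate_python({"alice": "not-a-list"})
    bad_input_rejected = False
except ValidationError:
    bad_input_rejected = True
```

Building a TypeAdapter has a cost, so it is usually created once at module level and reused.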
Custom types with Annotated
Annotated[T, …] is the standard library mechanism for attaching metadata to types. Pydantic uses it to read validation hooks (AfterValidator, BeforeValidator, WrapValidator) and serialisation hooks (PlainSerializer). It's the cleanest way to build reusable, typed primitives without subclassing BaseModel.
from typing import Annotated
from pydantic import AfterValidator, BeforeValidator, BaseModel
import re

def _slugify(v: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", v.lower()).strip("-")

def _check_slug(v: str) -> str:
    if not re.fullmatch(r"[a-z0-9-]+", v):
        raise ValueError(f"Not a valid slug: {v!r}")
    return v

# Reusable typed primitive
Slug = Annotated[str, BeforeValidator(_slugify), AfterValidator(_check_slug)]

class Post(BaseModel):
    title: str
    slug: Slug

print(Post(title="Hello, World!", slug="Hello, World!").slug)  # "hello-world"
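# Pydantic's own annotated helpers
from decimal import Decimal
from pydantic import Field, StringConstraints, NonNegativeInt, PositiveInt
# NonNegativeInt / PositiveInt are ready-made constrained int types
ShortString = Annotated[str, StringConstraints(min_length=1, max_length=64, strip_whitespace=True)]
# Field(...) is the metadata form; conint() / condecimal() return whole types, not metadata
Age = Annotated[int, Field(ge=0, le=130)]
Money = Annotated[Decimal, Field(gt=0, decimal_places=2)]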
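PlainSerializer is mentioned above but not demonstrated. As a minimal sketch, the hypothetical CentsAsDollars type below stores an int in minor units and serialises it as a formatted string; Invoice is likewise an illustrative name.

```python
from typing import Annotated

from pydantic import BaseModel, PlainSerializer

# Validation still sees a plain int; only the serialised form changes
CentsAsDollars = Annotated[
    int,
    PlainSerializer(lambda v: f"{v / 100:.2f}", return_type=str),
]


class Invoice(BaseModel):
    total: CentsAsDollars


invoice = Invoice(total=1099)
dumped = invoice.model_dump()
dumped_json = invoice.model_dump_json()

print(invoice.total)  # the attribute itself is still the int
print(dumped)
print(dumped_json)
```

Because the hook lives on the type alias, every model that annotates a field with CentsAsDollars gets the same output format without repeating a @field_serializer.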
Discriminated unions
When a field can be one of several models, discriminated unions make Pydantic pick the right model by inspecting a tag field, instead of trying each variant in order. They're faster, give better error messages, and produce a tidier OpenAPI schema (a oneOf with a discriminator).
from typing import Literal, Annotated
from pydantic import BaseModel, Field, TypeAdapter

class Email(BaseModel):
    kind: Literal["email"] = "email"
    address: str

class SMS(BaseModel):
    kind: Literal["sms"] = "sms"
    phone: str

class Webhook(BaseModel):
    kind: Literal["webhook"] = "webhook"
    url: str

Channel = Annotated[Email | SMS | Webhook, Field(discriminator="kind")]

adapter = TypeAdapter(Channel)
adapter.validate_python({"kind": "sms", "phone": "+1-555-0100"})
adapter.validate_python({"kind": "webhook", "url": "https://example.com/hook"})
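The "better error messages" point can be illustrated with a small sketch. Cat and Dog are hypothetical variants; an unknown tag produces a single error about the tag itself rather than one error per variant.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class Cat(BaseModel):
    kind: Literal["cat"]
    lives: int


class Dog(BaseModel):
    kind: Literal["dog"]
    breed: str


Pet = Annotated[Union[Cat, Dog], Field(discriminator="kind")]
pet_adapter = TypeAdapter(Pet)

# The tag selects the model directly
pet = pet_adapter.validate_python({"kind": "dog", "breed": "corgi"})
print(type(pet).__name__)

try:
    pet_adapter.validate_python({"kind": "fish"})
    tag_errors = []
except ValidationError as exc:
    tag_errors = [err["type"] for err in exc.errors()]

print(tag_errors)
```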
JSON Schema generation
Model.model_json_schema() produces a JSON Schema (Draft 2020-12) that FastAPI, Litestar, and other schema-aware tools consume directly. Tune the output with json_schema_extra, examples, serialization_alias, and the ref_template argument.
import json
from pydantic import BaseModel, Field

class User(BaseModel):
    id: int = Field(description="DB primary key", examples=[1])
    email: str = Field(pattern=r".+@.+", examples=["alice@example.com"])

    model_config = {
        "json_schema_extra": {
            "examples": [{"id": 1, "email": "alice@example.com"}],
        }
    }

print(json.dumps(User.model_json_schema(), indent=2))
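The generated schema is a plain dict, so it can be inspected or asserted on in tests. The sketch below uses a hypothetical Ticket model to show where Field arguments and defaults end up.

```python
from pydantic import BaseModel, Field


class Ticket(BaseModel):
    id: int = Field(description="Ticket number")
    title: str = Field(min_length=1)
    priority: int = 3  # has a default, so it is not required


schema = Ticket.model_json_schema()

print(schema["required"])                 # fields without defaults
print(schema["properties"]["id"])         # type plus description
print(schema["properties"]["title"])      # min_length becomes minLength
print(schema["properties"]["priority"])   # default is recorded
```

Snapshot-testing this dict is one way to catch accidental API contract changes in review.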
Settings — pydantic-settings deep dive
pydantic-settings reads field values from environment variables, dotenv files, secret files, and (with extras) AWS Secrets Manager / Azure Key Vault. Configure precedence and sources with model_config = SettingsConfigDict(...); everything else is just a normal BaseModel.
from pydantic import Field, SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict
from functools import lru_cache

class DatabaseSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="DB_")
    url: str
    pool_size: int = 10
    ssl: bool = False

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        env_nested_delimiter="__",      # DB__URL, DB__POOL_SIZE
        case_sensitive=False,
        extra="ignore",
        secrets_dir="/run/secrets",     # Docker / k8s secrets
    )
    app_name: str = "myapp"
    debug: bool = False
    secret_key: SecretStr               # value is masked in repr() / dump()
    db: DatabaseSettings = Field(default_factory=DatabaseSettings)
    allowed_hosts: list[str] = Field(default_factory=lambda: ["localhost"])

@lru_cache
def get_settings() -> Settings:
    return Settings()

# In a FastAPI dep:
# def settings(s: Annotated[Settings, Depends(get_settings)]): ...
# .env
DEBUG=true
SECRET_KEY=super-secret
DB__URL=postgresql://localhost/myapp
DB__POOL_SIZE=20
ALLOWED_HOSTS=["api.example.com","app.example.com"] # JSON for complex types
Pydantic vs dataclasses vs msgspec vs attrs
The four most common "typed class" libraries in Python all share the same surface (annotate fields, get __init__ and __repr__ for free), but they differ sharply on what happens at runtime.
| Feature | pydantic v2 | dataclasses (stdlib) | attrs | msgspec |
|---|---|---|---|---|
| Runtime type validation | Yes (strict + coerce modes) | No | Optional via converters / validators | Yes |
| Speed (validate + dump) | Fast (Rust core) | N/A | Fast | Fastest |
| JSON Schema | Yes | No | No | Partial |
| Settings / env loading | pydantic-settings | No | No | No |
| Field aliasing | Yes | No | Yes | Yes |
| Discriminated unions | Yes | No | No | Yes |
| Built-in coercion | Yes ("1" → 1) | No | Custom converters | No |
| Best for | API request/response models, config, FastAPI | Plain typed records, value objects | Internal domain models with custom hooks | Ultra-fast message parsing (gRPC-like) |
See dataclasses for the stdlib comparison and typing for the Annotated[...] patterns Pydantic exposes through BeforeValidator / AfterValidator.
# Same shape, three libraries
from dataclasses import dataclass

@dataclass
class UserDC:
    name: str
    age: int

import attrs

@attrs.define
class UserAttrs:
    name: str
    age: int = attrs.field(validator=attrs.validators.ge(0))

from pydantic import BaseModel

class UserPyd(BaseModel):
    name: str
    age: int

# Runtime behaviour with bad input
UserDC(name="A", age="30")      # accepted: age is "30" (str), no validation
UserAttrs(name="A", age=-1)     # ValueError: age must be >= 0
UserPyd(name="A", age="30")     # accepted: coerced to int 30 (lax mode)
UserPyd.model_validate({"name": "A", "age": "x"})  # ValidationError
Common pitfalls (extended)
Mixing v1 and v2 syntax: Config (inner class) is v1; model_config = ConfigDict(...) is v2. @validator is v1; @field_validator is v2. The two coexist in some codebases via the pydantic.v1 shim, so check the import path before copying examples.
JSON Schema vs dump shapes can diverge: serialization_alias renames a field on output only, and only when you dump with by_alias=True. To rename on input too, use Field(alias="...") and populate_by_name=True. Triple-check round-trips: M.model_validate(M(...).model_dump()) should always succeed.
model_validator(mode="after") returns Self: the method must return self at the end. Forgetting it returns None, and the next field access raises AttributeError.
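When porting from dataclasses, replace dataclasses.field(default_factory=...) with Field(default_factory=...) for empty containers. Pydantic v2 also accepts tags: list[str] = [] without a warning, because it copies the default for each instance, but default_factory keeps the intent explicit.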
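The round-trip check recommended above can be sketched as follows, assuming a hypothetical Account model with an input alias. With populate_by_name=True both the field-name and the alias form of the dump validate back to an equal instance.

```python
from pydantic import BaseModel, ConfigDict, Field


class Account(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    email: str = Field(alias="emailAddress")


acct = Account.model_validate({"emailAddress": "alice@example.com"})

by_name = acct.model_dump()                # keyed by field name
by_alias = acct.model_dump(by_alias=True)  # keyed by alias

print(by_name)
print(by_alias)

# Both shapes must validate back to an equal model
round_trip_by_name = Account.model_validate(by_name)
round_trip_by_alias = Account.model_validate(by_alias)
```

Without populate_by_name=True, validating the by_name dump would fail because only the alias is accepted on input.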
Real-world recipes
API request, DB, and response model trio
# Three models for one resource: input, internal/DB, output
from datetime import datetime
from pydantic import BaseModel, ConfigDict, EmailStr, SecretStr

class UserCreate(BaseModel):
    email: EmailStr
    password: SecretStr  # masked in logs / dumps

class UserDB(BaseModel):
    model_config = ConfigDict(from_attributes=True)  # build from an ORM row
    id: int
    email: EmailStr
    password_hash: str
    created_at: datetime

class UserOut(BaseModel):
    id: int
    email: EmailStr
    created_at: datetime

def create_user(payload: UserCreate, db) -> UserOut:
    # db.insert_user and hash_password are application helpers defined elsewhere
    row = db.insert_user(
        email=payload.email,
        password_hash=hash_password(payload.password.get_secret_value()),
    )
    db_user = UserDB.model_validate(row)
    return UserOut.model_validate(db_user.model_dump())
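The from_attributes=True step can be tried without a database. In this sketch a SimpleNamespace stands in for the ORM row (an assumption for illustration only), and UserRow is a hypothetical model name.

```python
from types import SimpleNamespace

from pydantic import BaseModel, ConfigDict


class UserRow(BaseModel):
    # Read values from object attributes instead of dict keys
    model_config = ConfigDict(from_attributes=True)
    id: int
    email: str


# Stand-in for an ORM row: any object exposing matching attributes works
row = SimpleNamespace(id=7, email="alice@example.com", password_hash="not-exposed")

user = UserRow.model_validate(row)
print(user.model_dump())
```

Attributes that are not declared on the model, such as password_hash here, are simply not read, which is what keeps secrets out of the output model.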
Loading typed config from YAML
import yaml
from pathlib import Path
from pydantic import BaseModel, Field

class FeatureFlags(BaseModel):
    new_dashboard: bool = False
    rollout_percentage: int = Field(0, ge=0, le=100)

class AppConfig(BaseModel):
    name: str
    version: str
    features: FeatureFlags

config = AppConfig.model_validate(yaml.safe_load(Path("config.yaml").read_text()))
Validating responses from an external API
import httpx
from pydantic import BaseModel, TypeAdapter

class GithubRepo(BaseModel):
    id: int
    full_name: str
    stargazers_count: int

RepoList = TypeAdapter(list[GithubRepo])

with httpx.Client(timeout=10) as client:
    resp = client.get("https://api.github.com/users/python/repos")
    resp.raise_for_status()  # fail on HTTP errors before validating the body
    raw = resp.json()

repos = RepoList.validate_python(raw)
top = sorted(repos, key=lambda r: r.stargazers_count, reverse=True)[:5]
for r in top:
    print(r.full_name, r.stargazers_count)
Quick reference
| Task | Code |
|---|---|
| Define model | class M(BaseModel): name: str |
| Validate dict | M.model_validate({"name": "A"}) |
| Validate JSON | M.model_validate_json('{"name":"A"}') |
| To dict | m.model_dump() |
| To JSON | m.model_dump_json(indent=2) |
| Strict mode | model_config = ConfigDict(strict=True) |
| Forbid unknown | model_config = ConfigDict(extra="forbid") |
| Immutable | model_config = ConfigDict(frozen=True) |
| Default factory | tags: list[str] = Field(default_factory=list) |
| Field constraints | age: int = Field(ge=0, le=130) |
| String pattern | slug: str = Field(pattern=r"^[a-z-]+$") |
| Alias | email: str = Field(alias="emailAddress") |
| Field validator | @field_validator("x") @classmethod def fn(cls, v): ... |
| Model validator | @model_validator(mode="after") def fn(self) -> Self: ... |
| Computed field | @computed_field @property def area(self) -> float: ... |
| Custom serializer | @field_serializer("x") def fmt(self, v): ... |
| Discriminated union | Field(discriminator="kind") |
| Validate non-model | TypeAdapter(list[Item]).validate_python(...) |
| JSON Schema | M.model_json_schema() |
| Settings | class S(BaseSettings): ... (pip install pydantic-settings) |
| From ORM row | model_config = ConfigDict(from_attributes=True) |
| Secret value | password: SecretStr |