cheat sheet

Structured Output

Techniques for reliable structured generation — JSON mode, schema-constrained decoding, function/tool calls as output, and validator pairing with Pydantic or Zod.

updated 05-25-2026

What it is

Structured output is the discipline of getting a language model to emit data in a machine-parseable shape (JSON, XML, a function-call signature) instead of free prose. It's the seam between an LLM and the rest of your application. Done badly, regex spreads through your codebase and a renamed field turns into a silent bug. Done well, every model output passes through a typed validator (Pydantic, Zod, JSON Schema) and your application code never touches an Any. Four techniques in this article cover most production cases: schema-constrained tool calls, JSON mode, prompt-only structuring with prefill, and validator-paired retry loops.

The reliability spectrum

Different techniques offer different guarantees. Listed in order of increasing reliability — pick the lightest technique that meets your need.

Technique	Schema guaranteed?	Format guaranteed?	Cost	Use when
Plain prose + regex	No	No	Low	One-off extraction, tolerate errors
Instruction "return JSON"	No	Mostly	Low	Prototype
Prefill `{` + stop seq	No	Mostly (usually parses as JSON)	Low	Lightweight, no schema needed
Tool use with input_schema	Nearly (Claude; validate anyway)	Yes	Low	Production extraction
JSON mode / structured outputs (OpenAI, some APIs)	Only with a strict JSON Schema	Yes	Low	OpenAI compatibility
Outlines / Guidance / LM Format Enforcer	Yes	Yes	Medium	Open-weights; constrained decoding
Validator + retry loop	Eventually	Eventually	Medium	Safety net on any of the above

For Claude, the most reliable schema-constrained approach is tool use with tool_choice={"type": "tool", "name": "..."}. The SDK hands back the tool input already parsed into a Python dict that follows the input_schema; validation can still occasionally fail, so keep a validator downstream. For everything else, pair your generation method with a typed validator and a bounded retry.

Tool use as a structured output channel

Claude's tool-use API was designed for function calling, but it doubles as the most reliable structured-output mechanism. You define a tool whose input_schema IS your target output shape, force Claude to call it via tool_choice, and read the parsed input back from the response. No regex, no JSON parsing, no escape-string headaches.

python

import anthropic

client = anthropic.Anthropic()
invoice_text = "..."  # placeholder: the raw invoice text to extract from

extract_tool = {
    "name": "store_invoice",
    "description": "Store the structured invoice data extracted from the document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "date": {"type": "string", "description": "ISO 8601 date"},
            "vendor_name": {"type": "string"},
            "total_amount": {"type": "number"},
            "currency": {
                "type": "string",
                "enum": ["USD", "EUR", "GBP", "CAD", "JPY"],
            },
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"},
                    },
                    "required": ["description", "quantity", "unit_price"],
                },
            },
        },
        "required": [
            "invoice_number", "date", "vendor_name", "total_amount",
            "currency", "line_items",
        ],
    },
}

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "store_invoice"},
    messages=[{"role": "user", "content": f"Extract this invoice:\n\n{invoice_text}"}],
)

tool_use_block = next(b for b in response.content if b.type == "tool_use")
print(tool_use_block.input)

Output:

text

{'invoice_number': 'INV-2025-0427', 'date': '2025-04-27', 'vendor_name': 'Acme Coffee',
 'total_amount': 42.5, 'currency': 'USD',
 'line_items': [{'description': 'Espresso', 'quantity': 2, 'unit_price': 4.5}, ...]}

The "tool" never has to actually do anything. It exists purely to define the schema and capture the output. Many teams name it store_X, record_X, or submit_X to signal that it's a sink rather than a callable.
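
The examples in this article pull the tool call out with `next(...)` over `response.content`, which raises a bare `StopIteration` if no matching block comes back (for instance, when `max_tokens` cuts the response short). A small helper gives a clearer failure and also checks the tool name. The name `get_tool_input` is hypothetical; it only relies on each content block having `type`, `name`, and `input` attributes, so the usage below exercises it with stand-in objects instead of a live API call.

```python
from types import SimpleNamespace
from typing import Any


def get_tool_input(content: list[Any], tool_name: str) -> dict:
    """Return the input of the first tool_use block named `tool_name`.

    `content` is a list of content blocks such as `response.content`;
    only the `type`, `name`, and `input` attributes are read.
    """
    for block in content:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return block.input
    seen = [getattr(b, "type", "?") for b in content]
    raise ValueError(f"No tool_use block for {tool_name!r}; got block types {seen}")


# Stand-in blocks shaped like SDK content blocks (no API call needed).
blocks = [
    SimpleNamespace(type="text", text="Here is the invoice."),
    SimpleNamespace(type="tool_use", name="store_invoice", input={"invoice_number": "INV-1"}),
]
print(get_tool_input(blocks, "store_invoice"))  # {'invoice_number': 'INV-1'}
```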

Pydantic-paired validation

A typed validator turns the model's dict into an instance of a class, catching shape mismatches, failed type coercions, and missing fields with a clean traceback. Pydantic is the de facto Python validator; the same pattern works with attrs, marshmallow, or dataclasses-json.

python

from pydantic import BaseModel, Field, ValidationError
from typing import Literal
from datetime import date

class LineItem(BaseModel):
    description: str
    quantity: float = Field(ge=0)
    unit_price: float = Field(ge=0)

class Invoice(BaseModel):
    invoice_number: str = Field(min_length=1)
    date: date
    vendor_name: str
    total_amount: float = Field(ge=0)
    currency: Literal["USD", "EUR", "GBP", "CAD", "JPY"]
    line_items: list[LineItem] = Field(min_length=1)

def extract_invoice(text: str) -> Invoice:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        tools=[extract_tool],
        tool_choice={"type": "tool", "name": "store_invoice"},
        messages=[{"role": "user", "content": f"Extract this invoice:\n\n{text}"}],
    )
    tool_use = next(b for b in response.content if b.type == "tool_use")
    return Invoice.model_validate(tool_use.input)
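
To see what the validator buys you without calling the API, the sketch below feeds two hand-written dicts to a trimmed-down copy of the `Invoice` model (four of its fields, to keep it short): one well-formed, and one with a problem in every field. Pydantic parses the ISO date string into a `datetime.date` and reports every failure in a single `ValidationError` rather than stopping at the first. Importing `datetime as dt` is a stylistic choice here that keeps the field name `date` visually distinct from the type.

```python
import datetime as dt
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class LineItem(BaseModel):
    description: str
    quantity: float = Field(ge=0)
    unit_price: float = Field(ge=0)


class Invoice(BaseModel):
    invoice_number: str = Field(min_length=1)
    date: dt.date  # "2025-04-27" is parsed into a datetime.date
    currency: Literal["USD", "EUR", "GBP", "CAD", "JPY"]
    line_items: list[LineItem] = Field(min_length=1)


good = Invoice.model_validate({
    "invoice_number": "INV-2025-0427",
    "date": "2025-04-27",
    "currency": "USD",
    "line_items": [{"description": "Espresso", "quantity": 2, "unit_price": 4.5}],
})
print(good.date.year)  # 2025

try:
    Invoice.model_validate({
        "invoice_number": "",  # too short
        "date": "April 27",    # not a parseable date
        "currency": "usd",     # not in the Literal (case-sensitive)
        "line_items": [],      # fewer than 1 item
    })
except ValidationError as e:
    # One error per failing field, all collected in a single exception.
    print(sorted(err["loc"][0] for err in e.errors()))
    # ['currency', 'date', 'invoice_number', 'line_items']
```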

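Generate the tool schema FROM the Pydantic model using Invoice.model_json_schema() to keep the two in sync. Pydantic's output is valid JSON Schema. The title keys are harmless noise you can trim; nested models such as LineItem are emitted under $defs and referenced with $ref, so only drop $defs after inlining those references.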

Generate schema from Pydantic

python

def pydantic_to_tool(model_class, tool_name: str, description: str) -> dict:
    schema = model_class.model_json_schema()
    # Passed through as-is: nested models appear under $defs and are referenced via $ref.
    # Inline those references first if you want a flat schema.
    return {
        "name": tool_name,
        "description": description,
        "input_schema": schema,
    }

invoice_tool = pydantic_to_tool(
    Invoice,
    tool_name="store_invoice",
    description="Store structured invoice data extracted from the document.",
)
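
The `pydantic_to_tool` sketch above passes the schema through without inlining anything. One way to do the inlining, assuming every reference is a local `#/$defs/...` pointer (which is what Pydantic v2 emits for nested models), is a small recursive resolver; you could then pass `inline_refs(model_class.model_json_schema())` as the `input_schema`. The function name is hypothetical, it also strips `title` keywords (but not fields that happen to be named `title`), and it does not handle self-referencing models.

```python
import json
from typing import Any

from pydantic import BaseModel


def inline_refs(schema: dict) -> dict:
    """Replace local "#/$defs/Name" references with the definitions they point to,
    then drop $defs and keyword-level "title" entries.

    Sketch only: a recursive model (one that references itself) would loop forever.
    """
    defs = schema.get("$defs", {})

    def resolve(node: Any, is_properties: bool = False) -> Any:
        if isinstance(node, list):
            return [resolve(item) for item in node]
        if not isinstance(node, dict):
            return node
        if is_properties:
            # Keys here are field names (a field may be called "title"), so keep them all.
            return {name: resolve(sub) for name, sub in node.items()}
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            target = defs[ref[len("#/$defs/"):]]
            # Sibling keywords next to $ref (e.g. a description) override the target's.
            merged = {**target, **{k: v for k, v in node.items() if k != "$ref"}}
            return resolve(merged)
        return {
            key: resolve(value, is_properties=(key == "properties"))
            for key, value in node.items()
            if key not in ("$defs", "title")
        }

    return resolve(schema)


class LineItem(BaseModel):
    description: str
    quantity: float


class Invoice(BaseModel):
    title: str  # a field literally named "title" must survive the cleanup
    line_items: list[LineItem]


flat = inline_refs(Invoice.model_json_schema())
print("$ref" in json.dumps(flat), sorted(flat["properties"]))  # False ['line_items', 'title']
```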

TypeScript with Zod

The Zod equivalent of the Pydantic pattern. Zod schemas can be converted to JSON Schema via zod-to-json-schema and fed to Claude as a tool input schema.

typescript

import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const LineItem = z.object({
  description: z.string(),
  quantity: z.number().nonnegative(),
  unit_price: z.number().nonnegative(),
});

const Invoice = z.object({
  invoice_number: z.string().min(1),
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  vendor_name: z.string(),
  total_amount: z.number().nonnegative(),
  currency: z.enum(["USD", "EUR", "GBP", "CAD", "JPY"]),
  line_items: z.array(LineItem).min(1),
});

const client = new Anthropic();

async function extractInvoice(text: string): Promise<z.infer<typeof Invoice>> {
  const response = await client.messages.create({
    model: "claude-opus-4-7",
    max_tokens: 2048,
    tools: [
      {
        name: "store_invoice",
        description: "Store structured invoice data.",
        input_schema: zodToJsonSchema(Invoice) as Anthropic.Tool.InputSchema,
      },
    ],
    tool_choice: { type: "tool", name: "store_invoice" },
    messages: [{ role: "user", content: `Extract this invoice:\n\n${text}` }],
  });

  const toolUse = response.content.find((b) => b.type === "tool_use");
  if (toolUse?.type !== "tool_use") throw new Error("No tool use in response");
  return Invoice.parse(toolUse.input);
}

JSON mode (prompt-only structuring)

When the API doesn't have a dedicated tool-use channel — or you want zero schema bookkeeping — instruct the model to return JSON and combine three techniques to maximize reliability: explicit schema in the prompt, prefill with {, and stop sequence after the closing brace.

python

import json

SCHEMA_HINT = """
{
  "category": "billing" | "access" | "performance" | "bug" | "feature" | "other",
  "priority": "P1" | "P2" | "P3" | "P4",
  "summary": "<one sentence>",
  "needs_human": true | false
}
"""

def triage(ticket: str) -> dict:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=400,
        stop_sequences=["\n\n", "}\n"],  # "}\n" is only a safe stop because this schema is flat
        messages=[
            {
                "role": "user",
                "content": (
                    f"Classify the support ticket. Output JSON matching this schema:\n"
                    f"{SCHEMA_HINT}\n"
                    f"Output ONLY the JSON object, no fences, no prose.\n\n"
                    f"Ticket: {ticket}"
                ),
            },
            {"role": "assistant", "content": "{"},   # prefill ensures it starts with JSON
        ],
    )
    raw = "{" + response.content[0].text
    if not raw.rstrip().endswith("}"):
        raw += "}"
    return json.loads(raw)

Prompt-only JSON is generally reliable on Opus and Sonnet but becomes less dependable on smaller models. If you can use tool use, you should. Reach for prompt-only JSON when you need cross-model portability or when adding tools isn't worth the operational overhead.

Validator + retry loop

Even with tool use, validation occasionally fails — the model might omit a required field, return a number when you expected a string, or violate a regex pattern. The retry loop pattern catches the validation error and feeds it back as a corrective message. Cap at 2 retries; beyond that, fail loudly and queue for human review.

python

import json
import logging
from pydantic import BaseModel, ValidationError

def extract_with_validation(
    text: str,
    model_class: type[BaseModel],
    tool: dict,
    max_retries: int = 2,
) -> BaseModel | None:
    messages = [{"role": "user", "content": f"Extract:\n\n{text}"}]

    for attempt in range(max_retries + 1):
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=2048,
            tools=[tool],
            tool_choice={"type": "tool", "name": tool["name"]},
            messages=messages,
        )
        tool_use = next((b for b in resp.content if b.type == "tool_use"), None)
        if not tool_use:
            return None

        try:
            return model_class.model_validate(tool_use.input)
        except ValidationError as e:
            logging.warning(f"Attempt {attempt + 1} failed: {e}")
            # Feed the error back and ask the model to retry
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": (
                        f"Validation failed: {e.errors()[:3]}. "
                        f"Call the tool again with corrected input."
                    ),
                    "is_error": True,
                }],
            })

    return None    # exhausted retries
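
The control flow of this loop is easier to test when the API call is injected. The sketch below is a hypothetical, SDK-free variant: `call_model` is any function that takes the previous error message (or `None` on the first attempt) and returns the raw dict. In production it would wrap `client.messages.create` and send the feedback as the `tool_result` shown above; here a fake function stands in for the model.

```python
from typing import Callable, Optional, TypeVar

from pydantic import BaseModel, ValidationError

M = TypeVar("M", bound=BaseModel)


def validate_with_retry(
    call_model: Callable[[Optional[str]], dict],
    model_class: type[M],
    max_retries: int = 2,
) -> Optional[M]:
    """Call the model, validate, and retry with the error text as feedback.

    `call_model(feedback)` is a hypothetical hook: it receives None on the first
    attempt and the previous validation error afterwards, and returns a raw dict.
    """
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        raw = call_model(feedback)
        try:
            return model_class.model_validate(raw)
        except ValidationError as e:
            # Like the tool_result message above: only the first few errors.
            feedback = f"Validation failed: {e.errors()[:3]}. Call the tool again with corrected input."
    return None  # exhausted retries


# Usage with a fake model call: wrong type first, corrected on the retry.
class Priority(BaseModel):
    level: int


calls: list[Optional[str]] = []


def fake_call(feedback: Optional[str]) -> dict:
    calls.append(feedback)
    return {"level": "high"} if feedback is None else {"level": 2}


print(validate_with_retry(fake_call, Priority))  # level=2
print(len(calls))                                 # 2
```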

Streaming structured output

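When latency matters, stream the tool-call JSON as it's generated. The SDK emits content_block_delta events whose input_json_delta carries the next fragment of JSON; concatenate the fragments as they arrive. Once the stream finishes, the final message already contains the fully parsed tool input, so you don't need to parse the accumulated string yourself.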

python

import json

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "store_invoice"},
    messages=[{"role": "user", "content": f"Extract: {text}"}],
) as stream:
    partial_json = ""
    for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "input_json_delta":
            partial_json += event.delta.partial_json
            # Optionally: try to incrementally parse with a tolerant parser

    final = stream.get_final_message()
    tool_use = next(b for b in final.content if b.type == "tool_use")
    parsed = tool_use.input

print(parsed)

Use a tolerant streaming JSON parser (like partial-json on PyPI) if you need to render partial state to the user while the tool call streams in, for example for live UI updates with progressive disclosure.
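
If you would rather not add a dependency, a crude best-effort alternative is to close any open string, array, and object in the accumulated `partial_json` and, when that still doesn't parse, back off one character at a time. The sketch below (hypothetical `parse_partial_json`) does exactly that. It is quadratic in the worst case, so it suits small tool payloads rather than large documents, and it can briefly surface truncated values such as a half-streamed number or string.

```python
import json
from typing import Any, Optional


def _close(prefix: str) -> str:
    """Append the quote and brackets needed to close an unfinished JSON prefix."""
    closers = []  # stack of expected closing characters
    in_string = escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    if in_string:
        if escaped:  # drop a dangling backslash before closing the string
            prefix = prefix[:-1]
        prefix += '"'
    return prefix + "".join(reversed(closers))


def parse_partial_json(fragment: str) -> Optional[Any]:
    """Best-effort parse of streamed JSON; returns None if nothing parses yet."""
    for cut in range(len(fragment), 0, -1):
        try:
            return json.loads(_close(fragment[:cut]))
        except json.JSONDecodeError:
            continue  # e.g. a dangling key or trailing comma: back off one character
    return None


chunks = ['{"invoice_number": "INV-', '2025-0427", "total_', 'amount": 42.5}']
buffer = ""
for chunk in chunks:
    buffer += chunk  # in the streaming loop above, this is partial_json
    print(parse_partial_json(buffer))
# {'invoice_number': 'INV-'}
# {'invoice_number': 'INV-2025-0427'}
# {'invoice_number': 'INV-2025-0427', 'total_amount': 42.5}
```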

Schema design principles

The shape of your schema affects both LLM accuracy and downstream usability. Seven rules that hold across providers:

text

☐ Use enums for fixed value sets — eliminates hallucinated values
☐ Mark every truly required field as `required`; Claude respects this
☐ Use descriptive field names, not abbreviations (currency_code, not ccy)
☐ Add a description to every field; Claude reads them, and a bare field name is a weak signal
☐ Prefer flat structures over deeply nested — easier to validate and debug
☐ Use null for "missing" rather than empty strings — distinguishes "not present" from ""
☐ Avoid free-text fields when an enum will do — extraction quality improves dramatically
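
Several of these rules map directly onto Pydantic features, which matters when the tool schema is generated from the model class: a `Literal` becomes a JSON Schema `enum`, `Field(description=...)` becomes the property's `description`, and a `mode="before"` validator can normalize empty strings to `None`. The `Quote` model below is a hypothetical example, not one of the schemas in this article.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, field_validator


class Quote(BaseModel):
    # Enum for a fixed value set; the description is copied into the JSON Schema.
    currency_code: Literal["USD", "EUR", "GBP"] = Field(
        description="ISO 4217 code of the quoted price"
    )
    # null, not "", for a value that is not present in the source.
    contact_email: Optional[str] = Field(
        default=None, description="Contact email if provided, else null"
    )

    @field_validator("contact_email", mode="before")
    @classmethod
    def blank_to_none(cls, value):
        # Normalize "" and whitespace-only strings to None.
        if isinstance(value, str) and not value.strip():
            return None
        return value


props = Quote.model_json_schema()["properties"]
print(props["currency_code"]["enum"])  # ['USD', 'EUR', 'GBP']
print(Quote.model_validate({"currency_code": "EUR", "contact_email": ""}).contact_email)  # None
```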

Enum constraints

python

"properties": {
    "priority": {
        "type": "string",
        "enum": ["P1", "P2", "P3", "P4"],
        "description": "Severity: P1=outage, P2=blocked, P3=degraded, P4=question"
    },
    "sentiment": {
        "type": "string",
        "enum": ["angry", "frustrated", "neutral", "positive"],
        "description": "Tone of the customer"
    }
}

Discriminated unions

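When a field can take several shapes, model it as a discriminated union: a type field whose value selects the shape of the rest. Make the discriminator an enum so Claude picks from known variants. The flat schema below states the per-variant requirements only in descriptions, so nothing enforces them; check them in your validator.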

python

"properties": {
    "payment": {
        "type": "object",
        "properties": {
            "type": {"type": "string", "enum": ["card", "bank_transfer", "crypto"]},
            "last4": {"type": "string", "description": "Required if type=card"},
            "iban": {"type": "string", "description": "Required if type=bank_transfer"},
            "wallet_address": {"type": "string", "description": "Required if type=crypto"},
        },
        "required": ["type"]
    }
}
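
On the validator side, Pydantic's discriminated unions (`Field(discriminator=...)`) let the `type` value choose which model validates the rest of the object, so the per-variant requirements above are actually enforced. The class names below are hypothetical. The generated JSON Schema uses `oneOf` plus a `discriminator` mapping; whether a given provider's tool schema honors `oneOf` is worth testing before relying on it, and the flat schema above remains a reasonable thing to send to the model either way.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, ValidationError


class CardPayment(BaseModel):
    type: Literal["card"]
    last4: str = Field(pattern=r"^\d{4}$")


class BankTransferPayment(BaseModel):
    type: Literal["bank_transfer"]
    iban: str


class CryptoPayment(BaseModel):
    type: Literal["crypto"]
    wallet_address: str


# The "type" value picks which model validates the rest of the object.
Payment = Annotated[
    Union[CardPayment, BankTransferPayment, CryptoPayment],
    Field(discriminator="type"),
]


class Order(BaseModel):
    payment: Payment


order = Order.model_validate({"payment": {"type": "card", "last4": "4242"}})
print(type(order.payment).__name__)  # CardPayment

try:
    Order.model_validate({"payment": {"type": "bank_transfer"}})  # iban is missing
except ValidationError as e:
    print(e.errors()[0]["type"])  # missing
```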

Optional vs required

python

"required": ["invoice_number", "date", "total_amount"]    # always present
# email, phone, address are NOT in required → may be omitted or null

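If null is a valid value for an optional field, say so in the description ("description": "Email if provided, else null") and allow it in the type ("type": ["string", "null"]); a property typed only as "string" does not permit null. Claude is then more likely to return null than to make something up.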
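
When the tool schema comes from a Pydantic model, this pattern falls out of the type hints: fields without a default land in `required`, and `Optional[str] = None` both keeps the field out of `required` and widens its type to accept `null` (Pydantic v2 emits an `anyOf` of the type and `null`). `InvoiceHeader` is a hypothetical model mirroring the snippet above.

```python
from typing import Optional

from pydantic import BaseModel, Field


class InvoiceHeader(BaseModel):
    invoice_number: str
    date: str = Field(description="ISO 8601 YYYY-MM-DD")
    total_amount: float
    # Optional AND nullable: left out of "required", and null is an accepted type.
    email: Optional[str] = Field(default=None, description="Email if provided, else null")
    phone: Optional[str] = None


schema = InvoiceHeader.model_json_schema()
print(schema["required"])                      # ['invoice_number', 'date', 'total_amount']
print(schema["properties"]["email"]["anyOf"])  # [{'type': 'string'}, {'type': 'null'}]
```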

Open-weights constrained decoding

For models you run yourself (Llama, Qwen, DeepSeek), libraries like Outlines, Guidance, and LM Format Enforcer constrain generation at the token level: at each step, tokens that would break the schema are masked out, so the model cannot emit them. This guarantees format conformance (unless generation is cut off by the token limit) at the cost of somewhat slower decoding and, in some libraries, an up-front schema compilation step.

python

# Outlines example (open-weights only; pre-1.0 Outlines API, so check the docs for your installed version)
from outlines import models, generate
from pydantic import BaseModel
from typing import Literal

class Ticket(BaseModel):
    category: Literal["billing", "access", "bug", "feature", "other"]
    priority: Literal["P1", "P2", "P3", "P4"]
    summary: str
    needs_human: bool

model = models.transformers("meta-llama/Meta-Llama-3-8B-Instruct")
generator = generate.json(model, Ticket)

result = generator("Customer reports: 'Page took 30 seconds to load.'")
print(repr(result))

Output:

text

Ticket(category='bug', priority='P3', summary='Page load latency 30s', needs_human=False)

Constrained decoding does NOT replace prompt engineering — a poorly described task still produces semantically wrong outputs, just well-formatted ones. Use it as a guarantee on shape, not on correctness.
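
To make "cannot emit a token that would break the schema" concrete, here is a toy, self-contained sketch of token masking for a single enum field. It is not how Outlines or the other libraries are implemented (they work from a compiled grammar or automaton and the real tokenizer vocabulary); `constrained_decode`, the vocabulary, and the scoring function are all made up for illustration. It also shows the limit described above: the mask decides which values are possible, while the scores still decide which one is chosen.

```python
from typing import Callable

EOS = "<eos>"


def constrained_decode(
    allowed: set[str],
    vocab: list[str],
    score: Callable[[str, str], float],
    max_steps: int = 16,
) -> str:
    """Greedy decoding that masks any token which cannot lead to an allowed string.

    `score(prefix, token)` stands in for the model's logits; the mask is a plain
    prefix check against `allowed`. Tokens are assumed to be non-empty.
    """
    out = ""
    for _ in range(max_steps):
        # Keep only tokens whose addition is still a prefix of some allowed value.
        candidates = [t for t in vocab if t != EOS and any(a.startswith(out + t) for a in allowed)]
        if out in allowed:
            candidates.append(EOS)  # stopping is legal only on a complete value
        if not candidates:
            raise ValueError(f"no valid continuation after {out!r}")
        best = max(candidates, key=lambda t: score(out, t))
        if best == EOS:
            return out
        out += best
    raise ValueError("max_steps reached before a complete value")


# Fake "logits": unconstrained, the top token would be the off-schema "slow".
prefs = {"slow": 9.0, "bil": 5.0, "ling": 4.0, EOS: 3.0}
vocab = ["slow", "bil", "ling", "bug", "oth", "er", EOS]
result = constrained_decode({"billing", "bug", "other"}, vocab, lambda prefix, tok: prefs.get(tok, 0.0))
print(result)  # billing
```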

Nested schemas

Deeply nested schemas are harder for the model to populate correctly. Two mitigations: keep depth to about three levels where possible, and break large schemas into smaller calls when you can. As a rule of thumb, a schema with 30 or more leaf fields tends to extract less accurately than the same fields split across three focused calls.

python

# Heavy schema — works but error-prone
{
    "company": {
        "name": "...",
        "address": {
            "street": "...",
            "city": "...",
            "country": "...",
            "geo": {"lat": 0.0, "lng": 0.0}
        },
        "contacts": [{"name": "...", "role": "...", "email": "..."}]
    },
    "invoice": {...},
    "line_items": [...],
}

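# Better: three focused calls, one schema each (independent, so they could also run concurrently).
# extract_with_schema is a stand-in for a helper such as extract_with_validation above.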
company = extract_with_schema(text, CompanySchema)
invoice = extract_with_schema(text, InvoiceSchema)
items = extract_with_schema(text, LineItemsSchema)
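
The two rules of thumb in this section (about three levels of depth, and caution past roughly 30 leaf fields) are easy to check mechanically before a schema ships. The sketch below is a hypothetical lint helper for inline JSON Schemas: it ignores `$ref`, so run it on an inlined schema, and the thresholds are the ones stated above rather than hard limits. Objects add one level of depth; arrays take the depth of their items.

```python
def schema_stats(schema: dict) -> tuple[int, int]:
    """Return (object_depth, leaf_count) for an inline JSON Schema without $ref."""
    if "properties" in schema or schema.get("type") == "object":
        children = [schema_stats(sub) for sub in schema.get("properties", {}).values()]
        depth = 1 + max((d for d, _ in children), default=0)
        return depth, sum(n for _, n in children)
    if "items" in schema:
        return schema_stats(schema["items"])  # arrays are transparent for depth
    return 0, 1  # a scalar field is one leaf


def schema_warnings(schema: dict, max_depth: int = 3, max_leaves: int = 30) -> list[str]:
    depth, leaves = schema_stats(schema)
    warnings = []
    if depth > max_depth:
        warnings.append(f"depth {depth} > {max_depth}: flatten or split")
    if leaves > max_leaves:
        warnings.append(f"{leaves} leaf fields > {max_leaves}: split into focused calls")
    return warnings


# A trimmed version of the "heavy" company schema above.
heavy = {
    "type": "object",
    "properties": {
        "company": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "geo": {
                            "type": "object",
                            "properties": {"lat": {"type": "number"}, "lng": {"type": "number"}},
                        },
                    },
                },
                "contacts": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
                    },
                },
            },
        },
    },
}
print(schema_stats(heavy))     # (4, 6)
print(schema_warnings(heavy))  # ['depth 4 > 3: flatten or split']
```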

Handling lists of unknown length

When the output is a list, include guidance on size bounds and what to do when nothing is found. Without it, models sometimes invent entries to fill perceived expectations.

python

"properties": {
    "tags": {
        "type": "array",
        "items": {"type": "string"},
        "description": "Topic tags from the article. Return empty array if no clear tags. Max 10."
    }
}
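
The size bound can also be stated as a schema constraint rather than only in prose: JSON Schema's `maxItems`, which Pydantic emits for `Field(max_length=...)` on a list. If the schema is generated from the model, the limit travels with it, and the validator rejects an eleventh tag instead of silently passing it through. `ArticleTags` is a hypothetical model mirroring the `tags` property above.

```python
from pydantic import BaseModel, Field, ValidationError


class ArticleTags(BaseModel):
    tags: list[str] = Field(
        default_factory=list,  # an absent list becomes [], not an invented entry
        max_length=10,         # emitted as "maxItems": 10 in the JSON Schema
        description="Topic tags from the article. Return empty array if no clear tags. Max 10.",
    )


schema = ArticleTags.model_json_schema()
print(schema["properties"]["tags"]["maxItems"])  # 10
print(ArticleTags.model_validate({}).tags)       # []

try:
    ArticleTags.model_validate({"tags": [f"t{i}" for i in range(11)]})
except ValidationError as e:
    print(e.errors()[0]["type"])  # too_long
```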

Error patterns and fixes

The repeated mistakes that show up when wiring up structured output, with the fix for each.

Symptom	Root cause	Fix
`JSONDecodeError: Expecting value`	Model wrapped JSON in ``` fences	Prefill `{` + stop sequence on ```
Field is string when schema says number	Schema description vague	Add `"type": "number", "description": "Numeric. No commas, no $."`
Empty array when items exist	Schema didn't say to extract them	Description: "Extract every line item; do not skip"
Extra fields not in schema	LLM padded with extras	Pydantic `model_config = ConfigDict(extra="forbid")`
Required field missing	Long input, model lost it	Re-prompt with retry loop OR shorten input
Hallucinated enum value	Free-text field where enum would do	Convert to enum
Numbers come back as strings	Auto-formatting	Pydantic does coercion; or strict mode + retry
Date format inconsistent	No format hint	Add `"description": "ISO 8601 YYYY-MM-DD"`
Nested object missing levels	Schema too deep	Flatten or split into sequential calls
Different field order each run	Insertion-order varies	Order shouldn't matter — use dict access, not list index
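
Prefill and stop sequences are the preferred fix for the first row, but when you can't use a prefill (for example, on another provider), a defensive parser that strips a Markdown code fence or surrounding prose is a useful fallback. The sketch below, `loads_lenient`, is a hypothetical helper. The outermost-brace fallback is a heuristic and can misfire on prose that itself contains braces, so keep the validator step after it. The fence string is built from single backticks only so that the snippet itself contains no literal fence.

```python
import json
import re
from typing import Any

TICKS = "`" * 3  # a Markdown code fence
# Matches a fenced block (optionally tagged json) and captures its body.
_FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS, re.DOTALL | re.IGNORECASE)


def loads_lenient(text: str) -> Any:
    """Parse JSON that may be wrapped in a Markdown fence or surrounded by prose."""
    match = _FENCE.search(text)
    if match:
        text = match.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span, e.g. "Sure! {...} Hope this helps."
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        return json.loads(text[start : end + 1])  # may still raise JSONDecodeError
    raise ValueError("no JSON object found in model output")


print(loads_lenient(TICKS + 'json\n{"priority": "P2"}\n' + TICKS))  # {'priority': 'P2'}
print(loads_lenient('Sure! {"priority": "P3"} Hope that helps.'))   # {'priority': 'P3'}
```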

Common pitfalls

Pitfall	Why it bites	Fix
Trusting raw model output without validation	Field types drift silently	Always pipe through Pydantic / Zod
Validator and prompt schema out of sync	Edits in one place, not the other	Generate tool schema FROM the model class
No retry on validation failure	One bad parse breaks the pipeline	One retry with the error fed back is cheap insurance
`extra="allow"` in Pydantic	LLM-added fields leak through	`extra="forbid"` in production
Schema hidden in prompt as prose	LLM has to re-derive structure each time	Use tool input_schema; more reliable
Streaming partial JSON to user	Mid-stream parse errors	Buffer until the stop event, or render through a tolerant partial parser
Same schema for every model size	Smaller models drop fields	Validate against the WEAKEST model you support
Ignoring `is_error` flag in tool_result	Retries don't carry error signal	Set `is_error: true` so model knows to fix
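
Two rows above concern extra fields. Pydantic's default is `extra="ignore"`, which silently drops unknown keys; that is safe for downstream code but hides the padding from you. `extra="forbid"` turns it into a validation error that the retry loop can feed back. A minimal comparison, with hypothetical model names:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LenientTicket(BaseModel):
    summary: str  # default config: unknown keys are ignored (dropped silently)


class StrictTicket(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown keys are a validation error
    summary: str


raw = {"summary": "Login fails", "confidence": 0.9}  # "confidence" is not in the schema

print(LenientTicket.model_validate(raw).model_dump())  # {'summary': 'Login fails'}
try:
    StrictTicket.model_validate(raw)
except ValidationError as e:
    print(e.errors()[0]["type"], e.errors()[0]["loc"])  # extra_forbidden ('confidence',)
```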

Real-world recipes

Compact end-to-end patterns for the highest-leverage structured-output use cases.

CRUD payload generation

python

class UserUpdate(BaseModel):
    email: str | None = None
    name: str | None = None
    plan: Literal["free", "pro", "enterprise"] | None = None
    notify_billing: bool | None = None

update_tool = pydantic_to_tool(
    UserUpdate,
    tool_name="propose_user_update",
    description="Generate a partial user-update payload from a natural-language command.",
)

def parse_user_command(text: str) -> UserUpdate:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=600,
        tools=[update_tool],
        tool_choice={"type": "tool", "name": "propose_user_update"},
        messages=[{
            "role": "user",
            "content": (
                f"Convert this admin command into a user-update payload. "
                f"Set fields to null when not mentioned.\n\n{text}"
            ),
        }],
    )
    return UserUpdate.model_validate(
        next(b for b in resp.content if b.type == "tool_use").input
    )

payload = parse_user_command("Upgrade alice@example.com to Pro and turn off billing emails.")
# UserUpdate(email='alice@example.com', name=None, plan='pro', notify_billing=False)
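
Before applying the result, decide how to turn it into an update. Because the prompt asks for null on unmentioned fields, the model may send every key explicitly, in which case Pydantic's `exclude_unset=True` still keeps the nulls; `exclude_none=True` drops them and leaves a PATCH-style body that only touches mentioned fields. The trade-off is that such a payload can no longer clear a field by setting it to null. The sketch restates `UserUpdate` so it runs on its own, and the tool input dict is hand-written.

```python
from typing import Literal, Optional

from pydantic import BaseModel


class UserUpdate(BaseModel):
    email: Optional[str] = None
    name: Optional[str] = None
    plan: Optional[Literal["free", "pro", "enterprise"]] = None
    notify_billing: Optional[bool] = None


# Every key is present, so every field counts as "set", including name=None.
tool_input = {"email": "alice@example.com", "name": None, "plan": "pro", "notify_billing": False}
update = UserUpdate.model_validate(tool_input)

print(update.model_dump(exclude_unset=True))  # still contains 'name': None
print(update.model_dump(exclude_none=True))   # drops name; keeps notify_billing=False
```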

Multi-entity extraction

python

class Person(BaseModel):
    name: str
    role: str | None = None
    affiliation: str | None = None

class Event(BaseModel):
    name: str
    date: str | None = None
    location: str | None = None

class Extraction(BaseModel):
    people: list[Person] = Field(default_factory=list)
    events: list[Event] = Field(default_factory=list)
    organizations: list[str] = Field(default_factory=list)

# Named entities_tool so it doesn't shadow the invoice extract_tool defined earlier.
entities_tool = pydantic_to_tool(
    Extraction,
    tool_name="record_entities",
    description="Record all named entities found in the document.",
)

def extract_entities(text: str) -> Extraction:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        tools=[entities_tool],
        tool_choice={"type": "tool", "name": "record_entities"},
        messages=[{"role": "user", "content": text}],
    )
    return Extraction.model_validate(
        next(b for b in resp.content if b.type == "tool_use").input
    )

Function-call dispatch (router)

python

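from typing import Any, Literal

from pydantic import BaseModel

# get_weather, get_stock_price, and check_calendar are your own functions (not shown);
# each one's docstring is sent as the tool description, so give them docstrings.

class WeatherArgs(BaseModel):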
    location: str
    units: Literal["celsius", "fahrenheit"] = "celsius"

class StockArgs(BaseModel):
    ticker: str

class CalendarArgs(BaseModel):
    person: str
    day: str

ROUTES = {
    "get_weather": (WeatherArgs, get_weather),
    "get_stock_price": (StockArgs, get_stock_price),
    "check_calendar": (CalendarArgs, check_calendar),
}

def route(user_msg: str) -> Any:
    tools = [
        {"name": name, "description": (fn.__doc__ or name).strip(), "input_schema": cls.model_json_schema()}
        for name, (cls, fn) in ROUTES.items()
    ]
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=400,
        tools=tools,
        tool_choice={"type": "any", "disable_parallel_tool_use": True},  # must call exactly one tool
        messages=[{"role": "user", "content": user_msg}],
    )
    tool_use = next(b for b in resp.content if b.type == "tool_use")
    cls, fn = ROUTES[tool_use.name]
    args = cls.model_validate(tool_use.input)
    return fn(**args.model_dump())
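
The validation-and-dispatch half of the router can be exercised without an API call by handing it the `name` and `input` a `tool_use` block would carry. The sketch below keeps a single route with a stub `get_weather` (a placeholder, not a real weather client), fails fast when a routed function has no docstring (since `fn.__doc__` becomes the tool description), and rejects tool names that aren't in `ROUTES`. `build_tools` and `dispatch` are hypothetical names.

```python
from typing import Any, Callable, Literal

from pydantic import BaseModel


class WeatherArgs(BaseModel):
    location: str
    units: Literal["celsius", "fahrenheit"] = "celsius"


def get_weather(location: str, units: str) -> str:
    """Get the current weather for a location."""
    return f"weather({location}, {units})"  # stub for illustration


ROUTES: dict[str, tuple[type[BaseModel], Callable[..., Any]]] = {
    "get_weather": (WeatherArgs, get_weather),
}


def build_tools() -> list[dict]:
    tools = []
    for name, (cls, fn) in ROUTES.items():
        if not fn.__doc__:
            raise ValueError(f"{name} needs a docstring; it becomes the tool description")
        tools.append({"name": name, "description": fn.__doc__.strip(), "input_schema": cls.model_json_schema()})
    return tools


def dispatch(tool_name: str, tool_input: dict) -> Any:
    """Validate a tool_use block's input against its route, then call the function."""
    if tool_name not in ROUTES:
        raise KeyError(f"model called unknown tool {tool_name!r}")
    cls, fn = ROUTES[tool_name]
    args = cls.model_validate(tool_input)  # defaults such as units="celsius" are filled in here
    return fn(**args.model_dump())


print(dispatch("get_weather", {"location": "Oslo"}))  # weather(Oslo, celsius)
```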

Form-fill from messy text

python

class JobApplication(BaseModel):
    full_name: str
    email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
    years_experience: int = Field(ge=0, le=80)
    desired_salary_usd: int | None = None
    open_to_remote: bool

application_tool = pydantic_to_tool(
    JobApplication,
    tool_name="submit_application",
    description="Submit the parsed job application after extracting fields from the resume.",
)

def parse_application(resume_text: str) -> JobApplication | None:  # None once retries are exhausted
    return extract_with_validation(resume_text, JobApplication, application_tool)

Quick reference

Pick the right tool for the job.

Need	First technique
Hard schema guarantee, Claude	Tool use + `tool_choice={"type": "tool"}`
Cross-model portability	Prompt + prefill `{` + stop sequence
Open-weights model, must conform	Outlines / Guidance / LM Format Enforcer
Optional fields	`T | None = None` in Pydantic, NOT in `required`
Closed value set	Enum in schema
Mutually exclusive shapes	Discriminated union (type field)
Inconsistent dates	`description: ISO 8601 YYYY-MM-DD`
Catch bad outputs after generation	Pydantic / Zod + retry loop
Schema lives next to types	`pydantic_to_tool(MyModel, ...)`
Streaming for UX	`input_json_delta` events + buffered parse
Pick exactly one of many actions	Tools + `tool_choice={"type": "any", "disable_parallel_tool_use": True}`
Extract many entities at once	Single tool with `list[T]` fields