cheat sheet

Structured Output

Techniques for reliable structured generation — JSON mode, schema-constrained decoding, function/tool calls as output, and validator pairing with Pydantic or Zod.

What it is

Structured output is the discipline of getting a language model to emit data in a machine-parseable shape (JSON, XML, a function-call signature) instead of free prose. It's the seam between an LLM and the rest of your application. Done badly, you sprinkle regex everywhere and a renamed field turns into a silent bug. Done well, every model output passes through a typed validator (Pydantic, Zod, JSON Schema) and your application code never touches an Any. The four techniques in this article (schema-constrained tool calls, JSON mode, prompt-only structuring with prefill, and validator-paired retry loops) cover essentially every case in production.

The reliability spectrum

Different techniques offer different guarantees. Listed in order of increasing reliability — pick the lightest technique that meets your need.

| Technique | Schema guaranteed? | Format guaranteed? | Cost | Use when |
| --- | --- | --- | --- | --- |
| Plain prose + regex | No | No | Low | One-off extraction, tolerate errors |
| Instruction "return JSON" | No | Mostly | Low | Prototype |
| Prefill { + stop seq | No | Mostly (usually parses as JSON) | Low | Lightweight, no schema needed |
| Tool use with input_schema | Nearly always (Claude) | Yes | Low | Production extraction |
| JSON mode (OpenAI/some APIs) | Yes (if schema set) | Yes | Low | OpenAI compatibility |
| Outlines / Guidance / LM Format Enforcer | Yes | Yes | Medium | Open-weights; constrained decoding |
| Validator + retry loop | Eventually | Eventually | Medium | Safety net on any of the above |

For Claude, the most reliable approach is tool use with tool_choice={"type": "tool", "name": "..."}. The SDK hands back the tool input as an already-parsed Python dict that almost always matches the input_schema, although a field can occasionally be missing or mistyped (see the validator + retry loop section). Whatever generation method you use, pair it with a typed validator and a one-shot retry.

Tool use as a structured output channel

Claude's tool-use API was designed for function calling, but it doubles as the most reliable structured-output mechanism. You define a tool whose input_schema IS your target output shape, force Claude to call it via tool_choice, and read the parsed input back from the response. No regex, no JSON parsing, no escape-string headaches.

python
import anthropic

client = anthropic.Anthropic()

extract_tool = {
    "name": "store_invoice",
    "description": "Store the structured invoice data extracted from the document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "date": {"type": "string", "description": "ISO 8601 date"},
            "vendor_name": {"type": "string"},
            "total_amount": {"type": "number"},
            "currency": {
                "type": "string",
                "enum": ["USD", "EUR", "GBP", "CAD", "JPY"],
            },
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"},
                    },
                    "required": ["description", "quantity", "unit_price"],
                },
            },
        },
        "required": [
            "invoice_number", "date", "vendor_name", "total_amount",
            "currency", "line_items",
        ],
    },
}

# invoice_text is assumed to hold the raw invoice text, loaded elsewhere.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "store_invoice"},
    messages=[{"role": "user", "content": f"Extract this invoice:\n\n{invoice_text}"}],
)

tool_use_block = next(b for b in response.content if b.type == "tool_use")
print(tool_use_block.input)

Output:

text
{'invoice_number': 'INV-2025-0427', 'date': '2025-04-27', 'vendor_name': 'Acme Coffee',
 'total_amount': 42.5, 'currency': 'USD',
 'line_items': [{'description': 'Espresso', 'quantity': 2, 'unit_price': 4.5}, ...]}

The "tool" never has to actually do anything. It exists purely to define the schema and capture the output. Many teams name it store_X, record_X, or submit_X to signal that it's a sink rather than a callable.
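
Reading the sink tool's input back is worth wrapping in a small helper, because a response that stops on max_tokens can carry incomplete tool input. The sketch below is one way to do it; get_tool_input is a hypothetical name, and the SimpleNamespace object is only a stand-in shaped like an SDK response so the example runs without an API call.

```python
from types import SimpleNamespace

def get_tool_input(response, tool_name: str) -> dict:
    """Return the input of the named tool_use block, or raise a clear error."""
    # A response cut off by max_tokens may contain incomplete tool input.
    if getattr(response, "stop_reason", None) == "max_tokens":
        raise ValueError("response truncated by max_tokens; raise the limit and retry")
    for block in response.content:
        if block.type == "tool_use" and block.name == tool_name:
            return block.input
    raise LookupError(f"no tool_use block named {tool_name!r} in response")

# Stand-in for an SDK response (illustration only, not a real API object).
fake = SimpleNamespace(
    stop_reason="tool_use",
    content=[
        SimpleNamespace(
            type="tool_use",
            name="store_invoice",
            input={"invoice_number": "INV-1"},
        )
    ],
)
print(get_tool_input(fake, "store_invoice"))  # {'invoice_number': 'INV-1'}
```

Raising instead of returning None keeps a truncated or missing tool call from being mistaken for an empty extraction.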

Pydantic-paired validation

A typed validator turns the model's dict into an instance of a class, catching shape mismatches, wrong types, and missing fields with a readable error report. Pydantic is the de-facto Python validator; the same pattern works with attrs, marshmallow, or dataclasses-json.

python
from pydantic import BaseModel, Field, ValidationError
from typing import Literal
from datetime import date

class LineItem(BaseModel):
    description: str
    quantity: float = Field(ge=0)
    unit_price: float = Field(ge=0)

class Invoice(BaseModel):
    invoice_number: str = Field(min_length=1)
    date: date
    vendor_name: str
    total_amount: float = Field(ge=0)
    currency: Literal["USD", "EUR", "GBP", "CAD", "JPY"]
    line_items: list[LineItem] = Field(min_length=1)

def extract_invoice(text: str) -> Invoice:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        tools=[extract_tool],
        tool_choice={"type": "tool", "name": "store_invoice"},
        messages=[{"role": "user", "content": f"Extract this invoice:\n\n{text}"}],
    )
    tool_use = next(b for b in response.content if b.type == "tool_use")
    return Invoice.model_validate(tool_use.input)

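Generate the tool schema FROM the Pydantic model using Invoice.model_json_schema() to keep them in sync. Pydantic's output is valid JSON Schema. You can trim cosmetic keys such as title, but keep $defs (or inline its entries): nested models like LineItem are emitted there and referenced through $ref, so dropping $defs leaves dangling references.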

Generate schema from Pydantic

python
def pydantic_to_tool(model_class, tool_name: str, description: str) -> dict:
    schema = model_class.model_json_schema()
    # Nested models (e.g. LineItem) appear under $defs and are referenced via $ref.
    # This simplified version passes the schema through unchanged and does not inline them.
    return {
        "name": tool_name,
        "description": description,
        "input_schema": schema,
    }

invoice_tool = pydantic_to_tool(
    Invoice,
    tool_name="store_invoice",
    description="Store structured invoice data extracted from the document.",
)
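
If you prefer a schema with no $ref indirection, the references can be inlined before the schema is handed to the tool definition. The following is a minimal sketch, assuming only local references of the form "#/$defs/Name" and no recursive models; inline_refs is a hypothetical helper, and the schema dict is hand-written to resemble what a nested model produces.

```python
import copy

def inline_refs(schema: dict) -> dict:
    """Replace local "#/$defs/Name" references with the referenced sub-schema.

    Assumes the schema is not recursive (no definition refers to itself).
    """
    defs = schema.get("$defs", {})

    def resolve(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/$defs/"):
                target = copy.deepcopy(defs[ref.split("/")[-1]])
                # Keep sibling keys (for example "description") next to the $ref.
                siblings = {k: v for k, v in node.items() if k != "$ref"}
                return resolve({**target, **siblings})
            # Drop the $defs table itself; everything else is copied recursively.
            return {k: resolve(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(schema)

schema = {
    "type": "object",
    "properties": {
        "line_items": {"type": "array", "items": {"$ref": "#/$defs/LineItem"}},
    },
    "$defs": {
        "LineItem": {
            "type": "object",
            "properties": {"description": {"type": "string"}},
            "required": ["description"],
        }
    },
}

flat = inline_refs(schema)
print(flat["properties"]["line_items"]["items"]["type"])  # object
```

The input schema is left untouched, so the original can still be used for validation while the flattened copy goes to the model.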

TypeScript with Zod

The Zod equivalent of the Pydantic pattern. Zod schemas can be converted to JSON Schema via zod-to-json-schema and fed to Claude as a tool input schema.

typescript
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const LineItem = z.object({
  description: z.string(),
  quantity: z.number().nonnegative(),
  unit_price: z.number().nonnegative(),
});

const Invoice = z.object({
  invoice_number: z.string().min(1),
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  vendor_name: z.string(),
  total_amount: z.number().nonnegative(),
  currency: z.enum(["USD", "EUR", "GBP", "CAD", "JPY"]),
  line_items: z.array(LineItem).min(1),
});

const client = new Anthropic();

async function extractInvoice(text: string): Promise<z.infer<typeof Invoice>> {
  const response = await client.messages.create({
    model: "claude-opus-4-7",
    max_tokens: 2048,
    tools: [
      {
        name: "store_invoice",
        description: "Store structured invoice data.",
        input_schema: zodToJsonSchema(Invoice) as Anthropic.Tool.InputSchema,
      },
    ],
    tool_choice: { type: "tool", name: "store_invoice" },
    messages: [{ role: "user", content: `Extract this invoice:\n\n${text}` }],
  });

  const toolUse = response.content.find((b) => b.type === "tool_use");
  if (toolUse?.type !== "tool_use") throw new Error("No tool use in response");
  return Invoice.parse(toolUse.input);
}

JSON mode (prompt-only structuring)

When the API doesn't have a dedicated tool-use channel — or you want zero schema bookkeeping — instruct the model to return JSON and combine three techniques to maximize reliability: explicit schema in the prompt, prefill with {, and stop sequence after the closing brace.

python
import json

SCHEMA_HINT = """
{
  "category": "billing" | "access" | "performance" | "bug" | "feature" | "other",
  "priority": "P1" | "P2" | "P3" | "P4",
  "summary": "<one sentence>",
  "needs_human": true | false
}
"""

def triage(ticket: str) -> dict:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=400,
        stop_sequences=["\n\n", "}\n"],
        messages=[
            {
                "role": "user",
                "content": (
                    f"Classify the support ticket. Output JSON matching this schema:\n"
                    f"{SCHEMA_HINT}\n"
                    f"Output ONLY the JSON object, no fences, no prose.\n\n"
                    f"Ticket: {ticket}"
                ),
            },
            {"role": "assistant", "content": "{"},   # prefill ensures it starts with JSON
        ],
    )
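    # The matched stop sequence is not included in the returned text, so the
    # closing brace is usually missing and has to be restored before parsing.
    # This works for flat objects only: "}\n" would also fire after a nested object.
    raw = "{" + response.content[0].text
    if not raw.rstrip().endswith("}"):
        raw += "}"
    return json.loads(raw)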

Prompt-only JSON is reliable on Opus and Sonnet but degrades on smaller models. If you can use tool use, you should. Reach for prompt-only JSON when you need cross-model portability or when adding tools is not worth the operational overhead.
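
When prefill or stop sequences are not available, the raw text may still arrive wrapped in code fences or followed by a sentence of prose. A minimal stdlib sketch for recovering the object is shown below; parse_first_json_object is a hypothetical helper, and it assumes the first "{" in the text really does open the JSON object.

```python
import json

def parse_first_json_object(raw: str) -> dict:
    """Parse the first JSON object in raw, ignoring fences and trailing prose."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value starting at `start` and ignores the rest.
    obj, _end = json.JSONDecoder().raw_decode(raw, start)
    return obj

raw = '```json\n{"category": "billing", "priority": "P2"}\n```\nHope this helps!'
print(parse_first_json_object(raw))
# {'category': 'billing', 'priority': 'P2'}
```

Malformed JSON still raises json.JSONDecodeError (a ValueError subclass), which is the signal to retry rather than to patch the string by hand.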

Validator + retry loop

Even with tool use, validation occasionally fails — the model might omit a required field, return a number when you expected a string, or violate a regex pattern. The retry loop pattern catches the validation error and feeds it back as a corrective message. Cap at 2 retries; beyond that, fail loudly and queue for human review.

python
import json
import logging
from pydantic import BaseModel, ValidationError

def extract_with_validation(
    text: str,
    model_class: type[BaseModel],
    tool: dict,
    max_retries: int = 2,
) -> BaseModel | None:
    messages = [{"role": "user", "content": f"Extract:\n\n{text}"}]

    for attempt in range(max_retries + 1):
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=2048,
            tools=[tool],
            tool_choice={"type": "tool", "name": tool["name"]},
            messages=messages,
        )
        tool_use = next((b for b in resp.content if b.type == "tool_use"), None)
        if not tool_use:
            return None

        try:
            return model_class.model_validate(tool_use.input)
        except ValidationError as e:
            logging.warning(f"Attempt {attempt + 1} failed: {e}")
            # Feed the error back and ask the model to retry
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": (
                        f"Validation failed: {e.errors()[:3]}. "
                        f"Call the tool again with corrected input."
                    ),
                    "is_error": True,
                }],
            })

    return None    # exhausted retries
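
The loop above feeds the raw e.errors() list back to the model. A shorter, path-oriented message is often easier for the model to act on. The sketch below assumes each error is a dict with "loc" and "msg" keys, which is the shape Pydantic's ValidationError.errors() returns; format_errors is a hypothetical helper and the sample errors are illustrative.

```python
def format_errors(errors: list[dict], limit: int = 3) -> str:
    """Render validation errors as short 'path: message' lines."""
    lines = []
    for err in errors[:limit]:
        # loc is a tuple such as ("line_items", 0, "quantity").
        path = ".".join(str(part) for part in err.get("loc", ())) or "<root>"
        lines.append(f"- {path}: {err.get('msg', 'invalid value')}")
    extra = len(errors) - limit
    if extra > 0:
        lines.append(f"- (+{extra} more)")
    return "\n".join(lines)

sample_errors = [
    {"loc": ("line_items", 0, "quantity"), "msg": "Input should be a valid number"},
    {"loc": ("date",), "msg": "Field required"},
]
print(format_errors(sample_errors))
# - line_items.0.quantity: Input should be a valid number
# - date: Field required
```

The result can replace the f-string in the tool_result content, for example "Validation failed:\n" + format_errors(e.errors()).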

Streaming structured output

When latency matters, stream the tool-call JSON as it's generated. The SDK exposes input_json_delta events with the partial JSON; combine them into a string and parse at message_stop.

python
import json

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "store_invoice"},
    messages=[{"role": "user", "content": f"Extract: {text}"}],
) as stream:
    partial_json = ""
    for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "input_json_delta":
            partial_json += event.delta.partial_json
            # Optionally: try to incrementally parse with a tolerant parser

    final = stream.get_final_message()
    tool_use = next(b for b in final.content if b.type == "tool_use")
    parsed = tool_use.input

print(parsed)

Use a tolerant streaming JSON parser (for example, the partial-json-parser package on PyPI) if you need to render partial state to the user as the tool call streams in. This is useful for live UI updates with progressive disclosure.
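
To show what a tolerant parser does, here is a minimal stdlib sketch that closes any open string and brackets before parsing. It is an illustration of the idea, not the API of any published package; parse_partial_json is a hypothetical name, and a dedicated library will handle more edge cases.

```python
import json

def parse_partial_json(fragment: str):
    """Best-effort parse of truncated JSON; returns None if not yet parseable."""
    stack = []          # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()

    candidate = fragment
    if escaped:
        candidate = candidate[:-1]      # drop a dangling backslash
    if in_string:
        candidate += '"'                # close the open string
    candidate = candidate.rstrip().rstrip(",")
    candidate += "".join(reversed(stack))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None                     # caller keeps the last good snapshot

print(parse_partial_json('{"vendor_name": "Acme Co'))
# {'vendor_name': 'Acme Co'}
print(parse_partial_json('{"a": 1, "items": [{"d": "x"},'))
# {'a': 1, 'items': [{'d': 'x'}]}
print(parse_partial_json('{"a": 1, "b"'))
# None
```

Values shown mid-stream are provisional: a number such as 12 may still grow into 125, so treat partial snapshots as display-only and validate the final message.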

Schema design principles

The shape of your schema affects both LLM accuracy and downstream usability. Seven rules that hold across providers:

text
☐ Use enums for fixed value sets — eliminates hallucinated values
☐ Mark every truly-required field as `required` — Claude respects this
☐ Use descriptive field names, not abbreviations (currency_code, not ccy)
☐ Add a description to every field — Claude reads them; defaults are weak signals
☐ Prefer flat structures over deeply nested — easier to validate and debug
☐ Use null for "missing" rather than empty strings — distinguishes "not present" from ""
☐ Avoid free-text fields when an enum will do — extraction quality improves dramatically

Enum constraints

python
"properties": {
    "priority": {
        "type": "string",
        "enum": ["P1", "P2", "P3", "P4"],
        "description": "Severity: P1=outage, P2=blocked, P3=degraded, P4=question"
    },
    "sentiment": {
        "type": "string",
        "enum": ["angry", "frustrated", "neutral", "positive"],
        "description": "Tone of the customer"
    }
}

Discriminated unions

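When a field can take several shapes, model it as a discriminated union: a type field that selects the shape of the rest. Claude generally handles these well when the discriminator is an enum. Note that the schema below states the per-type requirements only in descriptions, so they are hints to the model and still need checking in application code.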

python
"properties": {
    "payment": {
        "type": "object",
        "properties": {
            "type": {"type": "string", "enum": ["card", "bank_transfer", "crypto"]},
            "last4": {"type": "string", "description": "Required if type=card"},
            "iban": {"type": "string", "description": "Required if type=bank_transfer"},
            "wallet_address": {"type": "string", "description": "Required if type=crypto"},
        },
        "required": ["type"]
    }
}
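
Because "Required if type=card" lives only in a description, nothing in the schema enforces it. A minimal application-side check might look like the sketch below; REQUIRED_BY_TYPE and check_payment are hypothetical names, and the rule that the other types' fields must be absent is an assumption about what a discriminated union should mean here.

```python
# Which field each payment type requires (mirrors the schema descriptions).
REQUIRED_BY_TYPE = {
    "card": "last4",
    "bank_transfer": "iban",
    "crypto": "wallet_address",
}

def check_payment(payment: dict) -> list[str]:
    """Return a list of problems; an empty list means the payment is consistent."""
    kind = payment.get("type")
    if kind not in REQUIRED_BY_TYPE:
        return [f"unknown payment type: {kind!r}"]
    problems = []
    needed = REQUIRED_BY_TYPE[kind]
    if not payment.get(needed):
        problems.append(f"{needed} is required when type={kind}")
    # Fields belonging to the other shapes should not be filled in.
    for other_kind, field in REQUIRED_BY_TYPE.items():
        if other_kind != kind and payment.get(field) is not None:
            problems.append(f"{field} must be absent when type={kind}")
    return problems

print(check_payment({"type": "card", "last4": "4242"}))
# []
print(check_payment({"type": "card", "iban": "DE89"}))
# ['last4 is required when type=card', 'iban must be absent when type=card']
```

The returned messages are short enough to feed straight back to the model in a retry.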

Optional vs required

python
"required": ["invoice_number", "date", "total_amount"]    # always present
# email, phone, address are NOT in required → may be omitted or null

If null is a valid value for an optional field, allow it in the type ("type": ["string", "null"]) and document it explicitly: "description": "Email if provided, else null". Claude will generally use null rather than making something up.

Open-weights constrained decoding

For models you run yourself (Llama, Qwen, DeepSeek), libraries like Outlines, Guidance, and LM Format Enforcer constrain generation at the token level — the model literally cannot emit a token that would break the schema. This gives 100% format conformance at the cost of slightly slower decoding.

python
# Outlines example (open-weights only). Uses the API of Outlines releases
# before 1.0; newer releases may expose a different interface.
from outlines import models, generate
from pydantic import BaseModel
from typing import Literal

class Ticket(BaseModel):
    category: Literal["billing", "access", "bug", "feature", "other"]
    priority: Literal["P1", "P2", "P3", "P4"]
    summary: str
    needs_human: bool

model = models.transformers("meta-llama/Meta-Llama-3-8B-Instruct")
generator = generate.json(model, Ticket)

result = generator("Customer reports: 'Page took 30 seconds to load.'")
print(result)

Output:

text
Ticket(category='bug', priority='P3', summary='Page load latency 30s', needs_human=False)

Constrained decoding does NOT replace prompt engineering — a poorly described task still produces semantically wrong outputs, just well-formatted ones. Use it as a guarantee on shape, not on correctness.
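
The core idea of constrained decoding fits in a few lines. The toy sketch below works on characters and a fixed list of allowed strings, whereas real libraries mask token logits against a grammar or schema; the function names and the ranked "model preferences" are made up for illustration.

```python
def allowed_next_chars(prefix: str, choices: list[str]) -> set[str]:
    """Characters that keep prefix extendable to at least one allowed choice."""
    return {
        c[len(prefix)]
        for c in choices
        if c.startswith(prefix) and len(c) > len(prefix)
    }

def constrained_pick(ranked_chars_per_step: list[list[str]], choices: list[str]) -> str:
    """Greedy decode: at each step take the best-ranked character that is legal."""
    out = ""
    for ranked in ranked_chars_per_step:
        legal = allowed_next_chars(out, choices)
        if not legal:
            break  # out is already a complete choice
        # Fall back to any legal character if none of the ranked ones qualifies.
        out += next((ch for ch in ranked if ch in legal), min(legal))
    return out

# The "model" would rather write "Hi..." but only P1..P4 are legal outputs.
steps = [["H", "P"], ["i", "2", "1"]]
print(constrained_pick(steps, ["P1", "P2", "P3", "P4"]))  # P2
```

The output is always one of the allowed strings, yet which one is picked still depends on the model's preferences, which is why shape is guaranteed and correctness is not.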

Nested schemas

Deeply nested schemas are harder for the model to populate correctly. Two mitigations: keep depth ≤ 3 levels where possible, and break large schemas into smaller calls when you can. A schema with 30+ leaf fields tends to produce worse extraction than the same 30 fields split across 3 sequential calls.

python
# Heavy schema — works but error-prone
{
    "company": {
        "name": "...",
        "address": {
            "street": "...",
            "city": "...",
            "country": "...",
            "geo": {"lat": 0.0, "lng": 0.0}
        },
        "contacts": [{"name": "...", "role": "...", "email": "..."}]
    },
    "invoice": {...},
    "line_items": [...],
}

# Better: three sequential calls, each with a focused schema.
# extract_with_schema and the three schema names are placeholders for your own helper and models.
company = extract_with_schema(text, CompanySchema)
invoice = extract_with_schema(text, InvoiceSchema)
items = extract_with_schema(text, LineItemsSchema)
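
To decide when a schema is heavy enough to split, it helps to measure it. The sketch below counts object nesting depth and leaf fields for a JSON Schema dict; schema_stats is a hypothetical helper, it assumes plain "properties" / "items" nesting with any $ref already inlined, and the thresholds are the rules of thumb from this section.

```python
def schema_stats(schema: dict, depth: int = 1) -> tuple[int, int]:
    """Return (max object nesting depth, number of leaf fields)."""
    if schema.get("type") == "array":
        # An array does not add a level; look at what it contains.
        return schema_stats(schema.get("items", {}), depth)
    props = schema.get("properties")
    if not props:
        return depth - 1, 1  # scalar leaf: belongs to the enclosing object
    max_depth, leaves = depth, 0
    for sub in props.values():
        sub_depth, sub_leaves = schema_stats(sub, depth + 1)
        max_depth = max(max_depth, sub_depth)
        leaves += sub_leaves
    return max_depth, leaves

company_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "geo": {
                    "type": "object",
                    "properties": {
                        "lat": {"type": "number"},
                        "lng": {"type": "number"},
                    },
                },
            },
        },
    },
}

depth, leaves = schema_stats(company_schema)
print(depth, leaves)  # 3 4
if depth > 3 or leaves > 30:
    print("consider splitting this schema into sequential calls")
```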

Handling lists of unknown length

When the output is a list, include guidance on size bounds and what to do when nothing is found. Without it, models sometimes invent entries to fill perceived expectations.

python
"properties": {
    "tags": {
        "type": "array",
        "items": {"type": "string"},
        "description": "Topic tags from the article. Return empty array if no clear tags. Max 10."
    }
}

Error patterns and fixes

The repeated mistakes that show up when wiring up structured output, with the fix for each.

| Symptom | Root cause | Fix |
| --- | --- | --- |
| JSONDecodeError: Expecting value | Model wrapped JSON in ``` fences | Prefill { + stop sequence on ``` |
| Field is string when schema says number | Schema description vague | Add "type": "number", "description": "Numeric. No commas, no $." |
| Empty array when items exist | Schema didn't say to extract them | Description: "Extract every line item; do not skip" |
| Extra fields not in schema | LLM padded with extras | Pydantic model_config = ConfigDict(extra="forbid") |
| Required field missing | Long input, model lost it | Re-prompt with retry loop OR shorten input |
| Hallucinated enum value | Free-text field where enum would do | Convert to enum |
| Numbers come back as strings | Auto-formatting | Pydantic does coercion; or strict mode + retry |
| Date format inconsistent | No format hint | Add "description": "ISO 8601 YYYY-MM-DD" |
| Nested object missing levels | Schema too deep | Flatten or split into sequential calls |
| Different field order each run | Insertion order varies | Order shouldn't matter; use dict access, not list index |

Common pitfalls

| Pitfall | Why it bites | Fix |
| --- | --- | --- |
| Trusting raw model output without validation | Field types drift silently | Always pipe through Pydantic / Zod |
| Validator and prompt schema out of sync | Edits in one place, not the other | Generate tool schema FROM the model class |
| No retry on validation failure | One bad parse breaks the pipeline | One retry with the error fed back is cheap insurance |
| extra="allow" in Pydantic | LLM-added fields leak through | extra="forbid" in production |
| Schema hidden in prompt as prose | LLM has to re-derive structure each time | Use tool input_schema; cheaper, more reliable |
| Streaming partial JSON to user | Mid-stream parse errors | Buffer until stop event, then parse-and-render |
| Same schema for every model size | Smaller models drop fields | Validate against the WEAKEST model you support |
| Ignoring is_error flag in tool_result | Retries don't carry error signal | Set is_error: true so model knows to fix |

Real-world recipes

Compact end-to-end patterns for the highest-leverage structured-output use cases.

CRUD payload generation

python
class UserUpdate(BaseModel):
    email: str | None = None
    name: str | None = None
    plan: Literal["free", "pro", "enterprise"] | None = None
    notify_billing: bool | None = None

update_tool = pydantic_to_tool(
    UserUpdate,
    tool_name="propose_user_update",
    description="Generate a partial user-update payload from a natural-language command.",
)

def parse_user_command(text: str) -> UserUpdate:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=600,
        tools=[update_tool],
        tool_choice={"type": "tool", "name": "propose_user_update"},
        messages=[{
            "role": "user",
            "content": (
                f"Convert this admin command into a user-update payload. "
                f"Set fields to null when not mentioned.\n\n{text}"
            ),
        }],
    )
    return UserUpdate.model_validate(
        next(b for b in resp.content if b.type == "tool_use").input
    )

payload = parse_user_command("Upgrade alice@example.com to Pro and turn off billing emails.")
# UserUpdate(email='alice@example.com', name=None, plan='pro', notify_billing=False)

Multi-entity extraction

python
class Person(BaseModel):
    name: str
    role: str | None = None
    affiliation: str | None = None

class Event(BaseModel):
    name: str
    date: str | None = None
    location: str | None = None

class Extraction(BaseModel):
    people: list[Person] = Field(default_factory=list)
    events: list[Event] = Field(default_factory=list)
    organizations: list[str] = Field(default_factory=list)

# Named entities_tool so it does not overwrite the invoice extract_tool defined earlier.
entities_tool = pydantic_to_tool(
    Extraction,
    tool_name="record_entities",
    description="Record all named entities found in the document.",
)

def extract_entities(text: str) -> Extraction:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        tools=[entities_tool],
        tool_choice={"type": "tool", "name": "record_entities"},
        messages=[{"role": "user", "content": text}],
    )
    return Extraction.model_validate(
        next(b for b in resp.content if b.type == "tool_use").input
    )

Function-call dispatch (router)

python
from typing import Any, Literal

class WeatherArgs(BaseModel):
    location: str
    units: Literal["celsius", "fahrenheit"] = "celsius"

class StockArgs(BaseModel):
    ticker: str

class CalendarArgs(BaseModel):
    person: str
    day: str

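# get_weather, get_stock_price and check_calendar are assumed to be ordinary
# functions defined elsewhere, each with a docstring (used as the tool description).
ROUTES = {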
    "get_weather": (WeatherArgs, get_weather),
    "get_stock_price": (StockArgs, get_stock_price),
    "check_calendar": (CalendarArgs, check_calendar),
}

def route(user_msg: str) -> Any:
    tools = [
        {"name": name, "description": fn.__doc__, "input_schema": cls.model_json_schema()}
        for name, (cls, fn) in ROUTES.items()
    ]
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=400,
        tools=tools,
        tool_choice={"type": "any"},        # must call at least one of the tools
        messages=[{"role": "user", "content": user_msg}],
    )
    tool_use = next(b for b in resp.content if b.type == "tool_use")
    cls, fn = ROUTES[tool_use.name]
    args = cls.model_validate(tool_use.input)
    return fn(**args.model_dump())

Form-fill from messy text

python
class JobApplication(BaseModel):
    full_name: str
    email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
    years_experience: int = Field(ge=0, le=80)
    desired_salary_usd: int | None = None
    open_to_remote: bool

application_tool = pydantic_to_tool(
    JobApplication,
    tool_name="submit_application",
    description="Submit the parsed job application after extracting fields from the resume.",
)

def parse_application(resume_text: str) -> JobApplication | None:
    # Returns None when validation still fails after the retries are exhausted.
    return extract_with_validation(resume_text, JobApplication, application_tool)

Quick reference

Pick the right tool for the job.

| Need | First technique |
| --- | --- |
| Strongest schema adherence, Claude | Tool use + tool_choice={"type": "tool", "name": "..."} |
| Cross-model portability | Prompt + prefill { + stop sequence |
| Open-weights model, must conform | Outlines / Guidance / LM Format Enforcer |
| Lazy/optional fields | Optional in Pydantic, NOT in required |
| Closed value set | Enum in schema |
| Mutually exclusive shapes | Discriminated union (type field) |
| Inconsistent dates | description: ISO 8601 YYYY-MM-DD |
| Catch bad outputs after generation | Pydantic / Zod + retry loop |
| Schema lives next to types | pydantic_to_tool(MyModel, ...) |
| Streaming for UX | input_json_delta events + buffered parse |
| Pick exactly one of many actions | Tools + tool_choice={"type": "any"} |
| Extract many entities at once | Single tool with list[T] fields |