cheat sheet
Structured Output
Techniques for reliable structured generation — JSON mode, schema-constrained decoding, function/tool calls as output, and validator pairing with Pydantic or Zod.
What it is
Structured output is the discipline of getting a language model to emit data in a machine-parseable shape — JSON, XML, a function-call signature — instead of free prose. It's the seam between an LLM and the rest of your application. Done badly, you sprinkle regex everywhere and field-rename incidents leak silent bugs. Done well, every model output passes through a typed validator (Pydantic, Zod, JSON Schema) and your application code never touches an Any. The four techniques in this article — schema-constrained tool calls, JSON mode, prompt-only structuring with prefill, and validator-paired retry loops — cover essentially every case in production.
The reliability spectrum
Different techniques offer different guarantees. Listed in order of increasing reliability — pick the lightest technique that meets your need.
| Technique | Schema guaranteed? | Format guaranteed? | Cost | Use when |
|---|---|---|---|---|
| Plain prose + regex | No | No | Low | One-off extraction, tolerate errors |
| Instruction "return JSON" | No | Mostly | Low | Prototype |
| Prefill { + stop sequence | No | Yes (parses as JSON) | Low | Lightweight, no schema needed |
| Tool use with input_schema | Yes (Claude) | Yes | Low | Production extraction |
| JSON mode (OpenAI/some APIs) | Yes (if schema set) | Yes | Low | OpenAI compatibility |
| Outlines / Guidance / LM Format Enforcer | Yes | Yes | Medium | Open-weights; constrained decoding |
| Validator + retry loop | Eventually | Eventually | Medium | Safety net on any of the above |
For Claude, the most reliable schema-guaranteed approach is tool use with tool_choice={"type": "tool", "name": "..."}. It returns a parsed Python dict that follows the input_schema. For everything else, prefer pairing your generation method with a typed validator and a one-shot retry.
Tool use as a structured output channel
Claude's tool-use API was designed for function calling, but it doubles as the most reliable structured-output mechanism. You define a tool whose input_schema IS your target output shape, force Claude to call it via tool_choice, and read the parsed input back from the response. No regex, no JSON parsing, no escape-string headaches.
import anthropic

client = anthropic.Anthropic()

extract_tool = {
    "name": "store_invoice",
    "description": "Store the structured invoice data extracted from the document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "date": {"type": "string", "description": "ISO 8601 date"},
            "vendor_name": {"type": "string"},
            "total_amount": {"type": "number"},
            "currency": {
                "type": "string",
                "enum": ["USD", "EUR", "GBP", "CAD", "JPY"],
            },
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"},
                    },
                    "required": ["description", "quantity", "unit_price"],
                },
            },
        },
        "required": [
            "invoice_number", "date", "vendor_name", "total_amount",
            "currency", "line_items",
        ],
    },
}

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "store_invoice"},
    messages=[{"role": "user", "content": f"Extract this invoice:\n\n{invoice_text}"}],
)

# The forced tool call carries the structured payload in its `input` field.
tool_use_block = next(b for b in response.content if b.type == "tool_use")
print(tool_use_block.input)
Output:
{'invoice_number': 'INV-2025-0427', 'date': '2025-04-27', 'vendor_name': 'Acme Coffee',
'total_amount': 42.50, 'currency': 'USD',
'line_items': [{'description': 'Espresso', 'quantity': 2, 'unit_price': 4.50}, ...]}
The "tool" never has to actually do anything. It exists purely to define the schema and capture the output. Many teams name it store_X, record_X, or submit_X to signal that it's a sink rather than a callable.
Pydantic-paired validation
A typed validator turns the model's dict into an instance of a class — catching shape mismatches, type coercions, and missing fields with a clean traceback. Pydantic is the de-facto Python validator; the same pattern works with attrs, marshmallow, or dataclasses-json.
from datetime import date
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class LineItem(BaseModel):
    description: str
    quantity: float = Field(ge=0)
    unit_price: float = Field(ge=0)


class Invoice(BaseModel):
    invoice_number: str = Field(min_length=1)
    date: date
    vendor_name: str
    total_amount: float = Field(ge=0)
    currency: Literal["USD", "EUR", "GBP", "CAD", "JPY"]
    line_items: list[LineItem] = Field(min_length=1)


def extract_invoice(text: str) -> Invoice:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        tools=[extract_tool],
        tool_choice={"type": "tool", "name": "store_invoice"},
        messages=[{"role": "user", "content": f"Extract this invoice:\n\n{text}"}],
    )
    tool_use = next(b for b in response.content if b.type == "tool_use")
    # Raises ValidationError if the dict does not match the Invoice model.
    return Invoice.model_validate(tool_use.input)
Generate the tool schema FROM the Pydantic model using Invoice.model_json_schema() to keep them in sync. Pydantic's output is valid JSON Schema; lightly trim cosmetic fields like title, and inline $defs references rather than deleting them, since nested models such as LineItem are referenced through $ref.
Generate schema from Pydantic
def pydantic_to_tool(model_class, tool_name: str, description: str) -> dict:
    schema = model_class.model_json_schema()
    # Inline $ref / $defs if present (simplified; a production version handles nesting)
    return {
        "name": tool_name,
        "description": description,
        "input_schema": schema,
    }


invoice_tool = pydantic_to_tool(
    Invoice,
    tool_name="store_invoice",
    description="Store structured invoice data extracted from the document.",
)
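The helper above leaves the `$ref` / `$defs` inlining as a comment. Here is a minimal sketch of that step, assuming every reference is local (of the form `#/$defs/Name`) and non-recursive. `inline_defs` is a hypothetical helper, not part of Pydantic, and the sample schema is hand-written for illustration.

```python
import copy


def inline_defs(schema: dict) -> dict:
    """Return a copy of `schema` with local `#/$defs/...` references expanded.

    Assumption: references are local and non-recursive. A self-referencing
    model would recurse forever here.
    """
    defs = schema.get("$defs", {})

    def resolve(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/$defs/"):
                target = copy.deepcopy(defs[ref.split("/")[-1]])
                # Keep sibling keys (e.g. "description") next to the expanded body.
                siblings = {k: v for k, v in node.items() if k != "$ref"}
                return resolve({**target, **siblings})
            # Drop the now-unneeded "$defs" table while copying everything else.
            return {k: resolve(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(schema)


schema = {
    "$defs": {
        "LineItem": {"type": "object", "properties": {"quantity": {"type": "number"}}},
    },
    "type": "object",
    "properties": {
        "line_items": {"type": "array", "items": {"$ref": "#/$defs/LineItem"}},
    },
}
flat = inline_defs(schema)
```

The input dict is left untouched, so the original schema can still be used for validation or documentation.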
TypeScript with Zod
The Zod equivalent of the Pydantic pattern. Zod schemas can be converted to JSON Schema via zod-to-json-schema and fed to Claude as a tool input schema.
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const LineItem = z.object({
  description: z.string(),
  quantity: z.number().nonnegative(),
  unit_price: z.number().nonnegative(),
});

const Invoice = z.object({
  invoice_number: z.string().min(1),
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  vendor_name: z.string(),
  total_amount: z.number().nonnegative(),
  currency: z.enum(["USD", "EUR", "GBP", "CAD", "JPY"]),
  line_items: z.array(LineItem).min(1),
});

const client = new Anthropic();

async function extractInvoice(text: string): Promise<z.infer<typeof Invoice>> {
  const response = await client.messages.create({
    model: "claude-opus-4-7",
    max_tokens: 2048,
    tools: [
      {
        name: "store_invoice",
        description: "Store structured invoice data.",
        input_schema: zodToJsonSchema(Invoice) as Anthropic.Tool.InputSchema,
      },
    ],
    tool_choice: { type: "tool", name: "store_invoice" },
    messages: [{ role: "user", content: `Extract this invoice:\n\n${text}` }],
  });
  const toolUse = response.content.find((b) => b.type === "tool_use");
  if (toolUse?.type !== "tool_use") throw new Error("No tool use in response");
  // Throws a ZodError if the payload does not match the schema.
  return Invoice.parse(toolUse.input);
}
JSON mode (prompt-only structuring)
When the API doesn't have a dedicated tool-use channel — or you want zero schema bookkeeping — instruct the model to return JSON and combine three techniques to maximize reliability: explicit schema in the prompt, prefill with {, and stop sequence after the closing brace.
import json

SCHEMA_HINT = """
{
  "category": "billing" | "access" | "performance" | "bug" | "feature" | "other",
  "priority": "P1" | "P2" | "P3" | "P4",
  "summary": "<one sentence>",
  "needs_human": true | false
}
"""


def triage(ticket: str) -> dict:
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=400,
        stop_sequences=["\n\n", "}\n"],
        messages=[
            {
                "role": "user",
                "content": (
                    f"Classify the support ticket. Output JSON matching this schema:\n"
                    f"{SCHEMA_HINT}\n"
                    f"Output ONLY the JSON object, no fences, no prose.\n\n"
                    f"Ticket: {ticket}"
                ),
            },
            {"role": "assistant", "content": "{"},  # prefill ensures it starts with JSON
        ],
    )
    # The prefill is not echoed back, so prepend the opening brace.
    raw = "{" + response.content[0].text
    # A matched stop sequence is not returned either, so restore the closing brace.
    if not raw.rstrip().endswith("}"):
        raw += "}"
    return json.loads(raw)
Prompt-only JSON is reliable on Opus and Sonnet but degrades on smaller models. If you can use tool use, you should. Reach for prompt-only JSON when you need cross-model portability or when adding tools is not worth the operational overhead.
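When prompt-only output does arrive wrapped in fences or followed by prose, a tolerant parse step can recover the object before validation. The sketch below uses only the standard library; `parse_json_object` is a hypothetical helper name. It assumes the first `{` in the text opens the JSON object, so prose containing a brace before the payload would defeat it.

```python
import json


def parse_json_object(raw: str) -> dict:
    """Parse the first JSON object in `raw`, ignoring fences and trailing prose."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses exactly one JSON value and ignores whatever follows it.
    obj, _end = json.JSONDecoder().raw_decode(raw[start:])
    return obj


reply = '```json\n{"priority": "P2", "needs_human": false}\n```\nHope that helps!'
parsed = parse_json_object(reply)
```

A malformed object still raises `json.JSONDecodeError`, which is the right moment to fall back to a retry rather than to patch the string further.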
Validator + retry loop
Even with tool use, validation occasionally fails — the model might omit a required field, return a number when you expected a string, or violate a regex pattern. The retry loop pattern catches the validation error and feeds it back as a corrective message. Cap at 2 retries; beyond that, fail loudly and queue for human review.
import json
import logging

from pydantic import BaseModel, ValidationError


def extract_with_validation(
    text: str,
    model_class: type[BaseModel],
    tool: dict,
    max_retries: int = 2,
) -> BaseModel | None:
    messages = [{"role": "user", "content": f"Extract:\n\n{text}"}]
    for attempt in range(max_retries + 1):
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=2048,
            tools=[tool],
            tool_choice={"type": "tool", "name": tool["name"]},
            messages=messages,
        )
        tool_use = next((b for b in resp.content if b.type == "tool_use"), None)
        if not tool_use:
            return None
        try:
            return model_class.model_validate(tool_use.input)
        except ValidationError as e:
            logging.warning(f"Attempt {attempt + 1} failed: {e}")
            # Feed the error back and ask the model to retry
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": (
                        f"Validation failed: {e.errors()[:3]}. "
                        f"Call the tool again with corrected input."
                    ),
                    "is_error": True,
                }],
            })
    return None  # exhausted retries
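The same loop can be exercised without an API key by separating the control flow from the provider call. This is a dependency-free sketch, not the SDK's own mechanism: `retry_with_feedback`, `fake_generate`, and `validate_invoice` are hypothetical names, and the fake generator stands in for a model that fixes its output once it sees the error.

```python
from typing import Callable, Optional


def retry_with_feedback(
    generate: Callable[[Optional[str]], dict],
    validate: Callable[[dict], list],
    max_retries: int = 2,
) -> Optional[dict]:
    """Call `generate`, validate the result, and feed errors back on failure."""
    feedback = None
    for _ in range(max_retries + 1):
        candidate = generate(feedback)
        errors = validate(candidate)
        if not errors:
            return candidate
        # Only the first few errors are sent back to keep the correction short.
        feedback = "Validation failed: " + "; ".join(errors[:3])
    return None  # exhausted retries: fail loudly upstream


attempts = []


def fake_generate(feedback):
    # Stand-in for a model call: wrong type first, corrected after feedback.
    attempts.append(feedback)
    return {"total_amount": "42.50"} if feedback is None else {"total_amount": 42.5}


def validate_invoice(data):
    ok = isinstance(data.get("total_amount"), float)
    return [] if ok else ["total_amount must be a number"]


result = retry_with_feedback(fake_generate, validate_invoice)
```

Keeping the loop generic like this makes the retry cap and the feedback wording easy to unit-test separately from any one provider.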
Streaming structured output
When latency matters, stream the tool-call JSON as it's generated. The SDK exposes input_json_delta events with the partial JSON; combine them into a string and parse at message_stop.
import json

with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "store_invoice"},
    messages=[{"role": "user", "content": f"Extract: {text}"}],
) as stream:
    partial_json = ""
    for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "input_json_delta":
            partial_json += event.delta.partial_json
            # Optionally: try to incrementally parse with a tolerant parser
    final = stream.get_final_message()

tool_use = next(b for b in final.content if b.type == "tool_use")
parsed = tool_use.input
print(parsed)
Use a tolerant streaming JSON parser (like partial-json on PyPI) if you need to render partial state to the user as the tool call streams in. Useful for live UI updates with progressive disclosure.
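To make "tolerant parsing" concrete, here is a small standard-library sketch, not the API of any published package. It closes whatever strings, arrays, and objects are still open, and if the result still does not parse it backs off one character at a time. `parse_partial_json` and `_closers` are hypothetical names. The back-off is quadratic in the fragment length, which is acceptable for small tool payloads but not for large documents.

```python
import json


def _closers(prefix: str) -> str:
    """Suffix that closes every string, array, and object left open in `prefix`."""
    stack = []
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    return ('"' if in_string else "") + "".join(reversed(stack))


def parse_partial_json(fragment: str):
    """Best-effort parse of a truncated JSON document; None if nothing is usable."""
    for end in range(len(fragment), 0, -1):
        prefix = fragment[:end]
        try:
            return json.loads(prefix + _closers(prefix))
        except json.JSONDecodeError:
            continue  # drop one more trailing character and try again
    return None


snapshot = parse_partial_json('{"invoice_number": "INV-1", "total_amount": 42.5, "curr')
```

Values that are cut mid-way (a half-written string or number) can appear truncated in a snapshot, so treat intermediate results as display-only and validate the final message.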
Schema design principles
The shape of your schema affects both LLM accuracy and downstream usability. Seven rules that hold across providers:
☐ Use enums for fixed value sets — eliminates hallucinated values
☐ Mark every truly-required field as `required` — Claude respects this
☐ Use descriptive field names, not abbreviations (currency_code, not ccy)
☐ Add a description to every field — Claude reads them; defaults are weak signals
☐ Prefer flat structures over deeply nested — easier to validate and debug
☐ Use null for "missing" rather than empty strings — distinguishes "not present" from ""
☐ Avoid free-text fields when an enum will do — extraction quality improves dramatically
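Some of these rules can be checked mechanically before a schema ships. The sketch below is a hypothetical `lint_schema` helper covering three of them (descriptions present, `required` declared, nesting depth). It assumes `type` is a plain string rather than a list, and the depth limit of 3 is an adjustable default rather than a hard rule.

```python
def lint_schema(schema: dict, path: str = "$", depth: int = 1, max_depth: int = 3) -> list:
    """Return human-readable warnings for a JSON Schema fragment."""
    warnings = []
    if schema.get("type") == "object":
        if depth > max_depth:
            warnings.append(f"{path}: nesting depth {depth} exceeds {max_depth}")
        props = schema.get("properties", {})
        if props and not schema.get("required"):
            warnings.append(f"{path}: no required fields declared")
        for name, prop in props.items():
            child = f"{path}.{name}"
            if "description" not in prop:
                warnings.append(f"{child}: missing description")
            warnings.extend(lint_schema(prop, child, depth + 1, max_depth))
    elif schema.get("type") == "array":
        # Arrays do not add a nesting level of their own; lint their items.
        warnings.extend(lint_schema(schema.get("items", {}), f"{path}[]", depth, max_depth))
    return warnings


ticket_schema = {
    "type": "object",
    "properties": {
        "priority": {"type": "string", "enum": ["P1", "P2"], "description": "Severity"},
        "ccy": {"type": "string"},
    },
    "required": ["priority"],
}
problems = lint_schema(ticket_schema)
```

Running a check like this in CI is one way to keep descriptions from silently disappearing as a schema grows.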
Enum constraints
"properties": {
    "priority": {
        "type": "string",
        "enum": ["P1", "P2", "P3", "P4"],
        "description": "Severity: P1=outage, P2=blocked, P3=degraded, P4=question"
    },
    "sentiment": {
        "type": "string",
        "enum": ["angry", "frustrated", "neutral", "positive"],
        "description": "Tone of the customer"
    }
}
Discriminated unions
When a field can take several shapes, model it as a discriminated union: a type field that selects the shape of the rest. Claude understands these natively when the discriminator is an enum.
"properties": {
    "payment": {
        "type": "object",
        "properties": {
            "type": {"type": "string", "enum": ["card", "bank_transfer", "crypto"]},
            "last4": {"type": "string", "description": "Required if type=card"},
            "iban": {"type": "string", "description": "Required if type=bank_transfer"},
            "wallet_address": {"type": "string", "description": "Required if type=crypto"},
        },
        "required": ["type"]
    }
}
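In the schema above, only `type` is listed in `required`. The "Required if type=..." descriptions are hints to the model, not constraints, so it is worth checking the conditional fields after generation. A minimal sketch follows, with the hypothetical helper `missing_payment_fields` mirroring the three branches of that schema.

```python
# Which fields each discriminator value makes mandatory (mirrors the descriptions).
REQUIRED_BY_TYPE = {
    "card": ["last4"],
    "bank_transfer": ["iban"],
    "crypto": ["wallet_address"],
}


def missing_payment_fields(payment: dict) -> list:
    """Return the names of fields that are missing for the declared payment type."""
    kind = payment.get("type")
    if kind not in REQUIRED_BY_TYPE:
        return ["type"]  # unknown or absent discriminator
    # Empty strings and nulls count as missing.
    return [field for field in REQUIRED_BY_TYPE[kind] if not payment.get(field)]
```

A non-empty result can be fed straight into a retry message, for example "bank_transfer payments must include iban".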
Optional vs required
"required": ["invoice_number", "date", "total_amount"] # always present
# email, phone, address are NOT in required → may be omitted or null
If null is a valid value for an optional field, document it explicitly: "description": "Email if provided, else null". Claude will use null over making something up.
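A description alone does not make null schema-valid. In standard JSON Schema the property's `type` also has to admit `"null"`, typically as a type list. The sketch below shows one way to derive such a property; `make_nullable` is a hypothetical helper. Whether a given provider accepts type lists in tool schemas is an assumption to verify against its documentation.

```python
import copy


def make_nullable(prop: dict) -> dict:
    """Return a copy of a property schema that also accepts null."""
    out = copy.deepcopy(prop)
    if "type" in out:
        types = [out["type"]] if isinstance(out["type"], str) else list(out["type"])
        if "null" not in types:
            types.append("null")
        out["type"] = types
    # An enum must list null explicitly, otherwise null still fails validation.
    if "enum" in out and None not in out["enum"]:
        out["enum"] = [*out["enum"], None]
    return out


email = make_nullable({"type": "string", "description": "Email if provided, else null"})
plan = make_nullable({"type": "string", "enum": ["free", "pro"]})
```

On the validator side this corresponds to an optional field such as `str | None = None` in Pydantic.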
Open-weights constrained decoding
For models you run yourself (Llama, Qwen, DeepSeek), libraries like Outlines, Guidance, and LM Format Enforcer constrain generation at the token level — the model literally cannot emit a token that would break the schema. This gives 100% format conformance at the cost of slightly slower decoding.
# Outlines example (open-weights only). This uses the outlines.generate
# interface; newer Outlines releases may expose a different API, so check
# the version you have installed.
from typing import Literal

from outlines import models, generate
from pydantic import BaseModel


class Ticket(BaseModel):
    category: Literal["billing", "access", "bug", "feature", "other"]
    priority: Literal["P1", "P2", "P3", "P4"]
    summary: str
    needs_human: bool


model = models.transformers("meta-llama/Llama-3-8B-Instruct")
generator = generate.json(model, Ticket)
result = generator("Customer reports: 'Page took 30 seconds to load.'")
print(result)
Output:
Ticket(category='bug', priority='P3', summary='Page load latency 30s', needs_human=False)
Constrained decoding does NOT replace prompt engineering — a poorly described task still produces semantically wrong outputs, just well-formatted ones. Use it as a guarantee on shape, not on correctness.
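The token-level idea can be illustrated without a model. The toy sketch below is not how Outlines or the other libraries are implemented (they compile the schema into a state machine over the real tokenizer). It only shows the masking step: tokens that cannot lead to a valid output are removed before the highest-scoring one is picked. The vocabulary, scores, and function names are made up for illustration.

```python
from typing import Callable


def allowed_next_tokens(generated: str, vocab: list, valid_outputs: list) -> list:
    """Tokens that keep `generated` a prefix of at least one valid output."""
    return [
        tok for tok in vocab
        if tok and any(v.startswith(generated + tok) for v in valid_outputs)
    ]


def constrained_greedy(score: Callable[[str, str], float], vocab: list, valid_outputs: list) -> str:
    """Greedy decoding where disallowed tokens are masked out at every step."""
    generated = ""
    while generated not in valid_outputs:
        allowed = allowed_next_tokens(generated, vocab, valid_outputs)
        if not allowed:
            raise ValueError(f"vocabulary cannot complete {generated!r}")
        generated += max(allowed, key=lambda tok: score(generated, tok))
    return generated


# Stand-in for model logits: the unconstrained favourite is "urgent".
PREFERENCE = {"urgent": 5.0, "P": 1.0, "2": 0.9, "1": 0.1}


def toy_score(generated: str, token: str) -> float:
    return PREFERENCE.get(token, 0.0)


choice = constrained_greedy(toy_score, ["urgent", "P", "1", "2"], ["P1", "P2"])
```

Here the mask forces a valid enum value even though the scorer prefers "urgent", which is exactly the shape-versus-correctness distinction: the result is well-formed, but whether P2 is the right priority still depends on the prompt.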
Nested schemas
Deeply nested schemas are harder for the model to populate correctly. Two mitigations: keep depth ≤ 3 levels where possible, and break large schemas into smaller calls when you can. A schema with 30+ leaf fields tends to produce noticeably worse extraction than the same 30 fields split across 3 sequential calls, at the price of extra latency and token cost.
# Heavy schema: works but error-prone
{
    "company": {
        "name": "...",
        "address": {
            "street": "...",
            "city": "...",
            "country": "...",
            "geo": {"lat": 0.0, "lng": 0.0}
        },
        "contacts": [{"name": "...", "role": "...", "email": "..."}]
    },
    "invoice": {...},
    "line_items": [...],
}

# Better: three sequential calls, each with a focused schema
company = extract_with_schema(text, CompanySchema)
invoice = extract_with_schema(text, InvoiceSchema)
items = extract_with_schema(text, LineItemsSchema)
Handling lists of unknown length
When the output is a list, include guidance on size bounds and what to do when nothing is found. Without it, models sometimes invent entries to fill perceived expectations.
"properties": {
    "tags": {
        "type": "array",
        "items": {"type": "string"},
        "description": "Topic tags from the article. Return empty array if no clear tags. Max 10."
    }
}
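A bound written in a description ("Max 10") is guidance rather than enforcement, so a light post-processing step is a reasonable safety net. JSON Schema also has a standard `maxItems` keyword for arrays. The sketch below uses a hypothetical `clean_tags` helper. The lower-casing and de-duplication are assumptions about what "the same tag" means for your data.

```python
def clean_tags(tags: list, max_items: int = 10) -> list:
    """Normalise, de-duplicate, and cap a model-produced tag list."""
    seen = set()
    cleaned = []
    for tag in tags:
        norm = tag.strip().lower()
        if norm and norm not in seen:  # skip blanks and repeats
            seen.add(norm)
            cleaned.append(norm)
    return cleaned[:max_items]
```

An empty input stays an empty list, which preserves the "nothing found" signal the description asks the model to send.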
Error patterns and fixes
The repeated mistakes that show up when wiring up structured output, with the fix for each.
| Symptom | Root cause | Fix |
|---|---|---|
| JSONDecodeError: Expecting value | Model wrapped JSON in ``` fences | Prefill { + stop sequence on ``` |
| Field is string when schema says number | Schema description vague | Add "type": "number", "description": "Numeric. No commas, no $." |
| Empty array when items exist | Schema didn't say to extract them | Description: "Extract every line item; do not skip" |
| Extra fields not in schema | LLM padded with extras | Pydantic model_config = ConfigDict(extra="forbid") |
| Required field missing | Long input, model lost it | Re-prompt with retry loop OR shorten input |
| Hallucinated enum value | Free-text field where enum would do | Convert to enum |
| Numbers come back as strings | Auto-formatting | Pydantic does coercion; or strict mode + retry |
| Date format inconsistent | No format hint | Add "description": "ISO 8601 YYYY-MM-DD" |
| Nested object missing levels | Schema too deep | Flatten or split into sequential calls |
| Different field order each run | Insertion-order varies | Order shouldn't matter — use dict access, not list index |
Common pitfalls
| Pitfall | Why it bites | Fix |
|---|---|---|
| Trusting raw model output without validation | Field types drift silently | Always pipe through Pydantic / Zod |
| Validator and prompt schema out of sync | Edits in one place, not the other | Generate tool schema FROM the model class |
| No retry on validation failure | One bad parse breaks the pipeline | One retry with the error fed back is cheap insurance |
| extra="allow" in Pydantic | LLM-added fields leak through | extra="forbid" in production |
| Schema hidden in prompt as prose | LLM has to re-derive structure each time | Use tool input_schema; cheaper, more reliable |
| Streaming partial JSON to user | Mid-stream parse errors | Buffer until stop event, then parse-and-render |
| Same schema for every model size | Smaller models drop fields | Validate against the WEAKEST model you support |
| Ignoring is_error flag in tool_result | Retries don't carry error signal | Set is_error: true so model knows to fix |
Real-world recipes
Compact end-to-end patterns for the highest-leverage structured-output use cases.
CRUD payload generation
class UserUpdate(BaseModel):
    email: str | None = None
    name: str | None = None
    plan: Literal["free", "pro", "enterprise"] | None = None
    notify_billing: bool | None = None


update_tool = pydantic_to_tool(
    UserUpdate,
    tool_name="propose_user_update",
    description="Generate a partial user-update payload from a natural-language command.",
)


def parse_user_command(text: str) -> UserUpdate:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=600,
        tools=[update_tool],
        tool_choice={"type": "tool", "name": "propose_user_update"},
        messages=[{
            "role": "user",
            "content": (
                f"Convert this admin command into a user-update payload. "
                f"Set fields to null when not mentioned.\n\n{text}"
            ),
        }],
    )
    return UserUpdate.model_validate(
        next(b for b in resp.content if b.type == "tool_use").input
    )


payload = parse_user_command("Upgrade alice@example.com to Pro and turn off billing emails.")
# UserUpdate(email='alice@example.com', name=None, plan='pro', notify_billing=False)
Multi-entity extraction
class Person(BaseModel):
    name: str
    role: str | None = None
    affiliation: str | None = None


class Event(BaseModel):
    name: str
    date: str | None = None
    location: str | None = None


class Extraction(BaseModel):
    people: list[Person] = Field(default_factory=list)
    events: list[Event] = Field(default_factory=list)
    organizations: list[str] = Field(default_factory=list)


extract_tool = pydantic_to_tool(
    Extraction,
    tool_name="record_entities",
    description="Record all named entities found in the document.",
)


def extract_entities(text: str) -> Extraction:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        tools=[extract_tool],
        tool_choice={"type": "tool", "name": "record_entities"},
        messages=[{"role": "user", "content": text}],
    )
    return Extraction.model_validate(
        next(b for b in resp.content if b.type == "tool_use").input
    )
Function-call dispatch (router)
from typing import Any


class WeatherArgs(BaseModel):
    location: str
    units: Literal["celsius", "fahrenheit"] = "celsius"


class StockArgs(BaseModel):
    ticker: str


class CalendarArgs(BaseModel):
    person: str
    day: str


# get_weather, get_stock_price, and check_calendar are your own handler
# functions; their docstrings double as the tool descriptions below.
ROUTES = {
    "get_weather": (WeatherArgs, get_weather),
    "get_stock_price": (StockArgs, get_stock_price),
    "check_calendar": (CalendarArgs, check_calendar),
}


def route(user_msg: str) -> Any:
    tools = [
        {"name": name, "description": fn.__doc__, "input_schema": cls.model_json_schema()}
        for name, (cls, fn) in ROUTES.items()
    ]
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=400,
        tools=tools,
        tool_choice={"type": "any"},  # must call one of the provided tools
        messages=[{"role": "user", "content": user_msg}],
    )
    # Dispatch on the first tool_use block in the response.
    tool_use = next(b for b in resp.content if b.type == "tool_use")
    cls, fn = ROUTES[tool_use.name]
    args = cls.model_validate(tool_use.input)
    return fn(**args.model_dump())
Form-fill from messy text
class JobApplication(BaseModel):
    full_name: str
    email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
    years_experience: int = Field(ge=0, le=80)
    desired_salary_usd: int | None = None
    open_to_remote: bool


application_tool = pydantic_to_tool(
    JobApplication,
    tool_name="submit_application",
    description="Submit the parsed job application after extracting fields from the resume.",
)


def parse_application(resume_text: str) -> JobApplication | None:
    # Returns None when the retries are exhausted, so callers must handle that case.
    return extract_with_validation(resume_text, JobApplication, application_tool)
Quick reference
Pick the right tool for the job.
| Need | First technique |
|---|---|
| Hard schema guarantee, Claude | Tool use + tool_choice={"type": "tool"} |
| Cross-model portability | Prompt + prefill { + stop sequence |
| Open-weights model, must conform | Outlines / Guidance / LM Format Enforcer |
| Lazy/optional fields | Optional in Pydantic, NOT in required |
| Closed value set | Enum in schema |
| Mutually exclusive shapes | Discriminated union (type field) |
| Inconsistent dates | description: ISO 8601 YYYY-MM-DD |
| Catch bad outputs after generation | Pydantic / Zod + retry loop |
| Schema lives next to types | pydantic_to_tool(MyModel, ...) |
| Streaming for UX | input_json_delta events + buffered parse |
| Pick exactly one of many actions | Tools + tool_choice={"type": "any"} |
| Extract many entities at once | Single tool with list[T] fields |