Chapter 3 · Structured Output · Types as the Output Contract

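1. Why Structured Output Is a Watershed for Agents

Before 2023, there was only one interface between an LLM and business code: a string. Developers did dirty work on both ends. The prompt nagged "please return JSON with fields X, Y and Z", and the code nervously wrapped json.loads in try/except. One unreliable interface turned everyone into a "prompt magician".

From 2024 on, three things came together:

OpenAI shipped response_format={"type": "json_schema"}: in strict mode the server guarantees that the output conforms to the schema
Anthropic and Gemini made tool use reliable: ask the LLM to "call a tool named final_result" and it fills in arguments that follow the parameter schema
Pydantic v2 made JSON Schema export far more complete: most Python types convert cleanly to JSON Schema

Pydantic AI's structured output stitches these three together. You write a Pydantic model, Pydantic AI converts it to a schema and sends it to the LLM, and when the data comes back Pydantic runs a final validation. Structured output is no longer a hack; it is a first-class citizen.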

2. What output_type accepts

It isn't limited to BaseModel: Pydantic AI supports essentially any type Pydantic can validate. The common options:

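| Type | Example | When to use |
|---|---|---|
| `str` | `output_type=str` | The default: no structure, you want the raw text |
| `BaseModel` | `output_type=Weather` | The most common choice; can be nested |
| `dataclass` | `@dataclass class X: ...` | When you'd rather not subclass Pydantic's BaseModel |
| `TypedDict` | `class X(TypedDict): ...` | When you want it to feel like a plain dict |
| Basic containers | `list[int]`, `dict[str, float]` | Flat structures |
| `Literal` | `Literal["a","b","c"]` | Classification; restrict the allowed values |
| `Union` / `\|` | `Weather \| ErrorResult` | Several possible output shapes |
| Primitive scalars | `int`, `bool`, `float` | You only need a single value |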

3. BaseModel: the mainstream approach

from pydantic import BaseModel, Field
from pydantic_ai import Agent

class Weather(BaseModel):
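    """Current weather for a city."""
    city: str = Field(description="City name, in Chinese or English")
    temp_c: float = Field(description="Degrees Celsius", ge=-60, le=60)
    condition: str = Field(description="Sunny / cloudy / rain / snow, etc.")
    humidity: int | None = Field(default=None, description="Relative humidity 0-100; null if unknown")

agent = Agent("openai:gpt-4o", output_type=Weather)

result = agent.run_sync("Roughly what is the weather in Beijing right now?")
print(result.output)
# e.g. city='Beijing' temp_c=14.0 condition='cloudy' humidity=55
# (no tools are registered, so these values are the model's guess)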

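Note these details:

The BaseModel docstring becomes the schema's description, which the LLM sees
Field(description=...) is passed to the LLM as the field's meaning: writing it clearly here is more useful than three extra lines of nagging in the prompt
Constraints such as ge=-60, le=60 become JSON Schema minimum/maximum: the model can see them, and Pydantic re-checks them locally
Optional fields: on Python 3.10+ prefer int | None over Optional[int] (both work)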

4. Nesting: don't go too deep

class Book(BaseModel):
    title: str
    year: int

class Author(BaseModel):
    name: str
    books: list[Book]       # nested list[BaseModel]
    alive: bool

agent = Agent("openai:gpt-4o", output_type=Author)
result = agent.run_sync("Give me a fictional Chinese writer and three of their books")

print(result.output.name)
for b in result.output.books:
    print("  -", b.title, b.year)

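Rule of thumb for nesting depth: in practice, LLM output quality drops noticeably once a structure is nested more than 3 levels deep (missing fields, misplaced levels). Flatten where you can. If you really need nesting, write clear docstrings and field descriptions at every level.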

5. Literal + Union: classification and "several possible results"

Classification

from typing import Literal
from pydantic import BaseModel, Field
from pydantic_ai import Agent

class SentimentResult(BaseModel):
    label: Literal["positive", "neutral", "negative"]
    confidence: float = Field(ge=0, le=1)
    reason: str

agent = Agent("openai:gpt-4o-mini", output_type=SentimentResult)
result = agent.run_sync("Classify the sentiment: this movie is a total disaster, I want a refund")
# label='negative' confidence=0.95 reason='...'

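In JSON Schema, a Literal becomes an enum. The LLM is told it may only pick one of the listed values, and Pydantic rejects anything else. This is standard practice for classification tasks.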

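"Several possible results": Union / `|`

Real-world agents can't always produce the result you expect. Take "look up an order": if the order exists, return Order; if it doesn't, return NotFound; if access is denied, return Denied. Express this with a Union: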

from pydantic import BaseModel
from typing import Literal
from pydantic_ai import Agent

class Order(BaseModel):
    kind: Literal["order"] = "order"
    order_id: int
    total: float
    status: Literal["paid", "shipped", "delivered"]

class NotFound(BaseModel):
    kind: Literal["not_found"] = "not_found"
    tried_id: int

class Denied(BaseModel):
    kind: Literal["denied"] = "denied"
    reason: str

agent = Agent("openai:gpt-4o", output_type=Order | NotFound | Denied)

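# No lookup tool is registered here, so this only illustrates the output shape.
result = agent.run_sync("Check the status of order 9527 for me; my user_id is 100")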

match result.output:
    case Order(order_id=oid, total=t):
        print(f"Order {oid}, total {t}")
    case NotFound(tried_id=tid):
        print(f"Order {tid} not found")
    case Denied(reason=r):
        print(f"Denied: {r}")

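Python's match/case is a perfect fit for consuming a Union result. Each branch carries a distinct Literal discriminator (kind), so data for one branch can never validate as another, and Pydantic can tell which one came back. (With the default ToolOutput, Pydantic AI also offers each Union member to the model as its own output tool.)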

6. TypedDict and dataclass: alternatives to BaseModel

from typing_extensions import TypedDict  # Pydantic requires this instead of typing.TypedDict on Python < 3.12
from pydantic_ai import Agent

class StockQuote(TypedDict):
    symbol: str
    price: float
    change_pct: float

agent = Agent("openai:gpt-4o", output_type=StockQuote)
result = agent.run_sync("Roughly what is Apple's stock price right now?")
print(result.output["symbol"])   # plain dict access

from dataclasses import dataclass

@dataclass
class Summary:
    tldr: str
    bullet_points: list[str]

agent = Agent("openai:gpt-4o", output_type=Summary)
result = agent.run_sync("Summarize: Python's GIL mechanism...")
print(result.output.tldr)

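TypedDict and dataclass are less convenient than BaseModel for attaching Field(description=...). A dataclass can use metadata={"description": ...}, which is less intuitive than Pydantic's Field, and both also accept Annotated[..., Field(description=...)]. Unless you have a specific reason, use BaseModel in production.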
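
Both TypedDict and dataclass can carry field descriptions through `typing.Annotated`, which Pydantic reads when it builds the schema. The following is a minimal sketch with plain Pydantic; the descriptions are illustrative. It imports TypedDict from `typing_extensions`, which Pydantic requires on Python versions before 3.12.

```python
from dataclasses import dataclass
from typing import Annotated

from pydantic import Field, TypeAdapter
from typing_extensions import TypedDict  # Pydantic rejects typing.TypedDict on Python < 3.12


class StockQuote(TypedDict):
    symbol: Annotated[str, Field(description="Ticker symbol, e.g. AAPL")]
    price: Annotated[float, Field(description="Latest price in USD")]
    change_pct: Annotated[float, Field(description="Change since previous close, in percent")]


@dataclass
class Summary:
    tldr: Annotated[str, Field(description="One-sentence summary")]
    bullet_points: Annotated[list[str], Field(description="Key points, one per item")]


quote_schema = TypeAdapter(StockQuote).json_schema()
summary_schema = TypeAdapter(Summary).json_schema()
print(quote_schema["properties"]["symbol"])
print(summary_schema["properties"]["tldr"])

# Validation still returns the plain types: a dict for TypedDict, an instance for dataclass.
quote = TypeAdapter(StockQuote).validate_python({"symbol": "AAPL", "price": "190.5", "change_pct": 0.3})
print(type(quote).__name__, quote["price"])  # dict 190.5
```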

7. Automatic retries on validation failure: how it works

This is the most "thoughtful" part of Pydantic AI. The flow is:

① The LLM outputs JSON.
② Pydantic validates it. If it passes, the output is returned; if it fails, the error text is collected.
③ The error text ("validation failed: [errors]") is sent back to the LLM as a retry message (it appears as a RetryPromptPart in the message history).
④ The LLM reads the error and produces its output again.

Steps ② to ④ repeat at most retries times. If the budget runs out, the run fails with an exception (UnexpectedModelBehavior).

In the common case you don't write any try/except yourself. In the author's informal tests, a gpt-4o-class model passes on the first attempt more than 95% of the time and almost always by the second.

class Age(BaseModel):
    years: int = Field(ge=0, le=150)

agent = Agent("openai:gpt-4o-mini", output_type=Age, retries=3)

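# Trying to provoke a failure: ask for the age in words, which is not an int
result = agent.run_sync("Answer in words, not digits: Grandpa is eighty-eight this year")
print(result.output)   # years=88  ← repaired automatically if the first attempt failed
print(result.usage())  # requests=2 here would mean the first attempt failed and the retry passed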
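
The loop described in this section can be sketched without any framework. This is a simplified stand-in, not Pydantic AI's implementation. The "model" is any callable that takes the conversation so far and returns raw JSON text, and a hand-written validator plays Pydantic's role. The sketch assumes that `retries=N` means N retries after the first attempt. `run_with_retries`, `validate_age` and `scripted_model` are illustrative names.

```python
import json
from typing import Any, Callable

Model = Callable[[list[str]], str]  # hypothetical: conversation in, raw JSON text out


class RetriesExhausted(Exception):
    """Raised when every attempt fails validation."""


def validate_age(data: Any) -> dict[str, int]:
    """Stand-in for Pydantic: years must be an int in [0, 150]."""
    years = data.get("years") if isinstance(data, dict) else None
    if not isinstance(years, int) or isinstance(years, bool):
        raise ValueError(f"years: expected an integer, got {years!r}")
    if not 0 <= years <= 150:
        raise ValueError(f"years: must be between 0 and 150, got {years}")
    return {"years": years}


def run_with_retries(model: Model, prompt: str, retries: int = 1) -> tuple[dict[str, int], int]:
    """Return (validated output, number of model requests)."""
    messages = [prompt]
    for attempt in range(1, retries + 2):  # 1 initial attempt + `retries` retries
        raw = model(messages)
        try:
            return validate_age(json.loads(raw)), attempt
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            # Feed the error text back so the next attempt can correct itself.
            messages.append(f"Validation failed: {err}. Please fix and try again.")
    raise RetriesExhausted(f"no valid output after {retries + 1} attempts")


def scripted_model(replies: list[str]) -> Model:
    """A fake model that returns canned replies in order (for illustration only)."""
    it = iter(replies)
    return lambda messages: next(it)


output, requests = run_with_retries(
    scripted_model(['{"years": "eighty-eight"}', '{"years": 88}']),
    "How old is grandpa? Answer in words.",
    retries=3,
)
print(output, requests)  # {'years': 88} 2
```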

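8. Raising ModelRetry yourself: retries for business-rule failures too

Sometimes Pydantic validation passes and it is the business logic that fails. Say the model returned a user_id that doesn't exist and you want it to choose again. Raise ModelRetry inside an output_validator:

from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry, RunContext

EXISTING_USERS = {100, 101, 102}  # hypothetical: load the real set of valid IDs from your user store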

class UserPick(BaseModel):
    user_id: int
    reason: str

agent = Agent("openai:gpt-4o", output_type=UserPick, retries=2)

@agent.output_validator
def must_be_real_user(ctx: RunContext[None], output: UserPick) -> UserPick:
    if output.user_id not in EXISTING_USERS:
        raise ModelRetry(f"user_id {output.user_id} does not exist; pick a valid one.")
    return output

result = agent.run_sync("Pick the most active user and return their id and the reason")

You can register several output_validators, and they run in registration order. Raising ModelRetry triggers a retry; any other exception propagates unchanged.
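
The ordering and exception rules can be pictured as a simple chain. The following is a minimal sketch. `RetrySignal` is a hypothetical stand-in for pydantic_ai's `ModelRetry`, so the snippet runs without the library. The validators and `check` are made-up examples.

```python
from typing import Any, Callable

Validator = Callable[[dict[str, Any]], dict[str, Any]]


class RetrySignal(Exception):
    """Hypothetical stand-in for pydantic_ai.ModelRetry."""


EXISTING_USERS = {100, 101, 102}  # hypothetical set of valid IDs


def must_be_real_user(output: dict[str, Any]) -> dict[str, Any]:
    if output["user_id"] not in EXISTING_USERS:
        raise RetrySignal(f"user_id {output['user_id']} does not exist; pick a valid one.")
    return output


def tidy_reason(output: dict[str, Any]) -> dict[str, Any]:
    # Validators may also return a modified output.
    return {**output, "reason": output["reason"].strip()}


def check(output: dict[str, Any], validators: list[Validator]) -> tuple[str, Any]:
    """Run validators in registration order; any non-RetrySignal exception propagates."""
    try:
        for validator in validators:
            output = validator(output)
    except RetrySignal as err:
        return "retry", str(err)  # this message would be sent back to the model
    return "ok", output


print(check({"user_id": 101, "reason": "  most posts  "}, [must_be_real_user, tidy_reason]))
print(check({"user_id": 9, "reason": "guess"}, [must_be_real_user, tidy_reason]))
```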

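9. ToolOutput / NativeOutput / PromptedOutput: three output modes

Behind the scenes, Pydantic AI has three mechanisms for getting structured output from an LLM, and it uses ToolOutput by default. Sometimes you need to pick one explicitly:

ToolOutput (default)

Asks the LLM to "call a tool named final_result". This is the most compatible option: it works with every provider that supports function calling.

NativeOutput

Uses the provider's native structured-output API (for example OpenAI's json_schema response_format). It enforces the schema most strictly, but provider support varies, so check the Pydantic AI docs for your model.

PromptedOutput

Pure prompt engineering: the schema goes into the system prompt and the model is asked to emit JSON on its own. It is meant for small or older models that lack function calling.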

from pydantic_ai import Agent
from pydantic_ai.output import NativeOutput, PromptedOutput, ToolOutput

# Explicit NativeOutput: recommended for OpenAI gpt-4o
agent = Agent(
    "openai:gpt-4o",
    output_type=NativeOutput(Weather),
)

# Explicit PromptedOutput: may be more reliable with small local models
agent = Agent(
    "ollama:qwen2.5",
    output_type=PromptedOutput(Weather),
)

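When to choose manually:
① Running a small model on Ollama and function calling is shaky → try PromptedOutput.
② Using OpenAI and you want the strictest validation → use NativeOutput.
③ Everything else → leave it to the default ToolOutput.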
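
PromptedOutput is weaker because of how it works. The schema is only text in the prompt, so the reply has to be dug out of free-form output and then validated. The following is a minimal sketch of that idea. The instruction wording is illustrative and is not Pydantic AI's actual template. `build_instructions` and `extract_json` are hypothetical helpers.

```python
import json
import re
from typing import Any

TICKS = "`" * 3  # built at runtime so this snippet can itself live inside a fenced block
FENCED = re.compile(re.escape(TICKS) + r"(?:json)?\s*(.*?)" + re.escape(TICKS), re.DOTALL)


def build_instructions(schema: dict[str, Any]) -> str:
    """Put the schema into the system prompt (illustrative wording)."""
    return (
        "Reply with a single JSON object that matches this JSON Schema, and nothing else:\n"
        + json.dumps(schema, ensure_ascii=False)
    )


def extract_json(text: str) -> dict[str, Any]:
    """Pull the first JSON object out of free-form model text; raise ValueError if absent."""
    match = FENCED.search(text)  # prefer a fenced block if the model added one
    candidate = match.group(1) if match else text
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start : end + 1])


print(build_instructions({"type": "object", "properties": {"city": {"type": "string"}}}))
reply = "Sure! " + TICKS + 'json\n{"city": "Beijing"}\n' + TICKS
print(extract_json(reply))  # {'city': 'Beijing'}
```

Nothing here stops the model from drifting, which is why the parsed dict still has to go through Pydantic validation afterwards.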

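10. Relationship with function calling: two views of the same thing

A fundamental question: if structured output also runs on function calling under the hood, aren't "tools" and "output" essentially the same thing?

Yes. Pydantic AI simply names the tool for the final output final_result (by default) and treats a call to it as the end of the agent run. You can see this tool call for yourself in all_messages():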

result = agent.run_sync("...")

print(result.all_messages())
# [ModelRequest(...), ModelResponse(parts=[ToolCallPart(tool_name='final_result', args=...)]), ...]

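This is also why so many concepts (automatic retries, argument validation, schema generation) are shared between Tools (Chapter 4) and Output: they are two uses of the same mechanism.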
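
The idea that output is just a special tool fits in a few lines. The following is a minimal sketch. A scripted list of model responses stands in for real LLM calls, so tool results are not fed back to a model here. `ToolCall`, `run_agent` and the `get_weather` tool are illustrative names, not Pydantic AI's classes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCall:
    """One tool call emitted by the model (illustrative, not pydantic_ai's ToolCallPart)."""

    tool_name: str
    args: dict[str, Any]


def run_agent(
    model_calls: list[ToolCall],
    tools: dict[str, Callable[..., Any]],
    output_tool: str = "final_result",
) -> tuple[dict[str, Any], list[str]]:
    """Process tool calls in order until the output tool is called; its args are the result."""
    trace: list[str] = []
    for call in model_calls:  # stands in for successive model responses
        trace.append(call.tool_name)
        if call.tool_name == output_tool:
            return call.args, trace  # calling the output tool ends the run
        # An ordinary tool: run it. A real agent would send the return value back to the model.
        tools[call.tool_name](**call.args)
    raise RuntimeError("the model never called the output tool")


calls = [
    ToolCall("get_weather", {"city": "Beijing"}),
    ToolCall("final_result", {"city": "Beijing", "temp_c": 14.0, "condition": "cloudy"}),
]
result, trace = run_agent(calls, {"get_weather": lambda city: {"temp_c": 14.0}})
print(trace)  # ['get_weather', 'final_result']
print(result)
```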

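11. A complete, realistic business example

Scenario: parse a user's free-text order request into a structured order, with validation and a possible "reject" branch.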

from typing import Literal
from pydantic import BaseModel, Field
from pydantic_ai import Agent, ModelRetry, RunContext

SKU_CATALOG = {"A-123", "B-456"}  # hypothetical: in practice, query your product catalog

class OrderItem(BaseModel):
    sku: str = Field(description="Product SKU, formatted like A-123", pattern=r"^[A-Z]-\d+$")
    qty: int = Field(ge=1, le=999)

class ParsedOrder(BaseModel):
    """Structured order parsed from natural language."""
    kind: Literal["order"] = "order"
    items: list[OrderItem] = Field(min_length=1, description="At least one item")
    ship_to: str = Field(description="Shipping address; must include a city")
    notes: str | None = None

class Unparseable(BaseModel):
    """Not enough information to form an order."""
    kind: Literal["unparseable"] = "unparseable"
    missing_fields: list[str]
    hint_for_user: str

agent = Agent(
    "openai:gpt-4o",
    output_type=ParsedOrder | Unparseable,
    system_prompt=(
        "You are an order parser. "
        "If the user's information is not enough to form an order, you must return Unparseable "
        "and list the missing fields in missing_fields."
    ),
    retries=2,
)

@agent.output_validator
def check_sku_exists(ctx: RunContext[None], output: ParsedOrder | Unparseable) -> ParsedOrder | Unparseable:
    if isinstance(output, ParsedOrder):
        for item in output.items:
            if item.sku not in SKU_CATALOG:
                raise ModelRetry(f"SKU {item.sku} is not in the catalog; please double-check it.")
    return output

result = agent.run_sync("I'd like 2 of A-123 and 1 of B-456, shipped to Chaoyang District, Beijing")

match result.output:
    case ParsedOrder(items=its, ship_to=addr):
        print(f"Ordering {len(its)} line item(s), shipping to {addr}")
    case Unparseable(missing_fields=mf, hint_for_user=h):
        print(f"Could not parse; missing: {mf}; hint: {h}")

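Read it through and you'll see genuinely realistic business logic: field constraints (pattern, min_length), a business check (the SKU exists) and a fallback (the Unparseable branch), all expressed at the type level. With plain LangChain or the raw OpenAI SDK, the retry loop and business-level validation are usually something you have to wire up yourself.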

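12. Eight common pitfalls

Over-nesting: quality drops noticeably beyond 3 levels; flatten, or parse in stages.
Careless Field descriptions: they are prompts for the LLM, not notes to yourself. Spell out each field's meaning and allowed values.
Forgetting the kind discriminator in a Union: it still runs without a Literal, but Pydantic then has to tell the branches apart by field structure, which is slower and more fragile. Give every branch kind: Literal["..."] = "...".
Optional[X] without a default: with field: Optional[int] and no = None, Pydantic treats the field as required, so the LLM must still supply it (even if only as null). For a truly optional field, write field: int | None = None.
Contradictory constraints: a schema like Field(ge=10, le=5) can't be satisfied by the LLM either, so check your constraints carefully.
Shipping without setting retries: the default is retries=1; in production, 2-3 is recommended to give the model extra chances to correct itself.
Slow work in output_validator: it runs again on every retry within a run, so cache slow operations or do them up front.
Relying on PromptedOutput for strict validation: it is essentially a prompt-level constraint, so the LLM drifts noticeably more often than with NativeOutput/ToolOutput. Don't use it where strict consistency matters.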

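13. Chapter summary

Four takeaways:
① Structured output lets the LLM's output and your Python code speak the same language: types.
② output_type accepts str / BaseModel / dataclass / TypedDict / Literal / Union / list / dict / primitive types, which covers the vast majority of cases.
③ When validation fails, Pydantic AI automatically sends the error text back for the model to fix; with ModelRetry you can add business-level checks too.
④ Use NativeOutput when you need a strict schema, PromptedOutput for small local models, and the default ToolOutput for everything else.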
