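Why stream
- Perceived latency: the first token often arrives in under 500 ms, so the user sees right away that something is happening
- Interruptible: if the user changes their mind and hits stop, the connection is dropped immediately, which saves tokens
- Long outputs become feasible: a non-streaming call has to wait until all 30K tokens are generated before it returns, which easily times out
SSE event protocol
Claude streaming uses standard SSE (Server-Sent Events): a long-lived HTTP connection on which each event is an event: line plus a data: line, terminated by a blank line: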
event: message_start
data: {"type":"message_start","message":{"id":"msg_...","model":"...","usage":{...}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hel"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"lo"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":64}}

event: message_stop
data: {"type":"message_stop"}
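To make the frame layout concrete, here is a minimal sketch of a parser for a single frame, meaning the text between two blank lines. The names SseFrame and parseSseFrame are invented for this illustration, and the sketch only handles the event: and data: fields that appear in the example above.

```typescript
// Hypothetical helper: parse one SSE frame (the text between two blank lines).
interface SseFrame {
  event: string; // value of the "event:" field; "message" is the SSE default
  data: string;  // all "data:" lines joined with "\n"
}

function parseSseFrame(frame: string): SseFrame | null {
  let event = "message";
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line === "" || line.startsWith(":")) continue; // blank line or SSE comment
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is not part of the value
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  if (dataLines.length === 0) return null; // nothing to dispatch
  return { event, data: dataLines.join("\n") };
}

const sampleFrame = parseSseFrame(
  "event: content_block_delta\n" +
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hel"}}',
);
if (sampleFrame !== null) {
  const payload = JSON.parse(sampleFrame.data);
  console.log(sampleFrame.event, payload.delta.text); // content_block_delta Hel
}
```

Note that the JSON payload repeats the event name in its type field, so a consumer can dispatch on either one.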
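Event type quick reference
- message_start: message metadata; usage.input_tokens is already known at this point
- content_block_start: a block begins (text / tool_use / thinking); index tells you which block it is
- content_block_delta: an increment, one of text_delta / input_json_delta (tool-call arguments) / thinking_delta
- content_block_stop: a block ends. If the next event is another content_block_start, more content follows
- message_delta: the final stop_reason and usage.output_tokens arrive here
- message_stop: the whole message is finished
- ping: keep-alive heartbeat; safe to ignore
- error: a mid-stream error; carries error.type and error.message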
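One way to read the list above is as a reducer: each event type updates one part of the final result. The following is a rough sketch under simplified, assumed event shapes (the real payloads carry more fields); StreamEvent, Summary and summarize are illustrative names, not SDK types.

```typescript
// Assumed, simplified event shapes; the real payloads carry more fields.
type StreamEvent =
  | { type: "message_start"; message: { usage: { input_tokens: number } } }
  | { type: "content_block_start"; index: number; content_block: { type: string } }
  | { type: "content_block_delta"; index: number; delta: { type: string; text?: string } }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; delta: { stop_reason: string }; usage: { output_tokens: number } }
  | { type: "message_stop" }
  | { type: "ping" }
  | { type: "error"; error: { type: string; message: string } };

interface Summary {
  text: string;
  inputTokens: number;
  outputTokens: number;
  stopReason: string | null;
  done: boolean;
}

function summarize(events: StreamEvent[]): Summary {
  const s: Summary = { text: "", inputTokens: 0, outputTokens: 0, stopReason: null, done: false };
  for (const ev of events) {
    switch (ev.type) {
      case "message_start":
        s.inputTokens = ev.message.usage.input_tokens; // known before any text arrives
        break;
      case "content_block_delta":
        if (ev.delta.type === "text_delta") s.text += ev.delta.text ?? "";
        break;
      case "message_delta":
        s.stopReason = ev.delta.stop_reason;
        s.outputTokens = ev.usage.output_tokens;
        break;
      case "message_stop":
        s.done = true;
        break;
      case "error":
        throw new Error(`${ev.error.type}: ${ev.error.message}`);
      default:
        break; // content_block_start, content_block_stop, ping: nothing to record in this sketch
    }
  }
  return s;
}

const demoSummary = summarize([
  { type: "message_start", message: { usage: { input_tokens: 18 } } },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: "Hello" } },
  { type: "message_delta", delta: { stop_reason: "end_turn" }, usage: { output_tokens: 64 } },
  { type: "message_stop" },
]);
console.log(demoSummary.text, demoSummary.stopReason); // Hello end_turn
```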
Node.js: async iterator
The SDK lets you use a standard for await loop; events are parsed for you:
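import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 2048,
  messages: [{ role: "user", content: "Explain the TCP three-way handshake" }],
});

// Option 1: iterate over the raw events
for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

// Option 2: higher-level convenience events. Use these instead of the loop above,
// and register them before the stream has been consumed:
// stream.on("text", (text) => process.stdout.write(text));
// stream.on("finalMessage", (msg) => console.log("\nFinal message:", msg));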
Python: the with statement
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Explain the TCP three-way handshake"}],
) as stream:
    # text_stream yields only the text deltas
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # The accumulated message, including usage
    final = stream.get_final_message()

print("\nusage:", final.usage)
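Leaving the with block closes the underlying connection automatically. Always use with; otherwise it is easy to forget to close the stream and leak resources.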
User hits stop: graceful interruption
Front ends usually handle this with an AbortController. The Node SDK accepts a native signal:
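const ac = new AbortController();

// The user clicks the stop button (cancelBtn is the page's stop button element)
cancelBtn.addEventListener("click", () => ac.abort());

const stream = client.messages.stream(
  { model: "claude-sonnet-4-6", max_tokens: 4096, messages },
  { signal: ac.signal },
);

try {
  for await (const ev of stream) { /* ... */ }
} catch (e) {
  // The SDK reports an aborted request as APIUserAbortError;
  // an aborted raw fetch is named "AbortError". Check both to be safe.
  if (e instanceof Anthropic.APIUserAbortError || e.name === "AbortError") {
    console.log("User cancelled");
  } else {
    throw e;
  }
}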
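Once cancelled, the underlying HTTP connection is closed immediately and Anthropic stops generating, so you do not pay for output tokens that were never produced.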
Parsing manually with fetch (no SDK)
For a front end that does not bundle the SDK, or a custom proxy where you want to see the raw format:
const resp = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": key,
    "anthropic-version": "2023-06-01", // required on raw requests
    "content-type": "application/json",
  },
  body: JSON.stringify({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    stream: true,
    messages: [{ role: "user", content: "hi" }],
  }),
});
if (!resp.ok || !resp.body) throw new Error(`request failed: HTTP ${resp.status}`);

// Decode bytes to text, then split on the blank line that ends each SSE event.
const reader = resp.body.pipeThrough(new TextDecoderStream()).getReader();
let buf = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += value;
  let idx;
  while ((idx = buf.indexOf("\n\n")) !== -1) {
    const chunk = buf.slice(0, idx);
    buf = buf.slice(idx + 2);
    const m = chunk.match(/^data: (.*)$/m);
    if (!m) continue;
    const ev = JSON.parse(m[1]);
    // Only text_delta carries .text; input_json_delta and thinking_delta do not.
    if (ev.type === "content_block_delta" && ev.delta.type === "text_delta") {
      console.log(ev.delta.text);
    }
  }
}
Remember anthropic-version
A direct fetch must include the anthropic-version: 2023-06-01 request header; leave it out and you get a 400 straight away. The SDK adds it automatically.
Front end → server → Claude pass-through
The front end must not call Anthropic directly (that would expose the API key). A typical architecture:
[Browser] → POST /api/chat (SSE) → [Your Server] → Anthropic (SSE)
                                         │
                                         └─ pass events through + inject your own business logic
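Implementation notes:
- On the server, call client.messages.stream() to get a stream
- Set the response header Content-Type: text/event-stream
- for await each event and forward it to the front end (optionally filtering or rewriting it)
- On the front end, consume it with the browser's native EventSource or with fetch + reader. EventSource can only issue GET requests, so the POST endpoint shown above needs fetch + reader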
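The relay step in the notes above might look roughly like the sketch below. Everything here is an assumption made for illustration: UpstreamEvent, toSseFrame, shouldForward and relay are hypothetical names, and the allow-list is only one example of a filtering policy (it hides the metadata events and keeps the content events, pings and errors).

```typescript
// Hypothetical relay helpers for the server in the middle.
interface UpstreamEvent {
  type: string;
}

// Serialize one event in the same "event:" / "data:" shape that Anthropic sends.
function toSseFrame(ev: UpstreamEvent): string {
  return `event: ${ev.type}\ndata: ${JSON.stringify(ev)}\n\n`;
}

// Example policy (an assumption): the browser only needs content, keep-alives and errors.
const FORWARDED_TYPES = new Set<string>([
  "content_block_start",
  "content_block_delta",
  "content_block_stop",
  "message_stop",
  "ping",
  "error",
]);

function shouldForward(ev: UpstreamEvent): boolean {
  return FORWARDED_TYPES.has(ev.type);
}

// Read events from the upstream stream and hand serialized frames to `write`.
async function relay(
  upstream: AsyncIterable<UpstreamEvent>,
  write: (frame: string) => void,
): Promise<void> {
  for await (const ev of upstream) {
    if (shouldForward(ev)) write(toSseFrame(ev));
  }
}
```

In a Node http handler this could be wired up as relay(stream, (frame) => res.write(frame)) after the Content-Type header has been set, followed by res.end() once relay resolves. Keeping ping in the allow-list lets the heartbeat also keep the browser-facing connection alive.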
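Thinking streaming details
With Extended Thinking enabled (Chapter 7), the stream first emits a block of type thinking. This is Claude's internal reasoning; it is usually not shown to the user, but the API streams it so the response feels more immediate.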
content_block_start { type: "thinking" }
content_block_delta { delta: { type: "thinking_delta", thinking: "..." } }
content_block_stop
content_block_start { type: "text" }
content_block_delta { delta: { type: "text_delta", text: "final answer" } }
...
In the application layer, thinking is typically surfaced as a "Thinking..." indicator, and only the actual text block is rendered into the chat bubble.
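As a sketch of that split, the function below folds block events into a tiny UI state: a thinking block toggles the indicator, and only text_delta content reaches the bubble. BlockEvent, ChatUiState and applyEvent are hypothetical names, and the event shapes are simplified assumptions.

```typescript
// Assumed minimal shapes for the three block events this sketch cares about.
type BlockEvent =
  | { type: "content_block_start"; index: number; content_block: { type: "thinking" | "text" | "tool_use" } }
  | { type: "content_block_delta"; index: number; delta: { type: string; text?: string; thinking?: string } }
  | { type: "content_block_stop"; index: number };

interface ChatUiState {
  showThinkingIndicator: boolean; // drives the "Thinking..." indicator
  bubbleText: string;             // what the user actually reads
}

function applyEvent(state: ChatUiState, ev: BlockEvent): ChatUiState {
  if (ev.type === "content_block_start") {
    // The indicator is on only while a thinking block is open.
    return { ...state, showThinkingIndicator: ev.content_block.type === "thinking" };
  }
  if (ev.type === "content_block_stop") {
    return { ...state, showThinkingIndicator: false };
  }
  if (ev.delta.type === "text_delta") {
    return { ...state, bubbleText: state.bubbleText + (ev.delta.text ?? "") };
  }
  return state; // thinking_delta and others: deliberately not rendered
}

let uiState: ChatUiState = { showThinkingIndicator: false, bubbleText: "" };
const demoEvents: BlockEvent[] = [
  { type: "content_block_start", index: 0, content_block: { type: "thinking" } },
  { type: "content_block_delta", index: 0, delta: { type: "thinking_delta", thinking: "..." } },
  { type: "content_block_stop", index: 0 },
  { type: "content_block_start", index: 1, content_block: { type: "text" } },
  { type: "content_block_delta", index: 1, delta: { type: "text_delta", text: "final answer" } },
];
for (const ev of demoEvents) uiState = applyEvent(uiState, ev);
console.log(uiState.bubbleText); // final answer
```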
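Tool Use streaming
When Claude calls a tool, input_json_delta events carry the tool arguments in fragments (string pieces that only become valid JSON once concatenated):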
content_block_start { type: "tool_use", id: "toolu_...", name: "get_weather", input: {} }
content_block_delta { delta: { type: "input_json_delta", partial_json: "{\"lo" } }
content_block_delta { delta: { type: "input_json_delta", partial_json: "c\":\"" } }
content_block_delta { delta: { type: "input_json_delta", partial_json: "Tokyo\"}" } }
content_block_stop
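The SDK does the accumulation for you: the stream.on("inputJson", ...) callback receives each fragment together with a snapshot of the input parsed so far.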
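Without the SDK, the accumulation has to be done by hand. A minimal sketch, assuming one accumulator per tool_use block (ToolInputAccumulator is a made-up name): concatenate every partial_json string and parse only at content_block_stop.

```typescript
// Hypothetical accumulator for the arguments of one tool_use block.
class ToolInputAccumulator {
  private buffer = "";

  // Call for every input_json_delta; a single fragment is not valid JSON on its own.
  push(partialJson: string): void {
    this.buffer += partialJson;
  }

  // Call at content_block_stop, once the argument string is complete.
  finish(): unknown {
    // An empty buffer is treated as a call with no arguments.
    return this.buffer === "" ? {} : JSON.parse(this.buffer);
  }
}

// The three fragments from the example above.
const toolInput = new ToolInputAccumulator();
for (const fragment of ['{"lo', 'c":"', 'Tokyo"}']) toolInput.push(fragment);
console.log(JSON.stringify(toolInput.finish())); // {"loc":"Tokyo"}
```

Parsing earlier than content_block_stop would throw on most fragments, which is why the sketch keeps push and finish separate.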
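Timeouts & reconnecting
Production recommendations:
- Set the client HTTP timeout to at least 120 s (long Claude Opus outputs can take more than 60 s)
- Streaming relies on ping events to keep the connection alive; raise the idle timeout on any server-side proxy, which often defaults to 60 s
- A dropped stream cannot be resumed. There is no resume-from-offset mechanism, so the only option is to resend the whole request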
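Because a dropped stream cannot be resumed, a retry has to throw away the partial output and start again. The sketch below illustrates that idea only; collectWithRetry and startStream are hypothetical names, and startStream stands in for whatever opens a fresh stream with the same request arguments. A production version would presumably add backoff between attempts and retry only on connection-level errors.

```typescript
// Hypothetical wrapper: re-run the whole request when the stream breaks midway.
async function collectWithRetry(
  startStream: () => AsyncIterable<string>, // opens a brand-new stream on every call
  maxAttempts = 3,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let text = ""; // partial output from a failed attempt is discarded, not resumed
    try {
      for await (const chunk of startStream()) text += chunk;
      return text; // the stream finished cleanly
    } catch (err) {
      lastError = err; // remember the failure and start over from scratch
    }
  }
  throw lastError;
}
```

If partial text has already been rendered, the UI needs to clear it before the retry starts printing, since the second attempt may not repeat the first one word for word.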
Event visualization demo
[0.2s] message_start input_tokens=18
[0.3s] content_block_start index=0 type=text
[0.4s] content_block_delta "TCP"
[0.5s] content_block_delta " three-way handshake"
[0.6s] content_block_delta " is..."
... (one delta roughly every 20 ms)
[8.1s] content_block_stop index=0
[8.1s] message_delta stop_reason=end_turn, output_tokens=520
[8.2s] message_stop
Time to first token is roughly 200–400 ms (Sonnet); after that, throughput is roughly 80–200 tokens per second.
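Plugging these rough figures into a simple linear model shows why long outputs are a problem without streaming. This is back-of-the-envelope arithmetic under the assumption of a constant decode rate; estimateStreamMs is an illustrative helper, not a measurement tool.

```typescript
// Rough model: total time = time to first token + output tokens / decode rate.
function estimateStreamMs(outputTokens: number, tokensPerSecond: number, firstTokenMs: number): number {
  return firstTokenMs + (outputTokens / tokensPerSecond) * 1000;
}

// A 30K-token answer at the slow and fast ends of the quoted range:
console.log(estimateStreamMs(30000, 80, 300) / 1000);  // 375.3 (seconds)
console.log(estimateStreamMs(30000, 200, 300) / 1000); // 150.3 (seconds)
```

Even at the fast end, a 30K-token response takes about two and a half minutes, which is beyond a 120 s timeout and well beyond a 60 s proxy idle limit if nothing is sent in the meantime.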
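Chapter summary
- Streaming = SSE, with the eight event types listed above (including ping and error); content_block_delta is the one you handle most
- Node SDK: async iterator plus convenience events (text / finalMessage); Python: with + text_stream
- AbortController/signal gives a graceful interrupt and immediately saves the remaining output tokens
- The front end cannot connect to Anthropic directly; the server relays the stream
- Thinking and Tool Use each have their own block type and are handled differently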