Chapter 6: Memory (The Complete Mastra Guide)

The Three-Layer Memory Model

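Thread (conversation thread)

The complete message history of a single conversation. Threads are isolated by threadId, and resourceId records which user a thread belongs to. This layer is short-term, linear, and the one used most often.

Semantic Recall

Semantic retrieval across threads. The current user message is embedded, and the most relevant fragments of past messages are retrieved and injected into the prompt. This is what lets the Agent remember this week something that was said last week.

Working Memory

A long-lived key-value profile. The LLM decides for itself what to write or update (name, preferences, current goals, and so on), and the profile is injected automatically into every conversation. Think of it as the Agent's "personal notebook".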
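
To make the division of labor concrete, here is a minimal sketch of how the three layers could be combined into the context for one model call. The types and the buildContext helper are hypothetical and exist only for illustration; they are not Mastra's internal types, and the real prompt layout may differ.

```typescript
// Hypothetical types for illustration only (not Mastra's internals).
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

interface MemoryLayers {
  workingMemory: string;        // long-lived profile note
  recalled: ChatMessage[];      // Semantic Recall hits from older conversations
  threadHistory: ChatMessage[]; // full history of the current thread
}

// Assemble the context for one model call: profile first, then recalled
// fragments, then the most recent `lastMessages` messages of the thread.
function buildContext(layers: MemoryLayers, lastMessages: number): string[] {
  // slice(-0) would return the whole array, so guard the zero case.
  const recent = lastMessages > 0 ? layers.threadHistory.slice(-lastMessages) : [];
  return [
    `system: Working memory:\n${layers.workingMemory}`,
    ...layers.recalled.map((m) => `recalled ${m.role}: ${m.content}`),
    ...recent.map((m) => `${m.role}: ${m.content}`),
  ];
}
```

Only the thread layer grows with every turn, which is why it is truncated to the last N messages while the other two layers are injected in full.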

Minimal Configuration

import { Agent } from '@mastra/core/agent';
import { Memory } from '@mastra/memory';
import { LibSQLStore, LibSQLVector } from '@mastra/libsql';
import { openai } from '@ai-sdk/openai';

const memory = new Memory({
  storage: new LibSQLStore({ url: 'file:./memory.db' }),
  vector: new LibSQLVector({ connectionUrl: 'file:./memory.db' }),
  embedder: openai.embedding('text-embedding-3-small'),
  options: {
    lastMessages: 20,
    semanticRecall: { topK: 3, messageRange: 2 },
    workingMemory: { enabled: true },
  },
});

export const agent = new Agent({
  name: 'assistant',
  instructions: '...',
  model: openai('gpt-4o-mini'),
  memory,
});

Specifying the Thread and Resource at Call Time

await agent.generate('Do you remember the project I mentioned last time?', {
  threadId: 'thread-42',    // conversation thread
  resourceId: 'user-007',   // user or tenant id
});

threadId vs resourceId
One resource (a user) can own many threads (conversations). By default, Semantic Recall searches across all threads within the same resource, and Working Memory is persisted long-term at resource granularity.
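
The relationship can be sketched as a filter over stored messages. The StoredMessage shape and the searchableMessages helper below are assumptions made for illustration; they only show which messages each scope can see and are not Mastra's storage API.

```typescript
// Hypothetical record shape: every message carries both ids.
interface StoredMessage {
  threadId: string;
  resourceId: string;
  content: string;
}

type RecallScope = 'thread' | 'resource';

// Return the messages that a recall query is allowed to search.
function searchableMessages(
  all: StoredMessage[],
  threadId: string,
  resourceId: string,
  scope: RecallScope,
): StoredMessage[] {
  return all.filter((m) => {
    if (m.resourceId !== resourceId) return false;          // never cross users
    return scope === 'resource' || m.threadId === threadId; // 'thread' narrows further
  });
}
```

In both scopes a message that belongs to another resource is never visible, so resourceId is the hard isolation boundary and threadId is the softer one.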

Managing Thread Messages

// Create a thread manually
const thread = await memory.createThread({
  threadId: 'thread-42',
  resourceId: 'user-007',
  title: 'Discussing the migration plan',
});

// Append a message
await memory.addMessage({
  threadId: 'thread-42',
  role: 'user',
  content: 'Where did we leave off last time?',
});

// Query the history
const msgs = await memory.getMessages({
  threadId: 'thread-42',
  last: 50,
});

Semantic Recall Parameters

options: {
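  lastMessages: 20,         // inject the most recent N messages verbatim
  semanticRecall: {
    topK: 3,               // return the 3 most relevant matches
    messageRange: 2,       // include ±2 messages around each match for context
    scope: 'resource',     // 'thread' = current conversation only / 'resource' = across threads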
  },
}

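The current user input is vectorized by the embedder and used to query the vector store. Each matching message, together with messageRange messages before and after it, is added to the prompt, so the Agent naturally "recalls" that part of the conversation.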
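
The interaction between topK and messageRange can be illustrated with a small in-memory sketch. It assumes toy embeddings and a brute-force cosine-similarity scan; a real vector store uses an index, and the recall function here is a hypothetical stand-in rather than Mastra's implementation.

```typescript
// Hypothetical message record with a precomputed toy embedding.
interface EmbeddedMessage {
  id: number;
  content: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Pick the topK most similar messages, then widen each hit by
// messageRange neighbors on both sides, deduplicating overlaps.
function recall(
  history: EmbeddedMessage[],
  query: number[],
  topK: number,
  messageRange: number,
): EmbeddedMessage[] {
  const hits = history
    .map((m, index) => ({ index, score: cosine(m.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);

  const keep = new Set<number>();
  for (const hit of hits) {
    const start = Math.max(0, hit.index - messageRange);
    const end = Math.min(history.length - 1, hit.index + messageRange);
    for (let i = start; i <= end; i++) keep.add(i);
  }
  // Return the kept messages in their original chronological order.
  return Array.from(keep)
    .sort((x, y) => x - y)
    .map((i) => history[i]);
}
```

With topK: 3 and messageRange: 2, at most 3 × 5 = 15 recalled messages reach the prompt, and fewer when hit windows overlap or sit at the edge of a thread.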

Working Memory: Letting the Agent Remember a User Profile

options: {
  workingMemory: {
    enabled: true,
    template: `# User Profile
- Name:
- Occupation:
- Communication preferences:
- Current goal:
- To-dos:
  - [ ]
`,
  },
},

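Mastra injects a system prompt into the Agent along the lines of: "Below are the user notes you maintain (Markdown). Before the end of each turn, update the relevant fields if there is new information." The LLM modifies the note through a structured tool call, and the updated note is injected again automatically in the next conversation.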
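
The read-update-inject cycle can be sketched as follows. Everything here is hypothetical: the notes map, the handleUpdateToolCall handler, and the prompt wording are assumptions chosen for illustration, and Mastra's actual tool name and prompt text may differ.

```typescript
// In-memory stand-in for persisted working memory: resourceId -> Markdown note.
const notes = new Map<string, string>();

// Read the note for a user, falling back to the empty template.
function readNote(resourceId: string, template: string): string {
  return notes.get(resourceId) ?? template;
}

// Hypothetical handler for the model's "update working memory" tool call.
// The model sends back the full, updated note, which replaces the old one.
function handleUpdateToolCall(resourceId: string, args: { memory: string }): void {
  notes.set(resourceId, args.memory.trim());
}

// Build the system prompt for the next call, with the current note appended.
function buildSystemPrompt(instructions: string, resourceId: string, template: string): string {
  return `${instructions}\n\nUser notes (Markdown):\n${readNote(resourceId, template)}`;
}
```

Because the note is keyed by resource rather than by thread, an update made in one conversation shows up in every later conversation with the same user.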

// Use a Zod schema instead of free-form Markdown for a more controlled structure
import { z } from 'zod';

workingMemory: {
  enabled: true,
  schema: z.object({
    name: z.string().optional(),
    occupation: z.string().optional(),
    preferences: z.array(z.string()).default([]),
    tasks: z.array(z.object({
      title: z.string(),
      done: z.boolean().default(false),
    })).default([]),
  }),
}

Working Memory vs RAG

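Dimension	Working Memory	RAG (Chapter 7)
Data source	Extracted automatically by the LLM during conversation	External documents and knowledge bases
Scale	A few KB, user-profile level	GB-scale documents
Writes	The LLM decides when to update	Ingested ahead of time by engineers
Retrieval	Injected in full on every conversation	Retrieved by query embedding
Best for	Personalization, preferences, long-term goals	Enterprise knowledge, objective facts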

Storage Provider

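@mastra/libsql

SQLite-compatible; runs against a local file: URL or Turso in the cloud. Storage and vectors live in one database, which makes it the first choice for development.

@mastra/pg

PostgreSQL + pgvector. Recommended for production: transactions, indexing, and backups are all mature.

@mastra/upstash

Serverless Redis + Upstash Vector. A good fit for Cloudflare or Vercel Edge deployments.

@mastra/cloudflare

D1 for storage and Vectorize for vectors. Runs entirely on Cloudflare Workers.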

Production Postgres Configuration

import { Memory } from '@mastra/memory';
import { PostgresStore, PgVector } from '@mastra/pg';
import { openai } from '@ai-sdk/openai';

const memory = new Memory({
  storage: new PostgresStore({
    connectionString: process.env.DATABASE_URL!,
  }),
  vector: new PgVector({
    connectionString: process.env.DATABASE_URL!,
  }),
  embedder: openai.embedding('text-embedding-3-small'),
});

On first startup, Mastra creates tables such as mastra_messages, mastra_threads, mastra_working_memory, and mastra_embeddings in your database; migrations are built in.

Controlling Token Cost

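Keep lastMessages at or below 40: messages are injected verbatim, so token usage grows linearly with the count.
Keep semanticRecall.topK ≤ 5: results depend mainly on recall quality, not quantity.
Use a schema rather than Markdown for workingMemory: structured injection uses fewer tokens.
Split large projects into multiple threads: one thread per topic reduces cross-topic interference.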
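
A back-of-the-envelope estimate helps when tuning these settings. The sketch below is an assumption-laden upper bound, not a Mastra API: the estimateMemoryTokens helper is hypothetical, and avgTokensPerMessage and workingMemoryTokens are values you would measure from your own traffic.

```typescript
// Hypothetical inputs for a rough per-call memory budget.
interface MemoryBudgetInput {
  lastMessages: number;
  topK: number;
  messageRange: number;
  avgTokensPerMessage: number; // assumption: measured from your own logs
  workingMemoryTokens: number; // assumption: size of the injected profile
}

// Upper bound on tokens added by memory for one model call.
function estimateMemoryTokens(input: MemoryBudgetInput): number {
  // Recent history is injected verbatim: linear in lastMessages.
  const recent = input.lastMessages * input.avgTokensPerMessage;
  // Each recall hit brings itself plus messageRange neighbors on each side.
  // Overlapping windows and thread edges make the real number smaller.
  const recalled = input.topK * (2 * input.messageRange + 1) * input.avgTokensPerMessage;
  return recent + recalled + input.workingMemoryTokens;
}
```

With this chapter's settings (lastMessages: 20, topK: 3, messageRange: 2) and an assumed 80 tokens per message plus a 200-token profile, the bound is 1,600 + 1,200 + 200 = 3,000 tokens per call. Doubling lastMessages to 40 adds another 1,600.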

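Debugging: Seeing What the Agent Actually "Remembers"

The Memory panel in the Playground shows, in real time:

The most recent N messages of the current thread
The fragments matched by semantic recall in this turn (with a link to the original threadId)
The current snapshot of working memory
The final composed system prompt that is injected into the model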

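Chapter Summary

Three memory layers: Thread (linear conversation) + Semantic Recall (cross-thread retrieval) + Working Memory (user profile)
threadId + resourceId provide isolation; scope controls the recall boundary
storage holds messages and metadata, vector holds embeddings, and embedder performs vectorization
Structuring Working Memory with a Zod schema gives more control than Markdown
libsql for local development, pg for production, Upstash or D1 for serverless deployments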

  
