Chapter 4 Content Collections: Managing Markdown/MDX

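Content Collections

Why use content collections?

Before Astro 2.0, Markdown files were read with Astro.glob(), which offered no type safety: you could not tell which frontmatter fields each post had, a misspelled field name only surfaced at runtime, and the IDE could not offer any hints.

Content collections solve this problem. You define the shape of your content (its schema) with Zod, and TypeScript then provides full type safety and IDE autocompletion. At build time the Zod schema validates the frontmatter of every file; if a field does not match the schema, the build fails with an error and no pages are generated.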

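Core concepts

Content Collection

A subdirectory of src/content/ that holds content of one kind, such as blog/, docs/, or products/. Each collection has its own Zod schema, and every file in it must conform to that schema.

type: 'content'

Used for .md/.mdx files. Entries have a body (the Markdown body), a render() method, and a slug property. This is the standard type for content that has a body, such as blog posts and documentation pages.

type: 'data'

Used for .json/.yaml files that contain only structured data and no body. Suited to configuration-like data (navigation menus, author lists, translation strings, and so on).

Zod Schema

Defines the structure and validation rules of the frontmatter using the Zod library. Zod ships with Astro and is imported via import { z } from 'astro:content', so there is nothing extra to install.

loader

The core concept of the Content Layer API introduced in Astro 5, used to load content from any source. The built-in glob() loader reads local files, and you can write a custom loader to pull data from a CMS API, a database, and so on.

CollectionEntry<T>

A TypeScript type representing a single entry in a collection. It includes id (the file path), data (the frontmatter data, conforming to the schema), body (the raw text), and, for type: 'content' only, slug and the render() method.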

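The src/content/ directory structure

src/content/
├── config.ts              # Collection schema definitions (required)
├── blog/                  # Blog post collection (type: 'content')
│   ├── hello-world.md
│   ├── astro-guide.mdx    # MDX format (can embed React/Vue components)
│   └── draft-post.md
├── docs/                  # Documentation collection
│   ├── getting-started.md
│   └── api-reference.md
└── authors/               # Author data collection (type: 'data')
    ├── alice.json
    └── bob.yaml

config.ts must live at the root of src/content/

config.ts (or config.js) must sit directly inside src/content/, not in a subdirectory. If the file is missing, Astro does not validate the collection files against any schema, and all frontmatter types degrade to Record<string, any>.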

config.ts: defining the Zod schema

// src/content/config.ts
import { defineCollection, reference, z } from 'astro:content';

const blogCollection = defineCollection({
  type: 'content',  // 'content' (Markdown/MDX) or 'data' (JSON/YAML)
  schema: z.object({
    title: z.string(),                // required string
    pubDate: z.date(),                 // date (written as 2024-01-15 in frontmatter)
    description: z.string(),
    author: z.string().default('Anonymous'),  // has a default, may be omitted
    tags: z.array(z.string()).default([]),  // array of strings
    image: z.object({
      url: z.string(),
      alt: z.string(),
    }).optional(),              // optional object
    draft: z.boolean().default(false),
    relatedPosts: z.array(reference('blog')).optional(),  // references to other entries in the blog collection
  }),
});

// Author data collection (JSON/YAML files)
const authorsCollection = defineCollection({
  type: 'data',       // data type, no body
  schema: z.object({
    name: z.string(),
    bio: z.string(),
    avatar: z.string().url(),   // Zod's built-in URL format check
    social: z.object({
      twitter: z.string().optional(),
      github: z.string().optional(),
    }).optional(),
  }),
});

// Export all collections: each key must match a subdirectory name under src/content/
export const collections = {
  blog: blogCollection,
  authors: authorsCollection,
};
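
To make the effect of .default() and .optional() concrete, the following dependency-free sketch mimics what the data of a parsed blog entry could look like once defaults are applied. BlogData, RawBlogFrontmatter, and applyBlogDefaults are hypothetical names written for this illustration; in a real project Zod performs this step and the type is inferred from the schema.

```typescript
// Hypothetical hand-written equivalent of the data type that the blog schema
// would produce after parsing (in a real project Zod infers this).
interface BlogData {
  title: string;
  pubDate: Date;
  description: string;
  author: string;                         // always present thanks to the default
  tags: string[];                         // always present thanks to the default
  image?: { url: string; alt: string };   // optional: may be undefined
  draft: boolean;                         // always present thanks to the default
}

// Raw frontmatter: fields that have defaults (or are optional) may be omitted.
type RawBlogFrontmatter =
  Pick<BlogData, 'title' | 'pubDate' | 'description'> &
  Partial<Pick<BlogData, 'author' | 'tags' | 'image' | 'draft'>>;

// Illustrative stand-in for the defaulting step that Zod performs.
function applyBlogDefaults(raw: RawBlogFrontmatter): BlogData {
  return {
    ...raw,
    author: raw.author ?? 'Anonymous',
    tags: raw.tags ?? [],
    draft: raw.draft ?? false,
  };
}

// Only the three required fields are supplied here.
const parsed = applyBlogDefaults({
  title: 'Hello',
  pubDate: new Date('2024-01-15'),
  description: 'First post',
});
```

Because author, tags, and draft are never undefined after parsing, templates can call post.data.tags.join(', ') without a guard, while image still needs an optional check.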

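Frontmatter in a Markdown file

---
title: "The Complete Guide to Astro Content Collections"
pubDate: 2024-03-15
description: "A deep dive into the type safety and query API of Astro content collections"
author: "alice"
tags: ["Astro", "TypeScript", "Tutorial"]
image:
  url: "/images/content-collections.jpg"
  alt: "Diagram of the content collections architecture"
draft: false
relatedPosts:
  - astro-guide        # refers to blog/astro-guide.md (without the extension)
---

# The post content starts here

This is the body of the post. Full Markdown syntax is supported.

## Second-level heading

- List item 1
- List item 2

```javascript
console.log('Code blocks get syntax highlighting too');
```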

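Common mistake: the type of pubDate

Zod's z.date() validates a JavaScript Date object. An unquoted date in YAML frontmatter (such as 2024-01-15) is converted to a Date object by the YAML parser, so this is the correct way to write it.

The common mistake is to quote the value: pubDate: "2024-01-15". With quotes the value is a string, and Zod validation fails with Expected date, received string. Removing the quotes fixes it: pubDate: 2024-01-15.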
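
The difference between the two spellings can be illustrated without YAML or Zod. In the sketch below, checkDate and coerceDate are hypothetical helpers: the first mimics a strict check in the spirit of z.date(), the second mimics the more lenient idea behind z.coerce.date(), which Zod also provides and which accepts date strings. Neither is Zod's actual implementation.

```typescript
type DateCheck =
  | { ok: true; value: Date }
  | { ok: false; error: string };

// Strict check: only a real, valid Date object passes (in the spirit of z.date()).
function checkDate(value: unknown): DateCheck {
  if (value instanceof Date && !Number.isNaN(value.getTime())) {
    return { ok: true, value };
  }
  return { ok: false, error: `Expected date, received ${typeof value}` };
}

// Lenient check: convert strings and numbers with new Date(...) first,
// then apply the strict check (the idea behind z.coerce.date()).
function coerceDate(value: unknown): DateCheck {
  if (typeof value === 'string' || typeof value === 'number') {
    return checkDate(new Date(value));
  }
  return checkDate(value);
}

// Unquoted YAML date: the parser hands over a Date object.
const unquoted = checkDate(new Date('2024-01-15'));
// Quoted YAML date: the parser hands over a plain string, which is rejected.
const quoted = checkDate('2024-01-15');
// The lenient variant accepts the string form as well.
const coerced = coerceDate('2024-01-15');
```

If quoted dates are likely to appear in your content, declaring the field as z.coerce.date() instead of z.date() is one way to accept both spellings.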

Querying content collections

getCollection: fetch an entire collection

---
import { getCollection } from 'astro:content';

// Fetch the whole collection (all posts)
const allPosts = await getCollection('blog');

// Filter: keep only posts that are not drafts
const publishedPosts = await getCollection('blog', ({ data }) => {
  return !data.draft;
});

// Sort: newest first by publication date
publishedPosts.sort((a, b) => b.data.pubDate.getTime() - a.data.pubDate.getTime());

// Filter by tag
const astroPosts = await getCollection('blog', ({ data }) =>
  data.tags.includes('Astro')
);
---

<ul>
  {publishedPosts.map(post => (
    <li>
      <a href={`/blog/${post.slug}`}>
        {post.data.title}
      </a>
      <!-- The type of post.data is inferred automatically, so the editor offers full autocompletion -->
      <time>{post.data.pubDate.toLocaleDateString('zh-CN')}</time>
      <span>{post.data.tags.join(', ')}</span>
    </li>
  ))}
</ul>
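
The filter and sort steps can be pulled into one reusable, testable helper. This is a minimal sketch under stated assumptions: PostLike is a hypothetical stand-in for CollectionEntry<'blog'>, and selectPublished is an illustrative name. It also hides posts dated in the future, which the draft-only filter does not do.

```typescript
// Minimal stand-in for the fields of a blog entry used here (hypothetical;
// in a real project this would be CollectionEntry<'blog'>).
interface PostLike {
  slug: string;
  data: { title: string; pubDate: Date; draft: boolean };
}

// Keep posts that are not drafts and whose pubDate is not in the future,
// newest first. filter() returns a new array, so the input is not mutated.
function selectPublished<T extends PostLike>(posts: readonly T[], now: Date): T[] {
  return posts
    .filter((p) => !p.data.draft && p.data.pubDate.getTime() <= now.getTime())
    .sort((a, b) => b.data.pubDate.getTime() - a.data.pubDate.getTime());
}

const sample: PostLike[] = [
  { slug: 'old', data: { title: 'Old', pubDate: new Date('2024-01-01'), draft: false } },
  { slug: 'wip', data: { title: 'WIP', pubDate: new Date('2024-02-01'), draft: true } },
  { slug: 'new', data: { title: 'New', pubDate: new Date('2024-03-01'), draft: false } },
  { slug: 'future', data: { title: 'Future', pubDate: new Date('2999-01-01'), draft: false } },
];

const visible = selectPublished(sample, new Date('2024-06-01'));
const visibleSlugs = visible.map((p) => p.slug);
```

Passing now as a parameter rather than calling new Date() inside the helper keeps the function deterministic, which makes it easy to test.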

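getEntry: fetch a single entry

---
import { getEntry, getEntries } from 'astro:content';

// Method 1: look up by collection name + file name (without the extension)
const post = await getEntry('blog', 'hello-world');

// getEntry returns undefined when no entry is found, so check before using post
if (!post) {
  return Astro.redirect('/404');
}

// Method 2: look up via a reference object (common for cross-collection references)
const author = await getEntry(post.data.author);  // if author is declared as reference('authors')

// Method 3: when the post has relatedPosts references, resolve them in one call
const relatedPosts = await getEntries(post.data.relatedPosts ?? []);
---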

Rendering post content

---
// src/pages/blog/[slug].astro
import { getCollection } from 'astro:content';
import BlogLayout from '../../layouts/BlogLayout.astro';
import type { CollectionEntry } from 'astro:content';

// getStaticPaths tells Astro which pages to pre-generate
export async function getStaticPaths() {
  const posts = await getCollection('blog', ({ data }) => !data.draft);
  return posts.map(post => ({
    params: { slug: post.slug },  // matches [slug].astro
    props: { post },              // pass the whole post object to the page
  }));
}

// TypeScript type of the props supplied by getStaticPaths
interface Props {
  post: CollectionEntry<'blog'>;
}
const { post } = Astro.props;

// render() compiles the Markdown/MDX to HTML
// Content: a component that renders the body
// headings: metadata for every heading in the post (used to build a table of contents)
const { Content, headings } = await post.render();
---

<BlogLayout
  title={post.data.title}
  description={post.data.description}
>
  <!-- Render the table of contents; h1 and h2 get no indent, deeper levels 1rem per level -->
  <nav>
    {headings.map(h => (
      <a href={`#${h.slug}`} style={`padding-left: ${Math.max(0, h.depth - 2)}rem`}>
        {h.text}
      </a>
    ))}
  </nav>

  <h1>{post.data.title}</h1>
  <time>{post.data.pubDate.toLocaleDateString('zh-CN')}</time>

  <!-- The Content component renders the Markdown/MDX body -->
  <Content />
</BlogLayout>
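
The template above indents a flat list of links. If a nested table of contents is preferred, the flat headings array (items with depth, slug, and text) can be turned into a tree first. The following is a minimal sketch; TocNode and buildToc are hypothetical names, not part of Astro.

```typescript
// Shape of the items in the headings array returned by render().
interface Heading {
  depth: number;
  slug: string;
  text: string;
}

// Hypothetical tree node used to render a nested table of contents.
interface TocNode extends Heading {
  children: TocNode[];
}

// Turn the flat headings list into a tree: each heading becomes a child of
// the closest preceding heading with a smaller depth.
function buildToc(headings: readonly Heading[]): TocNode[] {
  const roots: TocNode[] = [];
  const stack: TocNode[] = []; // chain of currently open ancestors

  for (const h of headings) {
    const node: TocNode = { ...h, children: [] };
    // Close every open heading that is at the same level or deeper.
    let top: TocNode | undefined = stack[stack.length - 1];
    while (top !== undefined && top.depth >= node.depth) {
      stack.pop();
      top = stack[stack.length - 1];
    }
    if (top !== undefined) {
      top.children.push(node);
    } else {
      roots.push(node);
    }
    stack.push(node);
  }
  return roots;
}

const toc = buildToc([
  { depth: 2, slug: 'setup', text: 'Setup' },
  { depth: 3, slug: 'install', text: 'Install' },
  { depth: 3, slug: 'configure', text: 'Configure' },
  { depth: 2, slug: 'usage', text: 'Usage' },
]);
```

The resulting tree can be rendered with nested lists, which avoids computing padding by hand.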

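Cross-collection references (reference())

Linking a blog post to its author is a typical cross-collection reference. With the reference() function, Astro verifies at build time that the referenced entry exists.

// config.ts: declare the reference relationships
import { defineCollection, reference, z } from 'astro:content';

const blogCollection = defineCollection({
  type: 'content',
  schema: z.object({
    title: z.string(),
    // author refers to one entry in the authors collection
    author: reference('authors'),
    // relatedPosts is an array referring to several entries in the blog collection
    relatedPosts: z.array(reference('blog')).optional(),
  }),
});

---
// [slug].astro: resolving cross-collection references
import { getEntry, getEntries } from 'astro:content';

const { post } = Astro.props;

// Resolve the author reference to get the full author data
const author = await getEntry(post.data.author);
// author.data.name, author.data.bio, and the other fields are all type-safe

// Resolve several relatedPosts references
const related = await getEntries(post.data.relatedPosts ?? []);
---

<p>Author: {author.data.name}</p>
<img src={author.data.avatar} alt={author.data.name} />

<aside>
  <h3>Related posts</h3>
  {related.map(p => <a href={`/blog/${p.slug}`}>{p.data.title}</a>)}
</aside>

The referenced target must exist

If a reference field in the frontmatter points to an entry that does not exist (for example author: non-existent-author), Astro fails the build with an error such as: Could not find entry for collection 'authors' with id 'non-existent-author'. This is a deliberate safety check that prevents dangling references.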
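
The idea behind this check can be sketched in a few lines. This is not Astro's implementation; Ref, PostWithAuthor, and findDanglingAuthors are hypothetical names, and the { collection, id } shape is an assumption about how a resolved reference to a data collection is represented.

```typescript
// Assumed shape of a stored reference to a data-collection entry.
interface Ref {
  collection: string;
  id: string;
}

// Hypothetical minimal post shape: only the fields needed for the check.
interface PostWithAuthor {
  id: string;
  author: Ref;
}

// Return one message per post whose author reference points at a missing id.
function findDanglingAuthors(
  posts: readonly PostWithAuthor[],
  authorIds: ReadonlySet<string>,
): string[] {
  return posts
    .filter((post) => !authorIds.has(post.author.id))
    .map(
      (post) =>
        `${post.id}: no entry '${post.author.id}' in collection '${post.author.collection}'`,
    );
}

const knownAuthors = new Set(['alice', 'bob']);
const problems = findDanglingAuthors(
  [
    { id: 'hello-world', author: { collection: 'authors', id: 'alice' } },
    { id: 'draft-post', author: { collection: 'authors', id: 'non-existent-author' } },
  ],
  knownAuthors,
);
```

A build step would typically fail as soon as this list is non-empty, which is the behaviour the paragraph above describes.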

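MDX: using components in Markdown

MDX (.mdx) is a superset of Markdown that lets you embed React/Vue/Astro components in the body. It suits content that needs interactivity, such as code samples with a run button or embedded demos.

# Install MDX support
npx astro add mdx

---
title: "An Interactive Tutorial"
pubDate: 2024-03-01
---

# Plain Markdown text

import CodeDemo from '../../components/CodeDemo.jsx';
import Callout from '../../components/Callout.astro';

<Callout type="info">
  This is an Astro component, used directly inside MDX!
</Callout>

<CodeDemo lang="javascript">
  console.log('This code can run in the browser')
</CodeDemo>

## Back to Markdown...

Body text and components can be mixed freely.

Passing props to MDX components

An MDX file can expose data to the Astro template through export. In the other direction, you can pass a components prop to the Content component returned by render() so that chosen Markdown elements in the MDX are rendered with your own Astro components:

---
// MyHeading: a custom component imported above (not shown)
const { Content } = await post.render();
---
<!-- Replace the h2 elements in the MDX with the custom component -->
<Content components={{ h2: MyHeading }} />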

Astro 5 Content Layer API

From local files to any data source

Astro 5 introduced the Content Layer API, which frees the notion of a "content collection" from the local file system. Through a loader function, content can be loaded from any source while keeping the same type-safe query interface (getCollection/getEntry).

// src/content.config.ts (Astro 5 style; the config file moves out of src/content/)
import { defineCollection, z } from 'astro:content';
import { glob, file } from 'astro/loaders';

// Method 1: the glob() loader reads local files (what Astro 4 did by default)
const blog = defineCollection({
  loader: glob({ pattern: '**/*.md', base: './src/data/blog' }),
  schema: z.object({
    title: z.string(),
    pubDate: z.date(),
    description: z.string(),
  }),
});

// Method 2: the file() loader reads many entries from a single JSON/YAML file
// (every item in the file needs a unique id)
const countries = defineCollection({
  loader: file('src/data/countries.json'),
  schema: z.object({
    code: z.string(),
    name: z.string(),
    capital: z.string(),
  }),
});

// Method 3: a custom loader fetches content from a remote API
const products = defineCollection({
  loader: async () => {
    // Pull the data from the CMS API at build time
    const response = await fetch('https://api.myshop.com/products');
    const data = await response.json();
    // Must return an array whose items each have an id field
    return data.products.map((p: any) => ({
      id: p.sku,        // id is what getEntry('products', id) looks up
      ...p,
    }));
  },
  schema: z.object({
    name: z.string(),
    price: z.number(),
    inStock: z.boolean(),
  }),
});

export const collections = { blog, countries, products };
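
The custom loader above trusts the API response and types it as any. A slightly more defensive variant separates the pure mapping step from the network call so that it can be tested offline. This is a sketch under stated assumptions: RemoteProduct describes a hypothetical response shape, and toProductEntries, makeProductLoader, and fetchJson are illustrative names, not Astro APIs.

```typescript
// Hypothetical shape of one product in the remote API response.
interface RemoteProduct {
  sku: string;
  name: string;
  price: number;
  inStock: boolean;
}

// Entry shape handed back to the collection: the original fields plus id.
type ProductEntry = RemoteProduct & { id: string };

// Pure mapping step: derive id from sku and reject empty or duplicate ids
// early, with a readable message.
function toProductEntries(products: readonly RemoteProduct[]): ProductEntry[] {
  const seen = new Set<string>();
  return products.map((p) => {
    const id = p.sku.trim();
    if (id === '') throw new Error(`Product "${p.name}" has an empty sku`);
    if (seen.has(id)) throw new Error(`Duplicate product id "${id}"`);
    seen.add(id);
    return { ...p, id };
  });
}

// The loader only fetches and delegates; fetchJson is injected so the
// mapping can be exercised without a network connection.
function makeProductLoader(
  fetchJson: () => Promise<{ products: RemoteProduct[] }>,
): () => Promise<ProductEntry[]> {
  return async () => {
    const data = await fetchJson();
    return toProductEntries(data.products);
  };
}

const entries = toProductEntries([
  { sku: 'A-1', name: 'Mug', price: 9.5, inStock: true },
  { sku: 'B-2', name: 'Cap', price: 15, inStock: false },
]);
```

In a collection definition, the function returned by makeProductLoader would take the place of the inline async loader, with fetchJson wrapping the real fetch call.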

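How the Content Layer API differs from Astro 4

Astro 4 (type: 'content'/'data')

Content must come from local files under src/content/. The type field distinguishes files with a body (.md/.mdx) from pure data (.json/.yaml).

Astro 5 (loader function)

A loader function specifies where the data comes from: the local file system (the built-in glob/file loaders), an HTTP API, a database, or any other source. Files no longer have to live under src/content/.

Backward compatibility

Astro 5 still supports the old type: 'content'/'data' syntax. With a loader, the old slug property is replaced by id, so migrating an existing project means replacing post.slug with post.id. Entries loaded through a loader also have no render() method: import render from 'astro:content' and call render(post) instead of post.render().

slug vs id in Astro 5

In the Astro 5 Content Layer API (when a loader is used), the entry identifier changes from slug to id. post.slug exists only with the legacy type: 'content' style; after switching to a loader you need post.id. This is one of the most significant breaking changes in Astro 5.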
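
During a gradual migration, both entry shapes may pass through the same link-building code. The sketch below shows one way to bridge them; LegacyEntry, LoaderEntry, entryPathSegment, and postUrl are hypothetical names, and the sample id values are only illustrative of what each style typically produces.

```typescript
// Hypothetical minimal shapes of the two kinds of entry seen during a
// migration: legacy entries expose slug, loader-based entries only id.
type LegacyEntry = { id: string; slug: string };
type LoaderEntry = { id: string };

// Return the URL segment for an entry regardless of which API produced it.
function entryPathSegment(entry: LegacyEntry | LoaderEntry): string {
  return 'slug' in entry ? entry.slug : entry.id;
}

// Build the link to a blog post from either kind of entry.
function postUrl(entry: LegacyEntry | LoaderEntry): string {
  return `/blog/${entryPathSegment(entry)}`;
}

// Legacy style: id is assumed to be the file name, slug is URL-ready.
const legacyUrl = postUrl({ id: 'hello-world.md', slug: 'hello-world' });
// Loader style: only id is available.
const loaderUrl = postUrl({ id: 'hello-world' });
```

Once every collection uses a loader, the helper collapses to entry.id and can be removed.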

Advanced queries: type inference and filtering patterns

// Using collection types in TypeScript utility functions
import { getCollection } from 'astro:content';
import type { CollectionEntry } from 'astro:content';

// Helper with filtering (type inference works as expected)
async function getPublishedPosts(): Promise<CollectionEntry<'blog'>[]> {
  const posts = await getCollection('blog');
  return posts
    .filter(p => !p.data.draft)
    .sort((a, b) =>
      b.data.pubDate.getTime() - a.data.pubDate.getTime()
    );
}

// Group by tag (for tag pages)
async function getPostsByTag(): Promise<Map<string, CollectionEntry<'blog'>[]>> {
  const posts = await getPublishedPosts();
  const tagMap = new Map<string, CollectionEntry<'blog'>[]>();
  for (const post of posts) {
    for (const tag of post.data.tags) {
      if (!tagMap.has(tag)) tagMap.set(tag, []);
      tagMap.get(tag)!.push(post);
    }
  }
  return tagMap;
}
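
A tag index page usually needs counts rather than full post lists. The following dependency-free sketch derives them; Tagged and countTags are hypothetical names, and Tagged stands in for CollectionEntry<'blog'> so the example runs without Astro.

```typescript
// Hypothetical minimal entry shape; only the tags are needed here.
interface Tagged {
  data: { tags: string[] };
}

// Count posts per tag and return [tag, count] pairs, most used first;
// ties are broken alphabetically so the output is stable.
function countTags(posts: readonly Tagged[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const post of posts) {
    // De-duplicate first so a post listing the same tag twice counts once.
    const uniqueTags = Array.from(new Set(post.data.tags));
    for (const tag of uniqueTags) {
      counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
}

const tagCounts = countTags([
  { data: { tags: ['Astro', 'TypeScript'] } },
  { data: { tags: ['Astro', 'CSS', 'CSS'] } },
]);
```

The pairs can be mapped straight to links such as /tags/Astro, with the count shown next to each tag.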

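Chapter summary

Key takeaways

Type safety is the core value: Content Collections plus a Zod schema give frontmatter fields type hints in the editor and compile-time checking; compared with the Astro.glob() approach, data errors surface at build time rather than at runtime.
type 'content' vs 'data': 'content' is for .md/.mdx files with a body and provides render(), headings, and slug; 'data' is for pure-data .json/.yaml files and suits structured data such as author details and navigation configuration.
reference() enables relational queries: declare cross-collection references (blog.author → the authors collection), let Astro validate them at build time, and resolve them with getEntry(entry.data.refField); all data keeps full type inference.
Astro 5 Content Layer API: a loader function replaces the type field and unlocks any data source (local files / CMS API / database); glob() and file() loaders are built in; the entry identifier changes from slug to id.
The correct way to write pubDate: leave dates unquoted in YAML frontmatter (pubDate: 2024-01-15) so that YAML parses them into Date objects; quoting turns them into strings and Zod validation fails.
MDX adds interactivity to static content: import and use React/Vue/Astro components in .mdx files; the Content component returned by render() accepts a components prop that replaces the default rendering of Markdown elements.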
