BoxLang AI 深度解析：第 6 部分——内存系统与 RAG

内存是区分有用 AI 应用与玩具的关键。BoxLang AI 提供了 AI 框架中最全面的内存系统之一——涵盖两大类、20 多种内存类型、支持语义检索的向量嵌入、用于 RAG 流水线的 30 多种文档加载器，以及一套使多租户应用默认安全的“基于调用的身份路由系统”。

本文将带您全面了解这些功能。

🧠 两种内存类别

           +-----------------------------------+
           |         BoxLang AI 内存           |
           +-----------------------------------+
                        /           \
                       /             \
                      v               v

+--------------------------------+   +--------------------------------+
|          标准内存              |   |          向量内存              |
+--------------------------------+   +--------------------------------+
| 存储对话历史                   |   | 存储语义知识                   |
| 顺序消息线程                   |   | 嵌入 + 检索                    |
| 按时间/顺序检索                |   | 按语义含义检索                 |
| 例如：记住之前的基本事实       |   | 例如：RAG 知识查询             |
+--------------------------------+   +--------------------------------+

                      \               /
                       \             /
                        v           v

         +-------------------------------------------+
         | 共享抽象与使用模型                        |
         +-------------------------------------------+
         | IAiMemory 接口                            |
         | aiMemory() BIF                            |
         | 基于调用的身份路由                        |
         | 两者之间应用代码改动极小                  |
         +-------------------------------------------+

BoxLang AI 内存分为两个截然不同的类别，分别解决两类问题。

标准内存存储对话历史——即用户与助手之间的顺序消息。它让代理能够记住“我叫 Luis”这类三条消息前的信息。

向量内存存储语义知识——即文档、历史对话或领域内容的嵌入，可以通过含义（而非时间顺序）进行检索。它实现了 RAG 功能：“为当前查询从知识库中找到最相关的三段话。”

这两类内存共享相同的 IAiMemory 接口、相同的 aiMemory() BIF 以及相同的身份路由机制，因此您的应用代码在两者之间几乎不需要改动。

📋 标准内存类型

使用我们的全局函数即可创建任何内存：aiMemory( type, config: {} )。默认内存类型是包含 20 条消息的 window 内存：

$ node

// Window 内存 — 保留最近的 N 条消息
mem = aiMemory( "window", config: { maxMessages: 20 } )

// Summary 内存 — 自动总结旧消息以保留上下文
mem = aiMemory( "summary", config: {
    maxMessages      : 30,
    summaryThreshold : 15,
    summaryModel     : "gpt-4o-mini"
} )

// Cache 内存 — 基于 CacheBox，支持分布式
mem = aiMemory( "cache", config: { cacheName: "aiMemory" } )

// Session 内存 — 作用域限定于当前 Web 会话
mem = aiMemory( "session" )

// File 内存 — 持久化到磁盘，用于审计追踪
mem = aiMemory( "file", config: { filePath: "/logs/conversations/" } )

// JDBC 内存 — 存储在数据库中，适用于企业级多用户场景
mem = aiMemory( "jdbc", config: {
    datasource : "myDB",
    table      : "ai_conversations"
} )

类型	适用场景
`window`	快速聊天、注重成本的应用、无状态 API
`summary`	需要在消息限制内保留上下文的长对话
`session`	结合 PHP/BoxLang 会话的多页面 Web 应用
`file`	审计追踪、离线检查、长期存储
`cache`	分布式应用、多服务器部署
`jdbc`	企业级多用户系统、完全持久化

摘要内存 —— 原理解析

summary 类型值得特别关注。当消息数量超过 summaryThreshold 时，它会调用配置的 LLM 对最旧的消息进行单段落总结，用该总结替换掉旧消息作为一条系统消息，然后继续累积。这样既保留了对话上下文，又避免了因存储完整历史记录而产生的 Token 成本。

$ node

agent = aiAgent(
    name   : "support-bot",
    memory : aiMemory( "summary", config: {
        maxMessages      : 40,    // 最多保留 40 条消息
        summaryThreshold : 20,    // 达到 20 条时进行总结
        summaryModel     : "gpt-4o-mini"  // 使用低成本模型进行总结
    } )
)

🔍 向量内存类型

向量内存存储嵌入并根据语义相似度进行检索——当“寻找相关上下文”比“回顾刚才说了什么”更重要时，这是最佳工具。

$ node

// 内存向量 — 开发和小型数据集
mem = aiMemory( "boxvector" )

// ChromaDB — 基于 Python 的向量存储
mem = aiMemory( "chroma", config: {
    collection       : "support_docs",
    embeddingProvider: "openai",
    embeddingModel   : "text-embedding-3-small"
} )

// PostgreSQL pgvector — 与现有的 Postgres 配合使用
mem = aiMemory( "postgres", config: {
    datasource       : "myDB",
    table            : "ai_embeddings",
    embeddingProvider: "openai"
} )

// Pinecone — 托管云向量数据库
mem = aiMemory( "pinecone", config: {
    apiKey     : "${Setting: PINECONE_API_KEY not found}",
    index      : "knowledge-base",
    namespace  : "support"
} )

// OpenSearch — AWS OpenSearch 或自托管
mem = aiMemory( "opensearch", config: {
    host             : "https://my-opensearch:9200",
    index            : "ai_embeddings",
    embeddingProvider: "openai"
} )

完整向量内存列表：

类型	描述
`boxvector`	内存中，用于开发/测试
`hybrid`	最近窗口 + 语义检索结合
`chroma`	ChromaDB 集成
`postgres`	PostgreSQL pgvector
`mysql`	MySQL 9 原生向量
`opensearch`	OpenSearch 集成
`typesense`	快速且容错的搜索
`pinecone`	托管云向量数据库
`qdrant`	高性能向量存储
`weaviate`	GraphQL 向量数据库
`milvus`	企业级向量数据库

混合内存 —— 兼具二者之长

hybrid 结合了最近消息窗口和语义向量检索，让您同时拥有“时效性”和“相关性”：

$ node

mem = aiMemory( "hybrid", config: {
    recentLimit   : 5,        // 始终保留最近 5 条消息
    semanticLimit : 5,        // 添加 5 条语义相关的历史消息
    vectorProvider: "chroma"  // 由 ChromaDB 提供支持
} )

对于大多数生产环境的客服机器人或助手场景，hybrid 是最佳选择——既有用于保持连贯性的近期上下文，又有用于提供深度的语义检索。

🏢 基于调用的多租户身份路由

这是使 BoxLang AI 内存具有可扩展性的架构特性。内存实例是无状态的，可安全地作为单例使用——userId 和 conversationId 会将每个操作路由到正确的隔离对话中。您也可以使用特定的标识符来创建内存，以实现特定代理的专属记忆。

每个内存操作都接受可选的身份参数：

$ node

sharedMemory = aiMemory( "cache" )

// 操作是完全租户隔离的
sharedMemory.add( message, userId: "alice", conversationId: "sess-1" )
sharedMemory.add( message, userId: "bob",   conversationId: "sess-2" )

// 检索具有作用域限制 — Alice 永远看不到 Bob 的消息
aliceHistory = sharedMemory.getAll( userId: "alice", conversationId: "sess-1" )
bobHistory   = sharedMemory.getAll( userId: "bob",   conversationId: "sess-2" )

// 仅清除 Alice 的对话
sharedMemory.clear( userId: "alice", conversationId: "sess-1" )

在实践中，您可以通过 AiAgent.run() 选项传递身份信息，它会自动流向所有内存操作：

$ node

sharedAgent = aiAgent( name: "support", memory: sharedMemory )

// 一个 Agent 实例，服务多个并发用户 — 完全安全
sharedAgent.run( "你好，我需要订单方面的帮助", {}, { userId: "alice", conversationId: "sess-1" } )
sharedAgent.run( "我刚才问了什么？", {}, { userId: "alice", conversationId: "sess-1" } ) // 记得住
sharedAgent.run( "能帮我重置密码吗？", {}, { userId: "bob",   conversationId: "sess-2" } ) // 隔离

无需针对每个用户创建 Agent 工厂，无需线程局部变量（Thread-local）黑科技，也无需担心共享状态的并发 Bug。一个实例，服务多个租户。

📚 文档加载器

文档加载器是 RAG 流水线的摄入层。它们将 30 多种源类型的内容标准化为向量内存能理解的 Document 格式。

$ node

// 加载单个 PDF
docs = aiDocuments(
    source : "/path/to/product-manual.pdf",
    config : { type: "pdf" }
).load()

// 递归加载目录中的所有 Markdown 文件
docs = aiDocuments(
    source : "/knowledge-base",
    config : {
        type       : "directory",
        recursive  : true,
        extensions : [ "md", "txt", "pdf" ]
    }
).load()

// 加载实时网页
docs = aiDocuments(
    source : "https://boxlang.ortusbooks.com/getting-started/overview",
    config : { type: "http" }
).load()

// 从数据库查询加载
docs = aiDocuments(
    source : "SELECT title, content FROM articles WHERE published = 1",
    config : { type: "sql", datasource: "myDB" }
).load()

// 爬取整个网站
docs = aiDocuments(
    source : "https://docs.mycompany.com",
    config : {
        type     : "webcrawler",
        maxPages : 200,
        delay    : 500
    }
).load()

内置加载器：

加载器	类型	处理内容
`TextLoader`	`text`	`.txt, .log`
`MarkdownLoader`	`markdown`	带标题拆分的 `.md`
`HTMLLoader`	`html`	网页，去除脚本/样式
`CSVLoader`	`csv`	将行作为文档，支持列过滤
`JSONLoader`	`json`	字段提取，数组转文档
`PDFLoader`	`pdf`	多页，支持页面范围选择
`XMLLoader`	`xml`	结构化 XML 内容
`LogLoader`	`log`	应用日志文件
`HTTPLoader`	`http`	单 URL 获取
`FeedLoader`	`feed`	RSS / Atom 订阅源
`SQLLoader`	`sql`	数据库查询结果
`DirectoryLoader`	`directory`	批量文件处理
`WebCrawlerLoader`	`webcrawler`	多页爬取

🔗 构建完整的 RAG 流水线

以下是完整蓝图——将文档摄入向量内存，然后使用配置了该内存的 Agent，根据您的内容回答问题。

第一步：摄入

$ node

// 创建由 ChromaDB 支持的向量内存
vectorMemory = aiMemory( "chroma", config: {
    collection       : "company_knowledge",
    embeddingProvider: "openai",
    embeddingModel   : "text-embedding-3-small"
} )

// 一次调用完成所有摄入
result = aiDocuments(
    source : "/knowledge-base",
    config : {
        type       : "directory",
        recursive  : true,
        extensions : [ "md", "txt", "pdf" ]
    }
).toMemory(
    memory  : vectorMemory,
    options : { chunkSize: 1000, overlap: 200 }
)

// 丰富的摄入报告
println( "文档已加载 : #result.documentsIn#" )
println( "创建分块   : #result.chunksOut#" )
println( "已存储向量 : #result.stored#" )
println( "跳过重复   : #result.deduped#" )
println( "预估成本   : $#result.estimatedCost#" )

toMemory() 方法通过 aiChunk() 处理分块，通过配置的提供程序处理嵌入、去重和存储——所有操作都在一次流畅的调用中完成，并返回详细报告。

第二步：查询

$ node

// 使用相同向量内存的 Agent —— 自动检索相关分块
agent = aiAgent(
    name        : "knowledge-assistant",
    description : "所有公司文档和政策专家",
    memory      : vectorMemory
)

// Agent 检索语义相关的分块并以此为基础进行回答
response = agent.run(
    "企业客户的退款政策是什么？",
    {},
    { userId: "support-team", conversationId: "ticket-12345" }
)

当 Agent 运行时，向量内存会为查询检索最相关的文档分块，并在 LLM 调用前将其注入为上下文。LLM 将基于您的真实内容（而非幻觉）进行回答。

第三步：生产环境中的混合模式

对于大多数生产 RAG 场景，hybrid 内存优于纯向量内存：

$ node

// 结合短期对话内存与长期语义检索
productionMemory = aiMemory( "hybrid", config: {
    recentLimit   : 8,
    semanticLimit : 6,
    vectorProvider: "chroma",
    collection    : "company_knowledge"
} )

agent = aiAgent(
    name   : "enterprise-assistant",
    memory : productionMemory
)

前 8 条消息保持对话连贯性，语义层确保始终呈现相关的文档信息。两者结合，完美处理“我刚才问了什么？”和“关于 X 政策是怎么说的？”这两类需求。

🔧 Token 管理

两个 BIF 可帮助您评估上下文窗口的使用情况：

$ node

// 发送前计算 Token 数（近似值）
tokenCount = aiTokens( "这是我想要计算的文本", { method: "words" } )

// 为摄入大文档进行分块
chunks = aiChunk( largeText, {
    chunkSize : 1000,  // 每个分块的 Token 数
    overlap   : 200    // 分块间的重叠以保持上下文连续性
} )

aiChunk() 在 toMemory() 内部被使用，但您在构建自定义摄入流水线时也可以直接调用它。

🏗️ 每个 Agent 配置多个内存

Agent 可以同时拥有多个内存实例——当您希望针对不同类型的信息应用不同的保留策略时，这非常有用：

$ node

agent = aiAgent(
    name   : "research-assistant",
    memory : [
        // 短期：当前对话
        aiMemory( "window", config: { maxMessages: 20 } ),
        // 长期：语义知识库
        aiMemory( "chroma", config: {
            collection       : "research_papers",
            embeddingProvider: "openai"
        } )
    ]
)

// 动态添加另一个内存
agent.addMemory( aiMemory( "file", config: { filePath: "/audit/" } ) )

所有内存都会被并行读取和写入。在每次 LLM 调用前，从所有内存检索到的消息都会合并。## 📦 aiPopulate() BIF —— 无需实时调用的结构化内存一个常被忽视的功能是：aiPopulate() 可以在不进行任何 LLM 调用（大模型调用）的情况下，将 JSON 数据填充到类型化的 BoxLang 类中。这对于缓存和测试至关重要：

class CustomerProfile {
    property name="name"         type="string";
    property name="tier"         type="string";
    property name="openTickets"  type="numeric";
}

// 来自实时 AI 调用
profile = aiChat(
    "Extract the customer profile from: John Doe, Gold tier, 3 open tickets",
    { returnFormat: new CustomerProfile() }
)

// 将其序列化为 JSON 进行缓存
cachedJson = jsonSerialize( profile )

// 稍后 —— 无需再次调用 LLM 即可恢复类型化对象
restoredProfile = aiPopulate( new CustomerProfile(), cachedJson )
println( restoredProfile.getName() ) // "John Doe"

此功能非常适合：预填充测试数据、缓存 AI 提取结果，以及将现有的 JSON 数据转换为类型化对象。

下期预告

在第七部分（本系列的最后一篇）中，我们将深入探讨 MCP：如何使用来自任何 MCP 服务器的工具、MCPTool 代理的工作原理，以及如何将你自己的 BoxLang 函数发布为企业级 MCP 服务器，并实现完整的安全性、CORS、API 密钥验证和速率限制。

📖 完整文档 🌐 BoxLang AI 官网 📦 立即安装：install-bx-module bx-ai 🫶 专业支持

← 上一篇

文章 BoxLang AI 深度解析 —— 第 6 部分（共 7 部分）：内存系统与 RAG —— 构建具备记忆力的 AI 最早出现在 foojay 上。

分类导航

目录

🧠 两种内存类别

📋 标准内存类型

摘要内存 —— 原理解析

🔍 向量内存类型

混合内存 —— 兼具二者之长

🏢 基于调用的多租户身份路由

📚 文档加载器

🔗 构建完整的 RAG 流水线

第一步：摄入

第二步：查询

第三步：生产环境中的混合模式

🔧 Token 管理

🏗️ 每个 Agent 配置多个内存

下期预告