什么是知识?

CrewAI 中的知识是一个强大的系统,它允许 AI 智能体在执行任务期间访问和利用外部信息源。可以将其视为为您的智能体提供了一个他们在工作时可以查阅的参考库。

使用知识的主要优势

  • 用领域特定信息增强智能体
  • 用真实世界数据支持决策
  • 在对话中保持上下文
  • 将响应基于事实信息

支持的知识来源

CrewAI 开箱即用地支持多种类型的知识来源

文本来源

  • 原始字符串
  • 文本文件 (.txt)
  • PDF 文档

结构化数据

  • CSV 文件
  • Excel 电子表格
  • JSON 文档

支持的知识参数

参数类型必需描述
sourcesList[BaseKnowledgeSource]提供要存储和查询内容的知识来源列表。可以包括 PDF、CSV、Excel、JSON、文本文件或字符串内容。
collection_namestr存储知识的集合名称。用于标识不同的知识集。如果未提供,则默认为“knowledge”。
storageOptional[KnowledgeStorage]用于管理知识如何存储和检索的自定义存储配置。如果未提供,将创建默认存储。

与使用工具从向量数据库检索不同,预加载知识的智能体不需要检索角色或任务。只需添加您的智能体或团队运行所需的相关知识来源即可。

知识来源可以在智能体或团队级别添加。团队级别的知识来源将由团队中的**所有智能体**使用。智能体级别的知识来源将由预加载该知识的**特定智能体**使用。

快速入门示例

对于基于文件的知识来源,请确保将文件放置在项目根目录下的 `knowledge` 目录中。此外,创建来源时请使用相对于 `knowledge` 目录的相对路径。

这是一个使用基于字符串的知识的示例

代码
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content,
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source], # Enable knowledge by adding the sources here. You can also add more sources to the sources list.
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})

这是另一个使用 `CrewDoclingSource` 的示例。CrewDoclingSource 实际上非常通用,可以处理包括 MD、PDF、DOCX、HTML 等多种文件格式。

您需要安装 `docling` 才能使以下示例正常工作:`uv add docling`

代码
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a knowledge source
content_source = CrewDoclingSource(
    file_paths=[
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination",
    ],
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About papers",
    goal="You know everything about the papers.",
    backstory="""You are a master at understanding papers and their content.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
task = Task(
    description="Answer the following questions about the papers: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[
        content_source
    ],  # Enable knowledge by adding the sources here. You can also add more sources to the sources list.
)

result = crew.kickoff(
    inputs={
        "question": "What is the reward hacking paper about? Be sure to provide sources."
    }
)

知识配置

您可以为团队或智能体配置知识。

代码
from crewai.knowledge.knowledge_config import KnowledgeConfig

knowledge_config = KnowledgeConfig(results_limit=10, score_threshold=0.5)

agent = Agent(
    ...
    knowledge_config=knowledge_config
)

`results_limit`:要返回的相关文档数量。默认为 3。`score_threshold`:文档被视为相关的最低分数。默认为 0.35。

更多示例

以下是如何使用不同类型知识来源的示例

注意:请确保创建 ./knowledge 文件夹。所有源文件(例如 .txt、.pdf、.xlsx、.json)都应放置在此文件夹中以便集中管理。

文本文件知识来源

from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

# Create a text file knowledge source
text_source = TextFileKnowledgeSource(
    file_paths=["document.txt", "another.txt"]
)

# Create crew with text file source on agents or crew level
agent = Agent(
    ...
    knowledge_sources=[text_source]
)

crew = Crew(
    ...
    knowledge_sources=[text_source]
)

PDF 知识来源

from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# Create a PDF knowledge source
pdf_source = PDFKnowledgeSource(
    file_paths=["document.pdf", "another.pdf"]
)

# Create crew with PDF knowledge source on agents or crew level
agent = Agent(
    ...
    knowledge_sources=[pdf_source]
)

crew = Crew(
    ...
    knowledge_sources=[pdf_source]
)

CSV 知识来源

from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource

# Create a CSV knowledge source
csv_source = CSVKnowledgeSource(
    file_paths=["data.csv"]
)

# Create crew with CSV knowledge source or on agent level
agent = Agent(
    ...
    knowledge_sources=[csv_source]
)

crew = Crew(
    ...
    knowledge_sources=[csv_source]
)

Excel 知识来源

from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource

# Create an Excel knowledge source
excel_source = ExcelKnowledgeSource(
    file_paths=["spreadsheet.xlsx"]
)

# Create crew with Excel knowledge source on agents or crew level
agent = Agent(
    ...
    knowledge_sources=[excel_source]
)

crew = Crew(
    ...
    knowledge_sources=[excel_source]
)

JSON 知识来源

from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource

# Create a JSON knowledge source
json_source = JSONKnowledgeSource(
    file_paths=["data.json"]
)

# Create crew with JSON knowledge source on agents or crew level
agent = Agent(
    ...
    knowledge_sources=[json_source]
)

crew = Crew(
    ...
    knowledge_sources=[json_source]
)

知识配置

分块配置

知识来源会自动将内容分块以进行更好的处理。您可以在知识来源中配置分块行为

from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

source = StringKnowledgeSource(
    content="Your content here",
    chunk_size=4000,      # Maximum size of each chunk (default: 4000)
    chunk_overlap=200     # Overlap between chunks (default: 200)
)

分块配置有助于

  • 将大型文档分解成易于管理的片段
  • 通过分块重叠保持上下文
  • 优化检索准确性

嵌入配置

您还可以为知识存储配置嵌入器。如果您想为知识存储使用与智能体使用的嵌入器不同的嵌入器,这将很有用。`embedder` 参数支持多种嵌入模型提供商,包括

  • `openai`:OpenAI 的嵌入模型
  • `google`:Google 的文本嵌入模型
  • `azure`:Azure OpenAI 嵌入
  • `ollama`:使用 Ollama 的本地嵌入
  • `vertexai`:Google Cloud VertexAI 嵌入
  • `cohere`:Cohere 的嵌入模型
  • `voyageai`:VoyageAI 的嵌入模型
  • `bedrock`:AWS Bedrock 嵌入
  • `huggingface`:Hugging Face 模型
  • `watson`:IBM Watson 嵌入

这是一个使用 Google 的 `text-embedding-004` 模型为知识存储配置嵌入器的示例

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
import os

# Get the GEMINI API key
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")

# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content,
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
gemini_llm = LLM(
    model="gemini/gemini-1.5-pro-002",
    api_key=GEMINI_API_KEY,
    temperature=0,
)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=gemini_llm,
    embedder={
        "provider": "google",
        "config": {
            "model": "models/text-embedding-004",
            "api_key": GEMINI_API_KEY,
        }
    }
)

task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
    embedder={
        "provider": "google",
        "config": {
            "model": "models/text-embedding-004",
            "api_key": GEMINI_API_KEY,
        }
    }
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})

查询重写

CrewAI 实现了一种智能查询重写机制,以优化知识检索。当智能体需要搜索知识来源时,原始任务提示会自动转换为更有效的搜索查询。

查询重写工作原理

  1. 当智能体执行具有可用知识来源的任务时,会触发 `_get_knowledge_search_query` 方法
  2. 使用智能体的 LLM 将原始任务提示转换为优化的搜索查询
  3. 然后使用此优化查询从知识来源检索相关信息

查询重写的好处

提高检索准确性

通过聚焦于关键概念并删除不相关内容,查询重写有助于检索更相关的信息。

上下文感知

重写的查询旨在针对向量数据库检索更加具体和上下文感知。

实现细节

查询重写通过一个系统提示透明地进行,该提示指示 LLM 执行以下操作:

  • 聚焦于预期任务的关键词
  • 使查询更具体和上下文感知
  • 删除不相关的内容,如输出格式说明
  • 仅生成重写后的查询,不带前言或后记

此机制完全自动化,无需用户配置。使用智能体的 LLM 执行查询重写,因此使用能力更强的 LLM 可以提高重写查询的质量。

示例

# Original task prompt
task_prompt = "Answer the following questions about the user's favorite movies: What movie did John watch last week? Format your answer in JSON."

# Behind the scenes, this might be rewritten as:
rewritten_query = "What movies did John watch last week?"

重写后的查询更侧重于核心信息需求,并删除了关于输出格式的不相关说明。

清除知识

如果您需要清除存储在 CrewAI 中的知识,可以使用带有 `--knowledge` 选项的 `crewai reset-memories` 命令。

命令
crewai reset-memories --knowledge

当您更新了知识来源并希望确保智能体使用最新信息时,这非常有用。

智能体特定知识

虽然可以使用 `crew.knowledge_sources` 在团队级别提供知识,但单个智能体也可以使用 `knowledge_sources` 参数拥有自己的知识来源

代码
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create agent-specific knowledge about a product
product_specs = StringKnowledgeSource(
    content="""The XPS 13 laptop features:
    - 13.4-inch 4K display
    - Intel Core i7 processor
    - 16GB RAM
    - 512GB SSD storage
    - 12-hour battery life""",
    metadata={"category": "product_specs"}
)

# Create a support agent with product knowledge
support_agent = Agent(
    role="Technical Support Specialist",
    goal="Provide accurate product information and support.",
    backstory="You are an expert on our laptop products and specifications.",
    knowledge_sources=[product_specs]  # Agent-specific knowledge
)

# Create a task that requires product knowledge
support_task = Task(
    description="Answer this customer question: {question}",
    agent=support_agent
)

# Create and run the crew
crew = Crew(
    agents=[support_agent],
    tasks=[support_task]
)

# Get answer about the laptop's specifications
result = crew.kickoff(
    inputs={"question": "What is the storage capacity of the XPS 13?"}
)

# Resetting the agent specific knowledge via crew object
crew.reset_memories(command_type = 'agent_knowledge')

# Resetting the agent specific knowledge via CLI
crewai reset-memories --agent-knowledge 
crewai reset-memories -akn

智能体特定知识的好处

  • 为智能体提供其角色的专门信息
  • 保持智能体之间的关注点分离
  • 结合团队级知识实现分层信息访问

自定义知识来源

CrewAI 允许您通过扩展 `BaseKnowledgeSource` 类来为任何类型的数据创建自定义知识来源。让我们创建一个实际示例,该示例抓取并处理空间新闻文章。

空间新闻知识来源示例

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import requests
from datetime import datetime
from typing import Dict, Any
from pydantic import BaseModel, Field

class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    """Knowledge source that fetches data from Space News API."""

    api_endpoint: str = Field(description="API endpoint URL")
    limit: int = Field(default=10, description="Number of articles to fetch")

    def load_content(self) -> Dict[Any, str]:
        """Fetch and format space news articles."""
        try:
            response = requests.get(
                f"{self.api_endpoint}?limit={self.limit}"
            )
            response.raise_for_status()

            data = response.json()
            articles = data.get('results', [])

            formatted_data = self.validate_content(articles)
            return {self.api_endpoint: formatted_data}
        except Exception as e:
            raise ValueError(f"Failed to fetch space news: {str(e)}")

    def validate_content(self, articles: list) -> str:
        """Format articles into readable text."""
        formatted = "Space News Articles:\n\n"
        for article in articles:
            formatted += f"""
                Title: {article['title']}
                Published: {article['published_at']}
                Summary: {article['summary']}
                News Site: {article['news_site']}
                URL: {article['url']}
                -------------------"""
        return formatted

    def add(self) -> None:
        """Process and store the articles."""
        content = self.load_content()
        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)

        self._save_documents()

# Create knowledge source
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles",
    limit=10,
)

# Create specialized agent
space_analyst = Agent(
    role="Space News Analyst",
    goal="Answer questions about space news accurately and comprehensively",
    backstory="""You are a space industry analyst with expertise in space exploration,
    satellite technology, and space industry trends. You excel at answering questions
    about space news and providing detailed, accurate information.""",
    knowledge_sources=[recent_news],
    llm=LLM(model="gpt-4", temperature=0.0)
)

# Create task that handles user questions
analysis_task = Task(
    description="Answer this question about space news: {user_question}",
    expected_output="A detailed answer based on the recent space news articles",
    agent=space_analyst
)

# Create and run the crew
crew = Crew(
    agents=[space_analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential
)

# Example usage
result = crew.kickoff(
    inputs={"user_question": "What are the latest developments in space exploration?"}
)

关键组成部分解释

  1. 自定义知识来源 (`SpaceNewsKnowledgeSource`):

    • 扩展 `BaseKnowledgeSource` 以与 CrewAI 集成
    • 可配置的 API 端点和文章限制
    • 实现三个关键方法
      • `load_content()`:从 API 获取文章
      • `_format_articles()`:将文章结构化为可读文本
      • `add()`:处理并存储内容
  2. 智能体配置:

    • 作为空间新闻分析师的专业角色
    • 使用知识来源访问空间新闻
  3. 任务设置:

    • 通过 `{user_question}` 接收用户问题作为输入
    • 旨在根据知识来源提供详细答案
  4. 团队编排:

    • 管理智能体和任务之间的工作流
    • 通过 kickoff 方法处理输入/输出

此示例演示了如何

  • 创建抓取实时数据的自定义知识来源
  • 处理和格式化外部数据供 AI 消费
  • 使用知识来源回答特定的用户问题
  • 将所有内容无缝集成到 CrewAI 的智能体系统中

关于 Spaceflight News API

本示例使用 Spaceflight News API,该 API

  • 提供对空间相关新闻文章的免费访问
  • 无需认证
  • 返回关于空间新闻的结构化数据
  • 支持分页和过滤

您可以通过修改端点 URL 来自定义 API 查询

# Fetch more articles
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles",
    limit=20,  # Increase the number of articles
)

# Add search parameters
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles?search=NASA", # Search for NASA news
    limit=10,
)

最佳实践