
Overview

Knowledge in CrewAI is a powerful system that allows AI agents to access and use external information sources while performing their tasks. Think of it as giving your agents a reference library they can consult as they work.
Key benefits of using Knowledge:
  • Enhance agents with domain-specific information
  • Support decisions with real-world data
  • Maintain context across conversations
  • Ground responses in factual information

Quick Start Example

For file-based knowledge sources, make sure to place your files in a `knowledge` directory at the root of your project. Also, use paths relative to the `knowledge` directory when creating the knowledge source.
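For example, a text file placed at `knowledge/user_preference.txt` would be referenced only by its name relative to that directory (a minimal sketch; the file name is hypothetical):

from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

# The file lives at ./knowledge/user_preference.txt in the project root,
# but the path is given relative to the knowledge directory.
user_source = TextFileKnowledgeSource(file_paths=["user_preference.txt"])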

Vector Store (RAG) Client Configuration

CrewAI provides a provider-agnostic RAG client abstraction for vector stores. The default provider is ChromaDB, with Qdrant also supported. You can switch providers using the configuration utilities. Currently supported:
  • ChromaDB (default)
  • Qdrant
Code
from crewai.rag.config.utils import set_rag_config, get_rag_client, clear_rag_config

# ChromaDB (default)
from crewai.rag.chromadb.config import ChromaDBConfig
set_rag_config(ChromaDBConfig())
chromadb_client = get_rag_client()

# Qdrant
from crewai.rag.qdrant.config import QdrantConfig
set_rag_config(QdrantConfig())
qdrant_client = get_rag_client()

# Example operations (same API for any provider)
client = qdrant_client  # or chromadb_client
client.create_collection(collection_name="docs")
client.add_documents(
    collection_name="docs",
    documents=[{"id": "1", "content": "CrewAI enables collaborative AI agents."}],
)
results = client.search(collection_name="docs", query="collaborative agents", limit=3)

clear_rag_config()  # optional reset
This RAG client is separate from Knowledge's built-in storage. Use it when you need direct control over the vector store or a custom retrieval pipeline.

Basic String Knowledge Example

Code
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source], # Enable knowledge by adding the sources here
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})

Web Content Knowledge Example

You need to install `docling` for the following example to work: `uv add docling`
Code
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a knowledge source from web content
content_source = CrewDoclingSource(
    file_paths=[
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination",
    ],
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About papers",
    goal="You know everything about the papers.",
    backstory="You are a master at understanding papers and their content.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the papers: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[content_source],
)

result = crew.kickoff(
    inputs={"question": "What is the reward hacking paper about? Be sure to provide sources."}
)

Supported Knowledge Sources

CrewAI supports a variety of knowledge source types out of the box:

Text Sources

  • Raw strings
  • Text files (.txt)
  • PDF documents

Structured Data

  • CSV files
  • Excel spreadsheets
  • JSON documents

Text File Knowledge Source

from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

text_source = TextFileKnowledgeSource(
    file_paths=["document.txt", "another.txt"]
)

PDF Knowledge Source

from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

pdf_source = PDFKnowledgeSource(
    file_paths=["document.pdf", "another.pdf"]
)

CSV Knowledge Source

from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource

csv_source = CSVKnowledgeSource(
    file_paths=["data.csv"]
)

Excel Knowledge Source

from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource

excel_source = ExcelKnowledgeSource(
    file_paths=["spreadsheet.xlsx"]
)

JSON Knowledge Source

from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource

json_source = JSONKnowledgeSource(
    file_paths=["data.json"]
)
Make sure to create the ./knowledge folder. All source files (e.g., .txt, .pdf, .xlsx, .json) should be placed in this folder for centralized management.

Agent vs. Crew Knowledge: Complete Guide

Understanding knowledge levels: CrewAI supports knowledge at both the agent level and the crew level. This section explains exactly how each works, when it is initialized, and clears up common misconceptions about dependencies between them.

How Knowledge Initialization Actually Works

Here is exactly what happens when you use knowledge:

Agent-Level Knowledge (Independent)

from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Agent with its own knowledge - NO crew knowledge needed
specialist_knowledge = StringKnowledgeSource(
    content="Specialized technical information for this agent only"
)

specialist_agent = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Expert in specialized technical domains",
    knowledge_sources=[specialist_knowledge]  # Agent-specific knowledge
)

task = Task(
    description="Answer technical questions",
    agent=specialist_agent,
    expected_output="Technical answer"
)

# No crew-level knowledge required
crew = Crew(
    agents=[specialist_agent],
    tasks=[task]
)

result = crew.kickoff()  # Agent knowledge works independently

What Happens During `crew.kickoff()`

When you call `crew.kickoff()`, the exact sequence is as follows:
# During kickoff
for agent in self.agents:
    agent.crew = self  # Agent gets reference to crew
    agent.set_knowledge(crew_embedder=self.embedder)  # Agent knowledge initialized
    agent.create_agent_executor()

Storage Independence

Each knowledge level uses its own independent storage collection:
# Agent knowledge storage
agent_collection_name = agent.role  # e.g., "Technical Specialist"

# Crew knowledge storage  
crew_collection_name = "crew"

# Both stored in same ChromaDB instance but different collections
# Path: ~/.local/share/CrewAI/{project}/knowledge/
#   ├── crew/                    # Crew knowledge collection
#   ├── Technical Specialist/    # Agent knowledge collection
#   └── Another Agent Role/      # Another agent's collection

Complete Working Examples

Example 1: Agent-Only Knowledge

from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Agent-specific knowledge
agent_knowledge = StringKnowledgeSource(
    content="Agent-specific information that only this agent needs"
)

agent = Agent(
    role="Specialist",
    goal="Use specialized knowledge",
    backstory="Expert with specific knowledge",
    knowledge_sources=[agent_knowledge],
    embedder={  # Agent can have its own embedder
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"}
    }
)

task = Task(
    description="Answer using your specialized knowledge",
    agent=agent,
    expected_output="Answer based on agent knowledge"
)

# No crew knowledge needed
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()  # Works perfectly

Example 2: Agent and Crew Knowledge

# Crew-wide knowledge (shared by all agents)
crew_knowledge = StringKnowledgeSource(
    content="Company policies and general information for all agents"
)

# Agent-specific knowledge
specialist_knowledge = StringKnowledgeSource(
    content="Technical specifications only the specialist needs"
)

specialist = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Technical expert",
    knowledge_sources=[specialist_knowledge]  # Agent-specific
)

generalist = Agent(
    role="General Assistant", 
    goal="Provide general assistance",
    backstory="General helper"
    # No agent-specific knowledge
)

crew = Crew(
    agents=[specialist, generalist],
    tasks=[...],
    knowledge_sources=[crew_knowledge]  # Crew-wide knowledge
)

# Result:
# - specialist gets: crew_knowledge + specialist_knowledge
# - generalist gets: crew_knowledge only

Example 3: Multiple Agents with Different Knowledge

# Different knowledge for different agents
sales_knowledge = StringKnowledgeSource(content="Sales procedures and pricing")
tech_knowledge = StringKnowledgeSource(content="Technical documentation")
support_knowledge = StringKnowledgeSource(content="Support procedures")

sales_agent = Agent(
    role="Sales Representative",
    knowledge_sources=[sales_knowledge],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)

tech_agent = Agent(
    role="Technical Expert", 
    knowledge_sources=[tech_knowledge],
    embedder={"provider": "ollama", "config": {"model": "mxbai-embed-large"}}
)

support_agent = Agent(
    role="Support Specialist",
    knowledge_sources=[support_knowledge]
    # Will use crew embedder as fallback
)

crew = Crew(
    agents=[sales_agent, tech_agent, support_agent],
    tasks=[...],
    embedder={  # Fallback embedder for agents without their own
        "provider": "google",
        "config": {"model": "text-embedding-004"}
    }
)

# Each agent gets only their specific knowledge
# Each can use different embedding providers
Unlike retrieving from a vector database with a tool, agents preloaded with knowledge do not need a retrieval persona or task. Simply add the relevant knowledge sources your agents or crew need to run. Knowledge sources can be added at the agent or crew level: crew-level knowledge sources are used by all agents in the crew, while agent-level knowledge sources are used only by the specific agent preloaded with them.

Knowledge Configuration

You can configure knowledge for a crew or for individual agents.
Code
from crewai.knowledge.knowledge_config import KnowledgeConfig

knowledge_config = KnowledgeConfig(results_limit=10, score_threshold=0.5)

agent = Agent(
    ...
    knowledge_config=knowledge_config
)
`results_limit`: the number of relevant documents to return. Defaults to 3. `score_threshold`: the minimum score for a document to be considered relevant. Defaults to 0.35.

Supported Knowledge Parameters

  • sources (List[BaseKnowledgeSource], required): List of knowledge sources that provide the content to be stored and queried. Can include PDFs, CSVs, Excel spreadsheets, JSON documents, text files, or string content.
  • collection_name (str): Name of the collection where the knowledge is stored. Used to identify different sets of knowledge. Defaults to "knowledge" if not provided.
  • storage (Optional[KnowledgeStorage]): Custom storage configuration for controlling how knowledge is stored and retrieved. If not provided, a default storage is created.
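Putting these parameters together, a standalone knowledge store might be constructed roughly as follows (a minimal sketch; the `Knowledge` import path and exact constructor signature are assumptions and may vary by version):

from crewai.knowledge.knowledge import Knowledge  # assumed import path
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

faq_source = StringKnowledgeSource(
    content="Our product ships worldwide within 5 business days."
)

product_knowledge = Knowledge(
    collection_name="product_faq",  # defaults to "knowledge" if omitted
    sources=[faq_source],           # required: list of BaseKnowledgeSource
    # storage=...                   # optional custom KnowledgeStorage
)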

Knowledge Storage Transparency

Understanding knowledge storage: CrewAI uses ChromaDB for vector storage and automatically stores knowledge sources in platform-specific directories. Knowing these locations and defaults helps with production deployments, debugging, and storage management.

Where CrewAI Stores Knowledge Files

By default, CrewAI stores knowledge in platform-specific directories, using the same storage system as memory:

Default Storage Locations by Platform

macOS
~/Library/Application Support/CrewAI/{project_name}/
└── knowledge/                    # Knowledge ChromaDB files
    ├── chroma.sqlite3           # ChromaDB metadata
    ├── {collection_id}/         # Vector embeddings
    └── knowledge_{collection}/  # Named collections
Linux
~/.local/share/CrewAI/{project_name}/
└── knowledge/
    ├── chroma.sqlite3
    ├── {collection_id}/
    └── knowledge_{collection}/
Windows
C:\Users\{username}\AppData\Local\CrewAI\{project_name}\
└── knowledge\
    ├── chroma.sqlite3
    ├── {collection_id}\
    └── knowledge_{collection}\

Finding Your Knowledge Storage Location

To see exactly where CrewAI stores knowledge files:
from crewai.utilities.paths import db_storage_path
import os

# Get the knowledge storage path
knowledge_path = os.path.join(db_storage_path(), "knowledge")
print(f"Knowledge storage location: {knowledge_path}")

# List knowledge collections and files
if os.path.exists(knowledge_path):
    print("\nKnowledge storage contents:")
    for item in os.listdir(knowledge_path):
        item_path = os.path.join(knowledge_path, item)
        if os.path.isdir(item_path):
            print(f"📁 Collection: {item}/")
            # Show collection contents
            try:
                for subitem in os.listdir(item_path):
                    print(f"   └── {subitem}")
            except PermissionError:
                print(f"   └── (permission denied)")
        else:
            print(f"📄 {item}")
else:
    print("No knowledge storage found yet.")

Controlling Knowledge Storage Locations

Option 1: Environment Variable (Recommended)

import os
from crewai import Crew

# Set custom storage location for all CrewAI data
os.environ["CREWAI_STORAGE_DIR"] = "./my_project_storage"

# All knowledge will now be stored in ./my_project_storage/knowledge/
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...]
)

Option 2: Custom Knowledge Storage

from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create custom storage with specific embedder
custom_storage = KnowledgeStorage(
    embedder={
        "provider": "ollama",
        "config": {"model": "mxbai-embed-large"}
    },
    collection_name="my_custom_knowledge"
)

# Use with knowledge sources
knowledge_source = StringKnowledgeSource(
    content="Your knowledge content here"
)
knowledge_source.storage = custom_storage

Option 3: Project-Specific Knowledge Storage

import os
from pathlib import Path

# Store knowledge in project directory
project_root = Path(__file__).parent
knowledge_dir = project_root / "knowledge_storage"

os.environ["CREWAI_STORAGE_DIR"] = str(knowledge_dir)

# Now all knowledge will be stored in your project directory

Default Embedding Provider Behavior

Default embedding provider: CrewAI uses OpenAI embeddings (`text-embedding-3-small`) for knowledge storage by default, even when you use a different LLM provider. You can easily customize this to match your setup.

Understanding the Default Behavior

from crewai import Agent, Crew, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# When using Claude as your LLM...
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    llm=LLM(provider="anthropic", model="claude-3-sonnet")  # Using Claude
)

# CrewAI will still use OpenAI embeddings by default for knowledge
# This ensures consistency but may not match your LLM provider preference
knowledge_source = StringKnowledgeSource(content="Research data...")

crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source]
    # Default: Uses OpenAI embeddings even with Claude LLM
)

Customizing Knowledge Embedding Providers

# Option 1: Use Voyage AI (recommended by Anthropic for Claude users)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "voyageai",  # Recommended for Claude users
        "config": {
            "api_key": "your-voyage-api-key",
            "model": "voyage-3"  # or "voyage-3-large" for best quality
        }
    }
)

# Option 2: Use local embeddings (no external API calls)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "ollama",
        "config": {
            "model": "mxbai-embed-large",
            "url": "https://:11434/api/embeddings"
        }
    }
)

# Option 3: Agent-level embedding customization
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "google",
        "config": {
            "model": "models/text-embedding-004",
            "api_key": "your-google-key"
        }
    }
)

Configuring Azure OpenAI Embeddings

When using Azure OpenAI embeddings:
  1. Make sure you deploy the embedding model on the Azure platform first
  2. Then use the following configuration:
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "azure",
        "config": {
            "api_key": "your-azure-api-key",
            "model": "text-embedding-ada-002", # change to the model you are using and is deployed in Azure
            "api_base": "https://your-azure-endpoint.openai.azure.com/",
            "api_version": "2024-02-01"
        }
    }
)

Advanced Features

Query Rewriting

CrewAI implements an intelligent query rewriting mechanism to optimize knowledge retrieval. When an agent needs to search its knowledge sources, the raw task prompt is automatically transformed into a more effective search query.

How Query Rewriting Works

  1. When an agent executes a task with knowledge sources available, the `_get_knowledge_search_query` method is triggered
  2. The agent's LLM is used to transform the original task prompt into an optimized search query
  3. This optimized query is then used to retrieve relevant information from the knowledge sources

Benefits of Query Rewriting

Improved Retrieval Accuracy

By focusing on key concepts and removing irrelevant content, query rewriting helps retrieve more relevant information.

Context Awareness

Rewritten queries are more specific and context-aware, which improves vector database retrieval.

Example

# Original task prompt
task_prompt = "Answer the following questions about the user's favorite movies: What movie did John watch last week? Format your answer in JSON."

# Behind the scenes, this might be rewritten as:
rewritten_query = "What movies did John watch last week?"
The rewritten query focuses on the core information need and drops the irrelevant instruction about output formatting.
This mechanism is fully automatic and requires no configuration from the user. The agent's LLM performs the query rewriting, so using a more capable LLM can improve the quality of the rewritten queries.

Knowledge Events

CrewAI emits events during the knowledge retrieval process, which you can listen for using the event system. These events let you monitor, debug, and analyze how knowledge is retrieved and used by your agents.

Available Knowledge Events

  • KnowledgeRetrievalStartedEvent: emitted when an agent starts retrieving knowledge from its sources
  • KnowledgeRetrievalCompletedEvent: emitted when knowledge retrieval completes, including the query used and the retrieved content
  • KnowledgeQueryStartedEvent: emitted when a query against the knowledge sources starts
  • KnowledgeQueryCompletedEvent: emitted when a query completes successfully
  • KnowledgeQueryFailedEvent: emitted when a query against the knowledge sources fails
  • KnowledgeSearchQueryFailedEvent: emitted when a search query fails

Example: Monitoring Knowledge Retrieval

from crewai.events import (
    KnowledgeRetrievalStartedEvent,
    KnowledgeRetrievalCompletedEvent,
    BaseEventListener,
)

class KnowledgeMonitorListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(KnowledgeRetrievalStartedEvent)
        def on_knowledge_retrieval_started(source, event):
            print(f"Agent '{event.agent.role}' started retrieving knowledge")
            
        @crewai_event_bus.on(KnowledgeRetrievalCompletedEvent)
        def on_knowledge_retrieval_completed(source, event):
            print(f"Agent '{event.agent.role}' completed knowledge retrieval")
            print(f"Query: {event.query}")
            print(f"Retrieved {len(event.retrieved_knowledge)} knowledge chunks")

# Create an instance of your listener
knowledge_monitor = KnowledgeMonitorListener()
For more information on using events, see the Event Listeners documentation.

Custom Knowledge Sources

CrewAI lets you create custom knowledge sources for any type of data by extending the `BaseKnowledgeSource` class. Let's build a practical example that fetches and processes space news articles.

Space News Knowledge Source Example

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import requests
from datetime import datetime
from typing import Dict, Any
from pydantic import BaseModel, Field

class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    """Knowledge source that fetches data from Space News API."""

    api_endpoint: str = Field(description="API endpoint URL")
    limit: int = Field(default=10, description="Number of articles to fetch")

    def load_content(self) -> Dict[Any, str]:
        """Fetch and format space news articles."""
        try:
            response = requests.get(
                f"{self.api_endpoint}?limit={self.limit}"
            )
            response.raise_for_status()

            data = response.json()
            articles = data.get('results', [])

            formatted_data = self.validate_content(articles)
            return {self.api_endpoint: formatted_data}
        except Exception as e:
            raise ValueError(f"Failed to fetch space news: {str(e)}")

    def validate_content(self, articles: list) -> str:
        """Format articles into readable text."""
        formatted = "Space News Articles:\n\n"
        for article in articles:
            formatted += f"""
                Title: {article['title']}
                Published: {article['published_at']}
                Summary: {article['summary']}
                News Site: {article['news_site']}
                URL: {article['url']}
                -------------------"""
        return formatted

    def add(self) -> None:
        """Process and store the articles."""
        content = self.load_content()
        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)

        self._save_documents()

# Create knowledge source
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles",
    limit=10,
)

# Create specialized agent
space_analyst = Agent(
    role="Space News Analyst",
    goal="Answer questions about space news accurately and comprehensively",
    backstory="""You are a space industry analyst with expertise in space exploration,
    satellite technology, and space industry trends. You excel at answering questions
    about space news and providing detailed, accurate information.""",
    knowledge_sources=[recent_news],
    llm=LLM(model="gpt-4", temperature=0.0)
)

# Create task that handles user questions
analysis_task = Task(
    description="Answer this question about space news: {user_question}",
    expected_output="A detailed answer based on the recent space news articles",
    agent=space_analyst
)

# Create and run the crew
crew = Crew(
    agents=[space_analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential
)

# Example usage
result = crew.kickoff(
    inputs={"user_question": "What are the latest developments in space exploration?"}
)

Debugging and Troubleshooting

Debugging Knowledge Issues

Check Agent Knowledge Initialization

from crewai import Agent, Crew, Task
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

knowledge_source = StringKnowledgeSource(content="Test knowledge")

agent = Agent(
    role="Test Agent",
    goal="Test knowledge",
    backstory="Testing",
    knowledge_sources=[knowledge_source]
)

crew = Crew(agents=[agent], tasks=[Task(...)])

# Before kickoff - knowledge not initialized
print(f"Before kickoff - Agent knowledge: {getattr(agent, 'knowledge', None)}")

crew.kickoff()

# After kickoff - knowledge initialized
print(f"After kickoff - Agent knowledge: {agent.knowledge}")
print(f"Agent knowledge collection: {agent.knowledge.storage.collection_name}")
print(f"Number of sources: {len(agent.knowledge.sources)}")

Verify Knowledge Storage Locations

import os
from crewai.utilities.paths import db_storage_path

# Check storage structure
storage_path = db_storage_path()
knowledge_path = os.path.join(storage_path, "knowledge")

if os.path.exists(knowledge_path):
    print("Knowledge collections found:")
    for collection in os.listdir(knowledge_path):
        collection_path = os.path.join(knowledge_path, collection)
        if os.path.isdir(collection_path):
            print(f"  - {collection}/")
            # Show collection contents
            for item in os.listdir(collection_path):
                print(f"    └── {item}")

Test Knowledge Retrieval

# Test agent knowledge retrieval
if hasattr(agent, 'knowledge') and agent.knowledge:
    test_query = ["test query"]
    results = agent.knowledge.query(test_query)
    print(f"Agent knowledge results: {len(results)} documents found")
    
    # Test crew knowledge retrieval (if exists)
    if hasattr(crew, 'knowledge') and crew.knowledge:
        crew_results = crew.query_knowledge(test_query)
        print(f"Crew knowledge results: {len(crew_results)} documents found")

Inspect Knowledge Collections

import chromadb
from crewai.utilities.paths import db_storage_path
import os

# Connect to CrewAI's knowledge ChromaDB
knowledge_path = os.path.join(db_storage_path(), "knowledge")

if os.path.exists(knowledge_path):
    client = chromadb.PersistentClient(path=knowledge_path)
    collections = client.list_collections()
    
    print("Knowledge Collections:")
    for collection in collections:
        print(f"  - {collection.name}: {collection.count()} documents")
        
        # Sample a few documents to verify content
        if collection.count() > 0:
            sample = collection.peek(limit=2)
            print(f"    Sample content: {sample['documents'][0][:100]}...")
else:
    print("No knowledge storage found")

Check Knowledge Processing

from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a test knowledge source
test_source = StringKnowledgeSource(
    content="Test knowledge content for debugging",
    chunk_size=100,  # Small chunks for testing
    chunk_overlap=20
)

# Check chunking behavior
print(f"Original content length: {len(test_source.content)}")
print(f"Chunk size: {test_source.chunk_size}")
print(f"Chunk overlap: {test_source.chunk_overlap}")

# Process and inspect chunks
test_source.add()
print(f"Number of chunks created: {len(test_source.chunks)}")
for i, chunk in enumerate(test_source.chunks[:3]):  # Show first 3 chunks
    print(f"Chunk {i+1}: {chunk[:50]}...")

Common Knowledge Storage Issues

"File not found" errors
# Ensure files are in the correct location
from crewai.utilities.constants import KNOWLEDGE_DIRECTORY
import os

knowledge_dir = KNOWLEDGE_DIRECTORY  # Usually "knowledge"
file_path = os.path.join(knowledge_dir, "your_file.pdf")

if not os.path.exists(file_path):
    print(f"File not found: {file_path}")
    print(f"Current working directory: {os.getcwd()}")
    print(f"Expected knowledge directory: {os.path.abspath(knowledge_dir)}")
"Embedding dimension mismatch" errors
# This happens when switching embedding providers
# Reset knowledge storage to clear old embeddings
crew.reset_memories(command_type='knowledge')

# Or use consistent embedding providers
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)
"ChromaDB permission denied" errors
# Fix storage permissions
chmod -R 755 ~/.local/share/CrewAI/
Knowledge not persisting between runs
# Verify storage location consistency
import os
from crewai.utilities.paths import db_storage_path

print("CREWAI_STORAGE_DIR:", os.getenv("CREWAI_STORAGE_DIR"))
print("Computed storage path:", db_storage_path())
print("Knowledge path:", os.path.join(db_storage_path(), "knowledge"))

Knowledge Reset Commands

# Reset only agent-specific knowledge
crew.reset_memories(command_type='agent_knowledge')

# Reset both crew and agent knowledge  
crew.reset_memories(command_type='knowledge')

# CLI commands
# crewai reset-memories --agent-knowledge  # Agent knowledge only
# crewai reset-memories --knowledge        # All knowledge

Clearing Knowledge

If you need to clear the knowledge stored in CrewAI, you can use the `crewai reset-memories` command with the `--knowledge` option.
Command
crewai reset-memories --knowledge
This is useful when you have updated your knowledge sources and want to make sure the agents use the latest information.

Best Practices

  • Keep chunk sizes appropriate for your content type
  • Use content overlap to maintain context between chunks
  • Organize related information into separate knowledge sources
  • Adjust chunk sizes based on content complexity
  • Configure appropriate embedding models
  • Consider local embedding providers for faster processing
  • With the typical file structure CrewAI provides, knowledge sources are embedded every time kickoff is triggered.
  • If the knowledge sources are large, this causes inefficiency and added latency, because the same data is embedded on every run.
  • To address this, initialize the knowledge parameter directly instead of the knowledge_sources parameter; see the sketch after this list.
  • See the linked GitHub issue for the full background.
  • Use agent-level knowledge for role-specific information
  • Use crew-level knowledge for shared information that all agents need
  • Set embedders at the agent level if different embedding strategies are needed
  • Use consistent collection naming by keeping agent roles descriptive
  • Test knowledge initialization by checking agent.knowledge after kickoff
  • Monitor storage locations to understand where knowledge is stored
  • Reset knowledge appropriately using the correct command types
  • Set `CREWAI_STORAGE_DIR` to a known location in production
  • Choose explicit embedding providers to match your LLM setup and avoid API key conflicts
  • Monitor knowledge storage size as it grows with added documents
  • Use collection names to organize knowledge sources by domain or purpose
  • Include the knowledge directory in your backup and deployment strategies
  • Set appropriate file permissions for knowledge files and storage directories
  • Use environment variables for API keys and sensitive configuration
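As noted in the list above, one way to avoid re-embedding large sources on every kickoff is to build the knowledge store once and pass it via the knowledge parameter instead of knowledge_sources (a minimal sketch; the `Knowledge` import path and the exact `Crew` signature are assumptions and may differ by version):

from crewai import Agent, Crew, Task
from crewai.knowledge.knowledge import Knowledge  # assumed import path
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Build the knowledge store once so its sources are not re-processed
# on every kickoff (per the best practice above)
policy_source = StringKnowledgeSource(content="Company policies and general information")
company_knowledge = Knowledge(collection_name="company_policies", sources=[policy_source])

agent = Agent(
    role="Assistant",
    goal="Answer policy questions",
    backstory="Knows company policies well",
)

task = Task(
    description="Answer the following question: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    knowledge=company_knowledge,  # reuse the pre-built store instead of knowledge_sources
)

result = crew.kickoff(inputs={"question": "What is the refund policy?"})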