Overview
Knowledge in CrewAI is a powerful system that lets AI agents access and use external information sources while executing tasks. Think of it as giving your agents a reference library they can consult as they work. Key benefits of using Knowledge:
- Enhance agents with domain-specific information
- Support decisions with real-world data
- Maintain context across conversations
- Ground responses in factual information
Quickstart Example
For file-based knowledge sources, make sure to place your files in a `knowledge` directory at the root of your project. Also, use paths relative to the `knowledge` directory when creating knowledge sources.
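For example, a minimal file-based source might look like the sketch below; `company_faq.txt` is a hypothetical file that would live at `<project_root>/knowledge/company_faq.txt`:

from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

# The path is given relative to the knowledge directory, not the project root.
faq_source = TextFileKnowledgeSource(file_paths=["company_faq.txt"])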
Vector Store (RAG) Client Configuration
CrewAI provides a provider-agnostic RAG client abstraction for vector stores. The default provider is ChromaDB, and Qdrant is also supported. You can switch providers with the configuration utilities. Currently supported:
- ChromaDB (default)
- Qdrant
from crewai.rag.config.utils import set_rag_config, get_rag_client, clear_rag_config
# ChromaDB (default)
from crewai.rag.chromadb.config import ChromaDBConfig
set_rag_config(ChromaDBConfig())
chromadb_client = get_rag_client()
# Qdrant
from crewai.rag.qdrant.config import QdrantConfig
set_rag_config(QdrantConfig())
qdrant_client = get_rag_client()
# Example operations (same API for any provider)
client = qdrant_client # or chromadb_client
client.create_collection(collection_name="docs")
client.add_documents(
    collection_name="docs",
    documents=[{"id": "1", "content": "CrewAI enables collaborative AI agents."}],
)
results = client.search(collection_name="docs", query="collaborative agents", limit=3)
clear_rag_config() # optional reset
Basic String Knowledge Example
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)
# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)
# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],  # Enable knowledge by adding the sources here
)
result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
Web Content Knowledge Example
You need to install `docling` for the following example to work: `uv add docling`
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
# Create a knowledge source from web content
content_source = CrewDoclingSource(
    file_paths=[
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination",
    ],
)
# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)
# Create an agent with the knowledge store
agent = Agent(
    role="About papers",
    goal="You know everything about the papers.",
    backstory="You are a master at understanding papers and their content.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the papers: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[content_source],
)

result = crew.kickoff(
    inputs={"question": "What is the reward hacking paper about? Be sure to provide sources."}
)
Supported Knowledge Sources
CrewAI supports multiple types of knowledge sources out of the box:
Text Sources
- Raw strings
- Text files (.txt)
- PDF documents
Structured Data
- CSV files
- Excel spreadsheets
- JSON documents
Text File Knowledge Source
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource
text_source = TextFileKnowledgeSource(
    file_paths=["document.txt", "another.txt"]
)
PDF Knowledge Source
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
pdf_source = PDFKnowledgeSource(
    file_paths=["document.pdf", "another.pdf"]
)
CSV Knowledge Source
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource
csv_source = CSVKnowledgeSource(
    file_paths=["data.csv"]
)
Excel Knowledge Source
from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource
excel_source = ExcelKnowledgeSource(
    file_paths=["spreadsheet.xlsx"]
)
JSON Knowledge Source
from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource
json_source = JSONKnowledgeSource(
    file_paths=["data.json"]
)
Make sure to create the ./knowledge folder. All source files (e.g., .txt, .pdf, .xlsx, .json) should be placed in this folder for centralized management.
Agent vs. Crew Knowledge: Complete Guide
Understanding knowledge levels: CrewAI supports knowledge at both the agent level and the crew level. This section explains exactly how each works, when it is initialized, and clears up common misconceptions about the dependencies between them.
How Knowledge Initialization Actually Works
Here is exactly what happens when you use knowledge:
Agent-Level Knowledge (Independent)
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
# Agent with its own knowledge - NO crew knowledge needed
specialist_knowledge = StringKnowledgeSource(
    content="Specialized technical information for this agent only"
)

specialist_agent = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Expert in specialized technical domains",
    knowledge_sources=[specialist_knowledge]  # Agent-specific knowledge
)

task = Task(
    description="Answer technical questions",
    agent=specialist_agent,
    expected_output="Technical answer"
)

# No crew-level knowledge required
crew = Crew(
    agents=[specialist_agent],
    tasks=[task]
)
result = crew.kickoff() # Agent knowledge works independently
What Happens During `crew.kickoff()`
When you call `crew.kickoff()`, this is the exact sequence:
# During kickoff
for agent in self.agents:
    agent.crew = self  # Agent gets reference to crew
    agent.set_knowledge(crew_embedder=self.embedder)  # Agent knowledge initialized
    agent.create_agent_executor()
Storage Independence
Each knowledge level uses an independent storage collection:
# Agent knowledge storage
agent_collection_name = agent.role # e.g., "Technical Specialist"
# Crew knowledge storage
crew_collection_name = "crew"
# Both stored in same ChromaDB instance but different collections
# Path: ~/.local/share/CrewAI/{project}/knowledge/
# ├── crew/ # Crew knowledge collection
# ├── Technical Specialist/ # Agent knowledge collection
# └── Another Agent Role/ # Another agent's collection
Complete Working Examples
Example 1: Agent-Only Knowledge
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
# Agent-specific knowledge
agent_knowledge = StringKnowledgeSource(
    content="Agent-specific information that only this agent needs"
)

agent = Agent(
    role="Specialist",
    goal="Use specialized knowledge",
    backstory="Expert with specific knowledge",
    knowledge_sources=[agent_knowledge],
    embedder={  # Agent can have its own embedder
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"}
    }
)

task = Task(
    description="Answer using your specialized knowledge",
    agent=agent,
    expected_output="Answer based on agent knowledge"
)
# No crew knowledge needed
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff() # Works perfectly
Example 2: Agent and Crew Knowledge
# Crew-wide knowledge (shared by all agents)
crew_knowledge = StringKnowledgeSource(
    content="Company policies and general information for all agents"
)

# Agent-specific knowledge
specialist_knowledge = StringKnowledgeSource(
    content="Technical specifications only the specialist needs"
)

specialist = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Technical expert",
    knowledge_sources=[specialist_knowledge]  # Agent-specific
)

generalist = Agent(
    role="General Assistant",
    goal="Provide general assistance",
    backstory="General helper"
    # No agent-specific knowledge
)

crew = Crew(
    agents=[specialist, generalist],
    tasks=[...],
    knowledge_sources=[crew_knowledge]  # Crew-wide knowledge
)
# Result:
# - specialist gets: crew_knowledge + specialist_knowledge
# - generalist gets: crew_knowledge only
Example 3: Multiple Agents with Different Knowledge
# Different knowledge for different agents
sales_knowledge = StringKnowledgeSource(content="Sales procedures and pricing")
tech_knowledge = StringKnowledgeSource(content="Technical documentation")
support_knowledge = StringKnowledgeSource(content="Support procedures")
sales_agent = Agent(
    role="Sales Representative",
    knowledge_sources=[sales_knowledge],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)

tech_agent = Agent(
    role="Technical Expert",
    knowledge_sources=[tech_knowledge],
    embedder={"provider": "ollama", "config": {"model": "mxbai-embed-large"}}
)

support_agent = Agent(
    role="Support Specialist",
    knowledge_sources=[support_knowledge]
    # Will use crew embedder as fallback
)

crew = Crew(
    agents=[sales_agent, tech_agent, support_agent],
    tasks=[...],
    embedder={  # Fallback embedder for agents without their own
        "provider": "google",
        "config": {"model": "text-embedding-004"}
    }
)
# Each agent gets only their specific knowledge
# Each can use different embedding providers
Unlike using a tool to retrieve from a vector database, agents preloaded with knowledge do not need a retrieval persona or task. Simply add the relevant knowledge sources your agents or crew need in order to run. Knowledge sources can be added at the agent level or the crew level. Crew-level knowledge sources are used by all agents in the crew; agent-level knowledge sources are used only by the specific agent preloaded with that knowledge.
Knowledge Configuration
You can configure knowledge for the crew or for individual agents.
from crewai.knowledge.knowledge_config import KnowledgeConfig
knowledge_config = KnowledgeConfig(results_limit=10, score_threshold=0.5)
agent = Agent(
    ...,  # other agent configuration
    knowledge_config=knowledge_config
)
- `results_limit`: the number of relevant documents to return. Defaults to 3.
- `score_threshold`: the minimum score for a document to be considered relevant. Defaults to 0.35.
Supported Knowledge Parameters
- `sources`: a list of knowledge sources that provide the content to be stored and queried. Can include PDFs, CSVs, Excel spreadsheets, JSON documents, text files, or string content.
- `collection_name`: the name of the collection where the knowledge is stored. Used to identify different knowledge sets. Defaults to “knowledge” if not provided.
- `storage`: a custom storage configuration that controls how knowledge is stored and retrieved. A default storage is created if not provided.
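As a rough sketch of how these parameters fit together (assuming your installed version exposes the `Knowledge` class at `crewai.knowledge.knowledge`; the collection name below is illustrative):

from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# sources: the content to store and query
# collection_name: identifies this knowledge set (defaults to "knowledge")
# storage: optional custom storage; a default one is created when omitted
product_knowledge = Knowledge(
    collection_name="product_docs",  # illustrative name
    sources=[StringKnowledgeSource(content="Product A supports SSO and SCIM.")],
)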
Knowledge Storage Transparency
Understanding knowledge storage: CrewAI uses ChromaDB for vector storage and automatically stores knowledge sources in platform-specific directories. Knowing these locations and defaults helps with production deployments, debugging, and storage management.
Where CrewAI Stores Knowledge Files
By default, CrewAI uses the same storage system as memory and stores knowledge in platform-specific directories. Default storage locations by platform:
macOS
~/Library/Application Support/CrewAI/{project_name}/
└── knowledge/                  # Knowledge ChromaDB files
    ├── chroma.sqlite3          # ChromaDB metadata
    ├── {collection_id}/        # Vector embeddings
    └── knowledge_{collection}/ # Named collections
Linux
~/.local/share/CrewAI/{project_name}/
└── knowledge/
    ├── chroma.sqlite3
    ├── {collection_id}/
    └── knowledge_{collection}/
Windows
C:\Users\{username}\AppData\Local\CrewAI\{project_name}\
└── knowledge\
    ├── chroma.sqlite3
    ├── {collection_id}\
    └── knowledge_{collection}\
Finding Your Knowledge Storage Location
To see exactly where CrewAI stores your knowledge files:
from crewai.utilities.paths import db_storage_path
import os
# Get the knowledge storage path
knowledge_path = os.path.join(db_storage_path(), "knowledge")
print(f"Knowledge storage location: {knowledge_path}")
# List knowledge collections and files
if os.path.exists(knowledge_path):
    print("\nKnowledge storage contents:")
    for item in os.listdir(knowledge_path):
        item_path = os.path.join(knowledge_path, item)
        if os.path.isdir(item_path):
            print(f"📁 Collection: {item}/")
            # Show collection contents
            try:
                for subitem in os.listdir(item_path):
                    print(f"   └── {subitem}")
            except PermissionError:
                print("   └── (permission denied)")
        else:
            print(f"📄 {item}")
else:
    print("No knowledge storage found yet.")
Controlling Knowledge Storage Locations
Option 1: Environment Variable (Recommended)
import os
from crewai import Crew
# Set custom storage location for all CrewAI data
os.environ["CREWAI_STORAGE_DIR"] = "./my_project_storage"
# All knowledge will now be stored in ./my_project_storage/knowledge/
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...]
)
Option 2: Custom Knowledge Storage
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
# Create custom storage with specific embedder
custom_storage = KnowledgeStorage(
    embedder={
        "provider": "ollama",
        "config": {"model": "mxbai-embed-large"}
    },
    collection_name="my_custom_knowledge"
)

# Use with knowledge sources
knowledge_source = StringKnowledgeSource(
    content="Your knowledge content here"
)
knowledge_source.storage = custom_storage
Option 3: Project-Specific Knowledge Storage
import os
from pathlib import Path
# Store knowledge in project directory
project_root = Path(__file__).parent
knowledge_dir = project_root / "knowledge_storage"
os.environ["CREWAI_STORAGE_DIR"] = str(knowledge_dir)
# Now all knowledge will be stored in your project directory
Default Embedding Provider Behavior
Default embedding provider: CrewAI defaults to OpenAI embeddings (`text-embedding-3-small`) for knowledge storage, even when you use a different LLM provider. You can easily customize this to match your setup.
Understanding the Default Behavior
from crewai import Agent, Crew, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
# When using Claude as your LLM...
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    llm=LLM(provider="anthropic", model="claude-3-sonnet")  # Using Claude
)

# CrewAI will still use OpenAI embeddings by default for knowledge
# This ensures consistency but may not match your LLM provider preference
knowledge_source = StringKnowledgeSource(content="Research data...")

crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source]
    # Default: Uses OpenAI embeddings even with Claude LLM
)
Customizing the Knowledge Embedding Provider
# Option 1: Use Voyage AI (recommended by Anthropic for Claude users)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "voyageai",  # Recommended for Claude users
        "config": {
            "api_key": "your-voyage-api-key",
            "model": "voyage-3"  # or "voyage-3-large" for best quality
        }
    }
)
# Option 2: Use local embeddings (no external API calls)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "ollama",
        "config": {
            "model": "mxbai-embed-large",
            "url": "http://localhost:11434/api/embeddings"
        }
    }
)
# Option 3: Agent-level embedding customization
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "google",
        "config": {
            "model": "models/text-embedding-004",
            "api_key": "your-google-key"
        }
    }
)
Configuring Azure OpenAI Embeddings
When using Azure OpenAI embeddings:
- Make sure you first deploy the embedding model on the Azure platform
- Then use the following configuration:
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "azure",
        "config": {
            "api_key": "your-azure-api-key",
            "model": "text-embedding-ada-002",  # change to the embedding model you have deployed in Azure
            "api_base": "https://your-azure-endpoint.openai.azure.com/",
            "api_version": "2024-02-01"
        }
    }
)
Advanced Features
Query Rewriting
CrewAI implements an intelligent query rewriting mechanism to optimize knowledge retrieval. When an agent needs to search its knowledge sources, the original task prompt is automatically transformed into a more effective search query. How query rewriting works:
- When an agent executes a task with knowledge sources available, the `_get_knowledge_search_query` method is triggered
- The agent's LLM transforms the original task prompt into an optimized search query
- This optimized query is then used to retrieve relevant information from the knowledge sources
Benefits of Query Rewriting
Improved Retrieval Accuracy
By focusing on key concepts and removing irrelevant content, query rewriting helps retrieve more relevant information.
Context Awareness
Rewritten queries are designed to be more specific and context-aware for vector database retrieval.
Example
# Original task prompt
task_prompt = "Answer the following questions about the user's favorite movies: What movie did John watch last week? Format your answer in JSON."
# Behind the scenes, this might be rewritten as:
rewritten_query = "What movies did John watch last week?"
This mechanism is fully automatic and requires no configuration from the user. The agent's LLM performs the query rewriting, so using a more capable LLM can improve the quality of the rewritten queries.
Knowledge Events
CrewAI emits events during the knowledge retrieval process that you can listen for with the event system. These events let you monitor, debug, and analyze how knowledge is retrieved and used by your agents. Available knowledge events:
- KnowledgeRetrievalStartedEvent: emitted when an agent starts retrieving knowledge from its sources
- KnowledgeRetrievalCompletedEvent: emitted when knowledge retrieval completes, including the query used and the retrieved content
- KnowledgeQueryStartedEvent: emitted when a query against the knowledge sources starts
- KnowledgeQueryCompletedEvent: emitted when a query completes successfully
- KnowledgeQueryFailedEvent: emitted when a query against the knowledge sources fails
- KnowledgeSearchQueryFailedEvent: emitted when a search query fails
Example: Monitoring Knowledge Retrieval
from crewai.events import (
    KnowledgeRetrievalStartedEvent,
    KnowledgeRetrievalCompletedEvent,
    BaseEventListener,
)

class KnowledgeMonitorListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(KnowledgeRetrievalStartedEvent)
        def on_knowledge_retrieval_started(source, event):
            print(f"Agent '{event.agent.role}' started retrieving knowledge")

        @crewai_event_bus.on(KnowledgeRetrievalCompletedEvent)
        def on_knowledge_retrieval_completed(source, event):
            print(f"Agent '{event.agent.role}' completed knowledge retrieval")
            print(f"Query: {event.query}")
            print(f"Retrieved {len(event.retrieved_knowledge)} knowledge chunks")
# Create an instance of your listener
knowledge_monitor = KnowledgeMonitorListener()
Custom Knowledge Sources
CrewAI lets you create custom knowledge sources for any type of data by extending the `BaseKnowledgeSource` class. Let's build a practical example that fetches and processes space news articles.
Space News Knowledge Source Example
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import requests
from datetime import datetime
from typing import Dict, Any
from pydantic import BaseModel, Field
class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    """Knowledge source that fetches data from Space News API."""

    api_endpoint: str = Field(description="API endpoint URL")
    limit: int = Field(default=10, description="Number of articles to fetch")

    def load_content(self) -> Dict[Any, str]:
        """Fetch and format space news articles."""
        try:
            response = requests.get(
                f"{self.api_endpoint}?limit={self.limit}"
            )
            response.raise_for_status()

            data = response.json()
            articles = data.get('results', [])

            formatted_data = self.validate_content(articles)
            return {self.api_endpoint: formatted_data}
        except Exception as e:
            raise ValueError(f"Failed to fetch space news: {str(e)}")

    def validate_content(self, articles: list) -> str:
        """Format articles into readable text."""
        formatted = "Space News Articles:\n\n"
        for article in articles:
            formatted += f"""
                Title: {article['title']}
                Published: {article['published_at']}
                Summary: {article['summary']}
                News Site: {article['news_site']}
                URL: {article['url']}
                -------------------"""
        return formatted

    def add(self) -> None:
        """Process and store the articles."""
        content = self.load_content()
        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)
        self._save_documents()
# Create knowledge source
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles",
    limit=10,
)

# Create specialized agent
space_analyst = Agent(
    role="Space News Analyst",
    goal="Answer questions about space news accurately and comprehensively",
    backstory="""You are a space industry analyst with expertise in space exploration,
    satellite technology, and space industry trends. You excel at answering questions
    about space news and providing detailed, accurate information.""",
    knowledge_sources=[recent_news],
    llm=LLM(model="gpt-4", temperature=0.0)
)

# Create task that handles user questions
analysis_task = Task(
    description="Answer this question about space news: {user_question}",
    expected_output="A detailed answer based on the recent space news articles",
    agent=space_analyst
)

# Create and run the crew
crew = Crew(
    agents=[space_analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential
)

# Example usage
result = crew.kickoff(
    inputs={"user_question": "What are the latest developments in space exploration?"}
)
Debugging and Troubleshooting
Debugging Knowledge Issues
Check Agent Knowledge Initialization
from crewai import Agent, Crew, Task
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
knowledge_source = StringKnowledgeSource(content="Test knowledge")
agent = Agent(
    role="Test Agent",
    goal="Test knowledge",
    backstory="Testing",
    knowledge_sources=[knowledge_source]
)
crew = Crew(agents=[agent], tasks=[Task(...)])
# Before kickoff - knowledge not initialized
print(f"Before kickoff - Agent knowledge: {getattr(agent, 'knowledge', None)}")
crew.kickoff()
# After kickoff - knowledge initialized
print(f"After kickoff - Agent knowledge: {agent.knowledge}")
print(f"Agent knowledge collection: {agent.knowledge.storage.collection_name}")
print(f"Number of sources: {len(agent.knowledge.sources)}")
Verify Knowledge Storage Location
import os
from crewai.utilities.paths import db_storage_path
# Check storage structure
storage_path = db_storage_path()
knowledge_path = os.path.join(storage_path, "knowledge")
if os.path.exists(knowledge_path):
    print("Knowledge collections found:")
    for collection in os.listdir(knowledge_path):
        collection_path = os.path.join(knowledge_path, collection)
        if os.path.isdir(collection_path):
            print(f"  - {collection}/")
            # Show collection contents
            for item in os.listdir(collection_path):
                print(f"    └── {item}")
Test Knowledge Retrieval
# Test agent knowledge retrieval
if hasattr(agent, 'knowledge') and agent.knowledge:
    test_query = ["test query"]
    results = agent.knowledge.query(test_query)
    print(f"Agent knowledge results: {len(results)} documents found")

# Test crew knowledge retrieval (if exists)
if hasattr(crew, 'knowledge') and crew.knowledge:
    crew_results = crew.query_knowledge(test_query)
    print(f"Crew knowledge results: {len(crew_results)} documents found")
Inspect Knowledge Collections
import chromadb
from crewai.utilities.paths import db_storage_path
import os
# Connect to CrewAI's knowledge ChromaDB
knowledge_path = os.path.join(db_storage_path(), "knowledge")
if os.path.exists(knowledge_path):
    client = chromadb.PersistentClient(path=knowledge_path)
    collections = client.list_collections()

    print("Knowledge Collections:")
    for collection in collections:
        print(f"  - {collection.name}: {collection.count()} documents")

        # Sample a few documents to verify content
        if collection.count() > 0:
            sample = collection.peek(limit=2)
            print(f"    Sample content: {sample['documents'][0][:100]}...")
else:
    print("No knowledge storage found")
Check Knowledge Processing
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
# Create a test knowledge source
test_source = StringKnowledgeSource(
    content="Test knowledge content for debugging",
    chunk_size=100,   # Small chunks for testing
    chunk_overlap=20
)
# Check chunking behavior
print(f"Original content length: {len(test_source.content)}")
print(f"Chunk size: {test_source.chunk_size}")
print(f"Chunk overlap: {test_source.chunk_overlap}")
# Process and inspect chunks
test_source.add()
print(f"Number of chunks created: {len(test_source.chunks)}")
for i, chunk in enumerate(test_source.chunks[:3]):  # Show first 3 chunks
    print(f"Chunk {i+1}: {chunk[:50]}...")
Common Knowledge Storage Issues
"File not found" errors:
# Ensure files are in the correct location
from crewai.utilities.constants import KNOWLEDGE_DIRECTORY
import os
knowledge_dir = KNOWLEDGE_DIRECTORY # Usually "knowledge"
file_path = os.path.join(knowledge_dir, "your_file.pdf")
if not os.path.exists(file_path):
    print(f"File not found: {file_path}")
    print(f"Current working directory: {os.getcwd()}")
    print(f"Expected knowledge directory: {os.path.abspath(knowledge_dir)}")
Embedding dimension mismatch errors:
# This happens when switching embedding providers
# Reset knowledge storage to clear old embeddings
crew.reset_memories(command_type='knowledge')
# Or use consistent embedding providers
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)
Storage permission errors:
# Fix storage permissions
chmod -R 755 ~/.local/share/CrewAI/
Inconsistent storage locations:
# Verify storage location consistency
import os
from crewai.utilities.paths import db_storage_path
print("CREWAI_STORAGE_DIR:", os.getenv("CREWAI_STORAGE_DIR"))
print("Computed storage path:", db_storage_path())
print("Knowledge path:", os.path.join(db_storage_path(), "knowledge"))
Knowledge Reset Commands
# Reset only agent-specific knowledge
crew.reset_memories(command_type='agent_knowledge')
# Reset both crew and agent knowledge
crew.reset_memories(command_type='knowledge')
# CLI commands
# crewai reset-memories --agent-knowledge # Agent knowledge only
# crewai reset-memories --knowledge # All knowledge
Clearing Knowledge
If you need to clear the knowledge stored in CrewAI, use the `crewai reset-memories` command with the `--knowledge` option. Command:
crewai reset-memories --knowledge
Best Practices
Content Organization
- Keep chunk sizes appropriate for your content type (see the sketch after this list)
- Use content overlap to preserve context
- Organize related information into separate knowledge sources
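For instance, chunking can be tuned per source; the values below are purely illustrative and assume a hypothetical text file placed under ./knowledge/:

from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

# Larger chunks keep more context together; overlap preserves continuity across chunk boundaries.
policy_source = TextFileKnowledgeSource(
    file_paths=["policies.txt"],  # hypothetical file under ./knowledge/
    chunk_size=4000,
    chunk_overlap=200,
)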
Performance Tips
- Adjust chunk sizes based on content complexity
- Configure appropriate embedding models
- Consider using local embedding providers for faster processing
One-Time Knowledge
- With the typical file structure CrewAI provides, knowledge sources are embedded every time kickoff is triggered.
- If the knowledge sources are large, this causes inefficiency and added latency, because the same data is embedded on every run.
- To avoid this, initialize the knowledge parameter directly instead of the knowledge_sources parameter (see the sketch below).
- See the linked GitHub issue for the full discussion of this concept.
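A rough sketch of that approach, assuming your installed version accepts a pre-built `Knowledge` object via the `knowledge` parameter (as described in the referenced issue; the names below are illustrative):

from crewai import Agent, Crew, Task
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Build the knowledge object once, outside the kickoff loop.
reference_docs = Knowledge(
    collection_name="stable_reference",  # illustrative name
    sources=[StringKnowledgeSource(content="Large, rarely changing reference material.")],
)

agent = Agent(
    role="Docs Assistant",
    goal="Answer questions from the reference material",
    backstory="Knows the reference material well",
)
task = Task(
    description="Answer: {question}",
    expected_output="A short answer",
    agent=agent,
)

# Reusing the pre-built knowledge object is intended to avoid re-embedding the sources on every kickoff.
crew = Crew(agents=[agent], tasks=[task], knowledge=reference_docs)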
Knowledge Management
- Use agent-level knowledge for role-specific information
- Use crew-level knowledge for shared information that all agents need
- Set embedders at the agent level if you need different embedding strategies
- Keep agent roles descriptive, since they are used for consistent collection naming
- Test knowledge initialization by checking agent.knowledge after kickoff
- Monitor storage locations to know where knowledge is stored
- Reset knowledge appropriately using the correct command types
Production Best Practices
- Set `CREWAI_STORAGE_DIR` to a known location in production
- Choose an explicit embedding provider that matches your LLM setup and avoids API key conflicts
- Monitor knowledge storage size, since it grows as documents are added
- Organize knowledge sources by domain or purpose using collection names
- Include knowledge directories in your backup and deployment strategies
- Set appropriate file permissions for knowledge files and storage directories
- Use environment variables for API keys and sensitive configuration
