PDFSearchTool
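We are still working on improving the tools, so unexpected behavior or changes may occur in the future.
Description
PDFSearchTool is a RAG tool designed for semantic search within PDF content. It takes a search query and a PDF document, and uses embedding-based retrieval to find the relevant content efficiently. This makes it especially useful for quickly extracting specific information from large PDF files.
Installation
To get started with PDFSearchTool, first make sure the crewai_tools package is installed, using the following command: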
pip install 'crewai[tools]'
Example
The following shows how to use PDFSearchTool to search within a PDF document:
from crewai_tools import PDFSearchTool
# Initialize the tool allowing for any PDF content search if the path is provided during execution
tool = PDFSearchTool()
# OR
# Initialize the tool with a specific PDF path for exclusive search within that document
tool = PDFSearchTool(pdf='path/to/your/document.pdf')
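Arguments
pdf: Optional. The path of the PDF to search. It can be provided at initialization or in the arguments of the run method. If provided at initialization, the tool confines its searches to the specified document.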
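The resolution rule above can be sketched with a small stand-in class. This is not the crewai_tools implementation: `PdfScope` and its `resolve` method are hypothetical names, and the sketch assumes that a path given at initialization always wins, while a tool created without one needs a path on every call.

```python
from typing import Optional


class PdfScope:
    """Hypothetical stand-in that mimics how a PDF path might be resolved."""

    def __init__(self, pdf: Optional[str] = None) -> None:
        # Path fixed at initialization; None means "decide at run time".
        self.pdf = pdf

    def resolve(self, pdf: Optional[str] = None) -> str:
        """Return the PDF path a search would be run against."""
        if self.pdf is not None:
            # Assumption: a path given at initialization confines every search to it.
            return self.pdf
        if pdf is None:
            raise ValueError("No PDF path given at initialization or at run time")
        return pdf


fixed = PdfScope(pdf="path/to/your/document.pdf")
flexible = PdfScope()

print(fixed.resolve())                    # uses the path fixed at initialization
print(flexible.resolve(pdf="other.pdf"))  # uses the path supplied at run time
```

In practice this means a tool created without a path can serve many documents, while one created with a path is dedicated to a single document.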
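Custom model and embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, pass a config dictionary as shown below. Note: a vector database is required, because the generated embeddings must be stored in it and queried from it.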
from crewai_tools import PDFSearchTool
# - embedding_model (required): choose provider + provider-specific config
# - vectordb (required): choose vector DB and pass its config
tool = PDFSearchTool(
    config={
        "embedding_model": {
            # Supported providers: "openai", "azure", "google-generativeai", "google-vertex",
            # "voyageai", "cohere", "huggingface", "jina", "sentence-transformer",
            # "text2vec", "ollama", "openclip", "instructor", "onnx", "roboflow", "watsonx", "custom"
            "provider": "openai",  # or: "google-generativeai", "cohere", "ollama", ...
            "config": {
                # Model identifier for the chosen provider. "model" will be auto-mapped to "model_name" internally.
                "model": "text-embedding-3-small",
                # Optional: API key. If omitted, the tool will use provider-specific env vars
                # (e.g., OPENAI_API_KEY or EMBEDDINGS_OPENAI_API_KEY for OpenAI).
                # "api_key": "sk-...",

                # Provider-specific examples:
                # --- Google Generative AI ---
                # (Set provider="google-generativeai" above)
                # "model_name": "gemini-embedding-001",
                # "task_type": "RETRIEVAL_DOCUMENT",
                # "title": "Embeddings",

                # --- Cohere ---
                # (Set provider="cohere" above)
                # "model": "embed-english-v3.0",

                # --- Ollama (local) ---
                # (Set provider="ollama" above)
                # "model": "nomic-embed-text",
            },
        },
        "vectordb": {
            "provider": "chromadb",  # or "qdrant"
            "config": {
                # For ChromaDB: pass "settings" (chromadb.config.Settings) or rely on defaults.
                # Example (uncomment and import):
                # from chromadb.config import Settings
                # "settings": Settings(
                #     persist_directory="/content/chroma",
                #     allow_reset=True,
                #     is_persistent=True,
                # ),

                # For Qdrant: pass "vectors_config" (qdrant_client.models.VectorParams).
                # The vector size must match the output dimension of the chosen embedding model.
                # Example (uncomment and import):
                # from qdrant_client.models import VectorParams, Distance
                # "vectors_config": VectorParams(size=384, distance=Distance.COSINE),

                # Note: collection name is controlled by the tool (default: "rag_tool_collection"), not set here.
            },
        },
    }
)
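To illustrate why the embeddings need a store that can be queried, here is a minimal, standard-library-only sketch of the store-then-query cycle. It is a toy under stated assumptions, not what the tool does internally: `embed`, `cosine`, and `InMemoryVectorStore` are hypothetical names, and the "embedding" is a normalized bag-of-words vector that stands in for a real embedding model, so it matches on shared words rather than on meaning.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalized word counts (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as ChromaDB or Qdrant."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Dict[str, float]]] = []

    def add(self, chunk: str) -> None:
        # Store the chunk together with its embedding.
        self._rows.append((chunk, embed(chunk)))

    def query(self, text: str, k: int = 1) -> List[str]:
        # Embed the query, then return the k most similar stored chunks.
        q = embed(text)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = InMemoryVectorStore()
for chunk in [
    "Invoices are due within thirty days of receipt.",
    "The warranty covers manufacturing defects for two years.",
    "Shipping is free for orders above fifty dollars.",
]:
    store.add(chunk)

best = store.query("How long does the warranty last?", k=1)
print(best[0])
```

A real vector database plays the same role at scale: it persists the embeddings of the PDF chunks and performs the nearest-neighbor lookup for each query.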
