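Tavily Extractor Tool - CrewAI Framework

The TavilyExtractorTool lets CrewAI agents extract structured content from web pages using the Tavily API. It can process a single URL or a list of URLs, and it provides options to control the extraction depth and whether images are included.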

Installation

To use the TavilyExtractorTool, you need to install the tavily-python library:

pip install 'crewai[tools]' tavily-python

You also need to set your Tavily API key as an environment variable:

export TAVILY_API_KEY='your-tavily-api-key'
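
A missing key tends to surface only when the tool makes its first request, so it can help to check for it up front. The following is a minimal sketch, not part of the tool itself; the helper name get_tavily_api_key is hypothetical, and only the TAVILY_API_KEY variable name comes from this page.

```python
import os


def get_tavily_api_key(env=None):
    """Return the Tavily API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    This helper is illustrative and is not provided by crewai_tools.
    """
    env = os.environ if env is None else env
    key = env.get("TAVILY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TAVILY_API_KEY is not set; export it before creating the tool."
        )
    return key
```

Calling get_tavily_api_key() once at startup fails fast with a readable message instead of a later request error.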

Example Usage

Here is how to initialize and use the TavilyExtractorTool within a CrewAI agent:

import os
from crewai import Agent, Task, Crew
from crewai_tools import TavilyExtractorTool

# Ensure TAVILY_API_KEY is set in your environment
# os.environ["TAVILY_API_KEY"] = "YOUR_API_KEY"

# Initialize the tool
tavily_tool = TavilyExtractorTool()

# Create an agent that uses the tool
extractor_agent = Agent(
    role='Web Content Extractor',
    goal='Extract key information from specified web pages',
    backstory='You are an expert at extracting relevant content from websites using the Tavily API.',
    tools=[tavily_tool],
    verbose=True
)

# Define a task for the agent
extract_task = Task(
    description='Extract the main content from the URL https://example.com using basic extraction depth.',
    expected_output='A JSON string containing the extracted content from the URL.',
    agent=extractor_agent
)

# Create and run the crew
crew = Crew(
    agents=[extractor_agent],
    tasks=[extract_task],
    verbose=True
)

result = crew.kickoff()
print(result)

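Configuration Options

The TavilyExtractorTool accepts the following arguments:

urls (Union[List[str], str]): Required. A single URL string or a list of URL strings to extract data from.
include_images (Optional[bool]): Whether to include images in the extraction results. Defaults to False.
extract_depth (Literal["basic", "advanced"]): The depth of extraction. Use "basic" for faster, surface-level extraction or "advanced" for more comprehensive extraction. Defaults to "basic".
timeout (int): The maximum time in seconds to wait for the extraction request to complete. Defaults to 60.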
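
Because urls may be either a string or a list, and extract_depth takes only two values, it can be useful to normalize and check inputs before they reach the tool. The sketch below is an illustration under that assumption; normalize_urls and check_extract_depth are hypothetical helpers and do not belong to crewai_tools or tavily-python.

```python
from typing import List, Union

# The two depth values documented above.
ALLOWED_DEPTHS = ("basic", "advanced")


def normalize_urls(urls: Union[List[str], str]) -> List[str]:
    """Return a list of URLs whether the caller passed one string or a list."""
    if isinstance(urls, str):
        urls = [urls]
    # Drop empty entries and surrounding whitespace.
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("at least one URL is required")
    return cleaned


def check_extract_depth(depth: str = "basic") -> str:
    """Return the depth unchanged if it is one of the documented values."""
    if depth not in ALLOWED_DEPTHS:
        raise ValueError(f"extract_depth must be one of {ALLOWED_DEPTHS}, got {depth!r}")
    return depth
```

For example, normalize_urls("https://example.com") yields a one-element list, so downstream code can treat both input shapes the same way.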

Advanced Usage

Multiple URLs with Advanced Extraction

# Example with multiple URLs and advanced extraction
multi_extract_task = Task(
    description='Extract content from https://example.com and https://anotherexample.org using advanced extraction.',
    expected_output='A JSON string containing the extracted content from both URLs.',
    agent=extractor_agent
)

# Configure the tool with custom parameters
custom_extractor = TavilyExtractorTool(
    extract_depth='advanced',
    include_images=True,
    timeout=120
)

agent_with_custom_tool = Agent(
    role="Advanced Content Extractor",
    goal="Extract comprehensive content with images",
    backstory="You extract detailed page content, including images, using the Tavily API.",
    tools=[custom_extractor]
)

Tool Parameters

You can customize the tool's behavior by setting parameters during initialization:

# Initialize with custom configuration
extractor_tool = TavilyExtractorTool(
    extract_depth='advanced',  # More comprehensive extraction
    include_images=True,       # Include image results
    timeout=90                 # Custom timeout
)

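Features

Single or multiple URLs: Extract content from one URL or process several URLs in a single request
Configurable depth: Choose between basic (fast) and advanced (comprehensive) extraction modes
Image support: Optionally include images in the extraction results
Structured output: Returns well-formatted JSON containing the extracted content
Error handling: Robust handling of network timeouts and extraction errors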

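Response Format

The tool returns a JSON string representing the structured data extracted from the provided URL(s). The exact structure depends on the content of the pages and the extract_depth used. Common response elements include:

Title: The page title
Content: The main text content of the page
Images: Image URLs and metadata (when include_images=True)
Metadata: Additional page information such as author and description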
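
Since the exact structure varies, one cautious way to consume the result is to parse the JSON string and inspect which top-level fields are present before relying on any of them. This is a minimal sketch; the sample payload is hypothetical and only illustrates the shape of the code, not a real Tavily response.

```python
import json


def describe_response(raw: str) -> dict:
    """Map each top-level field of a JSON object to the name of its Python type."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in data.items()}


# Hypothetical payload for illustration only; real field names may differ.
sample = json.dumps({"title": "Example Domain", "content": "Example text", "images": []})
summary = describe_response(sample)
print(summary)
```

Checking the fields this way avoids a KeyError when a page returns fewer elements than expected.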

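Use Cases

Content analysis: Extract and analyze content from competitor websites
Research: Gather structured data from multiple sources for analysis
Content migration: Extract content from existing websites for migration
Monitoring: Extract content at regular intervals for change detection
Data collection: Systematically extract information from web sources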
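
For the monitoring use case, change detection can be as simple as comparing a hash of the extracted text between runs. The sketch below assumes you already hold the extracted text as a string; content_fingerprint and has_changed are hypothetical helpers, not part of the tool.

```python
import hashlib
from typing import Optional


def content_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the text with whitespace runs collapsed."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: Optional[str], text: str) -> bool:
    """Report whether the text differs from the stored fingerprint.

    A missing previous fingerprint (None) counts as a change.
    """
    return previous_fingerprint != content_fingerprint(text)
```

Collapsing whitespace before hashing keeps purely cosmetic reflows from registering as changes; store the fingerprint after each run and compare it on the next.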

For details on the response structure and available options, see the Tavily API documentation.
