`ScrapeElementFromWebsiteTool`

Description

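The ScrapeElementFromWebsiteTool is designed to extract specific elements from a website using CSS selectors. It lets CrewAI agents scrape targeted content from web pages, which is useful for data extraction tasks that need only a specific part of a page rather than the whole document.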
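The sketch below makes "targeting a specific part of a page" concrete. It works offline, calling BeautifulSoup's `select` (the same call the tool relies on) on an inline HTML string instead of a live site. The sample markup and the `select_texts` helper are hypothetical. Unlike the tool, the helper strips surrounding whitespace from each match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: two headlines in the main area, one in a sidebar
SAMPLE_HTML = """
<html>
  <body>
    <div id="main-content">
      <h2 class="headline">First story</h2>
      <h2 class="headline">Second story</h2>
    </div>
    <aside>
      <h2 class="headline">Sidebar story</h2>
    </aside>
  </body>
</html>
"""


def select_texts(html: str, css_element: str) -> list[str]:
    """Return the whitespace-stripped text of every element matching css_element."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(css_element)]


# A broad selector matches every headline on the page
print(select_texts(SAMPLE_HTML, ".headline"))
# ['First story', 'Second story', 'Sidebar story']

# A scoped selector targets only the main content area
print(select_texts(SAMPLE_HTML, "#main-content .headline"))
# ['First story', 'Second story']
```

Scoping the selector with an ancestor (`#main-content`) is usually the simplest way to keep sidebars and navigation out of the result.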

Installation

To use this tool, install the required dependencies:

uv add requests beautifulsoup4

Steps to Get Started

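To use the ScrapeElementFromWebsiteTool effectively, follow these steps:

Install dependencies: install the required packages with the command above.
Identify CSS selectors: determine the CSS selectors for the elements you want to extract from the website.
Initialize the tool: create an instance of the tool with the necessary parameters.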
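For the second step, it can help to check a candidate selector before handing it to an agent. The sketch below assumes you have saved a sample of the page's HTML. `check_selector` is a hypothetical helper. It first compiles the selector with soupsieve, the CSS engine BeautifulSoup uses for `select` (installed as a dependency of beautifulsoup4), to catch syntax errors. It then counts how many elements the selector matches.

```python
import soupsieve as sv
from bs4 import BeautifulSoup


def check_selector(css_element: str, sample_html: str) -> int:
    """Validate selector syntax, then count matches in sample_html.

    Raises soupsieve.SelectorSyntaxError if the selector is malformed.
    """
    # Compile first so a syntax error is reported before any parsing happens
    sv.compile(css_element)
    soup = BeautifulSoup(sample_html, "html.parser")
    return len(soup.select(css_element))


# Hypothetical saved snippet of the target page
sample = (
    '<div class="featured">'
    '<h3 class="product-title">Lamp</h3>'
    '<h3 class="product-title">Chair</h3>'
    "</div>"
)

print(check_selector(".product-title", sample))  # 2
print(check_selector(".price", sample))  # 0: valid syntax, but no matches

try:
    check_selector("div[", sample)
except sv.SelectorSyntaxError as exc:
    print(f"Invalid selector: {exc}")
```

A count of zero usually means the selector is valid but does not fit the page's markup. The tool would then return an empty string rather than an error.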

Example

The following example shows how to use the ScrapeElementFromWebsiteTool to extract specific elements from a website:

```python
from crewai import Agent, Task, Crew
from crewai_tools import ScrapeElementFromWebsiteTool

# Initialize the tool
scrape_tool = ScrapeElementFromWebsiteTool()

# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)

# Example task to extract headlines from a news website
scrape_task = Task(
    description="Extract the main headlines from the CNN homepage. Use the CSS selector '.headline' to target the headline elements.",
    expected_output="A list of the main headlines from CNN.",
    agent=web_scraper_agent,
)

# Create and run the crew
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```

You can also initialize the tool with predefined parameters:

```python
# Initialize the tool with predefined parameters
scrape_tool = ScrapeElementFromWebsiteTool(
    website_url="https://www.example.com",
    css_element=".main-content"
)
```

Parameters

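The ScrapeElementFromWebsiteTool accepts the following parameters during initialization:

website_url: Optional. The URL of the website to scrape. If provided during initialization, the agent does not need to specify it when using the tool.
css_element: Optional. The CSS selector for the elements to extract. If provided during initialization, the agent does not need to specify it when using the tool.
cookies: Optional. A dictionary of cookies to send with the request. Useful for websites that require authentication.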
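The interplay between init-time and call-time parameters can be sketched without network access. `ScraperParams` below is a hypothetical stand-in, not part of crewai_tools. Its `resolve` method mirrors the `kwargs.get(..., self.website_url)` fallback in the tool's `_run` method (see Implementation Details). Its `prepare_request` method builds a requests GET without sending it, so you can see how a cookies dictionary becomes a `Cookie` header. The cookie name and value are placeholders.

```python
from typing import Any, Optional, Tuple

import requests


class ScraperParams:
    """Hypothetical stand-in for the tool's init-time parameters."""

    def __init__(
        self,
        website_url: Optional[str] = None,
        css_element: Optional[str] = None,
        cookies: Optional[dict] = None,
    ) -> None:
        self.website_url = website_url
        self.css_element = css_element
        self.cookies = cookies

    def resolve(self, **kwargs: Any) -> Tuple[str, str]:
        # Call-time values take precedence; otherwise fall back to init values
        website_url = kwargs.get("website_url", self.website_url)
        css_element = kwargs.get("css_element", self.css_element)
        if not website_url or not css_element:
            raise ValueError("website_url and css_element must be set at init or call time")
        return website_url, css_element

    def prepare_request(self, **kwargs: Any) -> requests.PreparedRequest:
        # Build (but do not send) the GET request so its Cookie header can be inspected
        website_url, _ = self.resolve(**kwargs)
        return requests.Request("GET", website_url, cookies=self.cookies or {}).prepare()


params = ScraperParams(
    website_url="https://www.example.com",
    css_element=".main-content",
    cookies={"sessionid": "abc123"},  # placeholder session cookie
)
print(params.resolve())  # ('https://www.example.com', '.main-content')
print(params.resolve(css_element=".product-title"))  # ('https://www.example.com', '.product-title')
print(params.prepare_request().headers["Cookie"])  # sessionid=abc123
```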

Usage

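When an agent uses the ScrapeElementFromWebsiteTool, it must provide the following parameters (unless they were specified during initialization):

website_url: The URL of the website to scrape.
css_element: The CSS selector for the elements to extract.

The tool returns the text content of all elements that match the CSS selector, joined with newline characters.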

```python
from crewai import Agent, Task, Crew
from crewai_tools import ScrapeElementFromWebsiteTool

# Initialize the tool without predefined parameters so the agent can
# supply the URL and CSS selector at run time
scrape_tool = ScrapeElementFromWebsiteTool()

# Example of using the tool with an agent
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific elements from websites",
    backstory="An expert in web scraping who can extract targeted content using CSS selectors.",
    tools=[scrape_tool],
    verbose=True,
)

# Create a task for the agent to extract specific elements
extract_task = Task(
    description="""
    Extract all product titles from the featured products section on example.com.
    Use the CSS selector '.product-title' to target the title elements.
    """,
    expected_output="A list of product titles from the website",
    agent=web_scraper_agent,
)

# Run the task through a crew
crew = Crew(agents=[web_scraper_agent], tasks=[extract_task])
result = crew.kickoff()
```
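As described in this section, the tool returns the text of every matching element joined by newlines. The sketch below reproduces only that parsing step, using an inline HTML string so no request is made. The function name `extract_joined_text` and the sample markup are hypothetical. It shows two practical consequences. Text inside nested tags is flattened into its parent's text. A selector that matches nothing yields an empty string rather than an error.

```python
from bs4 import BeautifulSoup


def extract_joined_text(html: str, css_element: str) -> str:
    """Parse html and newline-join the text of all matches, as the tool's _run does."""
    parsed = BeautifulSoup(html, "html.parser")
    return "\n".join(element.get_text() for element in parsed.select(css_element))


# Hypothetical markup for a featured-products section
html = """
<section class="featured">
  <h3 class="product-title">Desk <em>Lamp</em></h3>
  <h3 class="product-title">Office Chair</h3>
</section>
"""

print(extract_joined_text(html, ".product-title"))
# Desk Lamp
# Office Chair

print(repr(extract_joined_text(html, ".price")))  # ''
```

Because an empty result is not an error, it can help to word the task's `expected_output` so the agent reports when nothing was found.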

Implementation Details

The ScrapeElementFromWebsiteTool uses the requests library to fetch the web page and BeautifulSoup to parse the HTML and extract the specified elements:

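```python
from typing import Any

import requests
from bs4 import BeautifulSoup
from crewai.tools import BaseTool


class ScrapeElementFromWebsiteTool(BaseTool):
    name: str = "Read a website content"
    description: str = "A tool that can be used to read a website content."

    # Implementation details...

    def _run(self, **kwargs: Any) -> Any:
        # Per-call arguments override the values set at initialization
        website_url = kwargs.get("website_url", self.website_url)
        css_element = kwargs.get("css_element", self.css_element)
        # Fetch the page with the tool's headers and any configured cookies
        page = requests.get(
            website_url,
            headers=self.headers,
            cookies=self.cookies if self.cookies else {},
        )
        # Parse the HTML and select every element matching the CSS selector
        parsed = BeautifulSoup(page.content, "html.parser")
        elements = parsed.select(css_element)
        # Return the text of all matches, one per line
        return "\n".join([element.get_text() for element in elements])
```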
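The excerpt above calls `requests.get` without a timeout and parses the response regardless of its HTTP status. If you write your own variant, a more defensive version might look like the following. `scrape_elements` is a hypothetical function, not part of crewai_tools. It adds a timeout and a `raise_for_status()` check but keeps the same parse, select, and join logic. The caller supplies the headers rather than relying on a tool-level default.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup


def scrape_elements(
    website_url: str,
    css_element: str,
    cookies: Optional[dict] = None,
    headers: Optional[dict] = None,
    timeout: float = 10.0,
) -> str:
    """Hypothetical variant of _run with a request timeout and HTTP error checking."""
    response = requests.get(
        website_url,
        headers=headers or {},
        cookies=cookies or {},
        timeout=timeout,  # avoid hanging indefinitely on an unresponsive server
    )
    # Raise on 4xx/5xx instead of silently parsing an error page
    response.raise_for_status()
    parsed = BeautifulSoup(response.content, "html.parser")
    return "\n".join(element.get_text() for element in parsed.select(css_element))
```

With this variant, an agent sees an exception message instead of text scraped from an error page. Whether that is preferable depends on how your crew handles tool errors.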

Conclusion

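The ScrapeElementFromWebsiteTool provides a focused way to extract specific elements from websites using CSS selectors. Because agents can target only the content they need, scraping tasks become more efficient and produce less irrelevant text. The tool is particularly useful for data extraction, content monitoring, and research tasks that require specific information from web pages.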
