`Scrape Element From Website Tool (ScrapeElementFromWebsiteTool)`

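Description

The ScrapeElementFromWebsiteTool extracts specific elements from a website using CSS selectors. It lets CrewAI agents scrape targeted content from a web page, which is useful for data extraction tasks that need only part of a page.

Installation

To use this tool, install the required dependencies: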

uv add requests beautifulsoup4

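Steps to Get Started

To use the ScrapeElementFromWebsiteTool effectively, follow these steps:

Install dependencies: Install the required packages using the command above.
Identify CSS selectors: Determine the CSS selectors for the elements you want to extract from the website.
Initialize the tool: Create an instance of the tool with the necessary parameters.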
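
Before handing a selector to an agent, it can help to check it against a saved copy of the page. The following is a minimal sketch of that check, assuming BeautifulSoup is installed as described above. The HTML fragment and the `preview_selector` helper are hypothetical and exist only for illustration; they are not part of the tool.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment used only for illustration.
SAMPLE_HTML = """
<html><body>
  <div class="featured">
    <h2 class="product-title">Blue Kettle</h2>
    <h2 class="product-title">Red Toaster</h2>
  </div>
  <p class="footer-note">Prices exclude tax.</p>
</body></html>
"""


def preview_selector(html: str, css_selector: str) -> list[str]:
    """Return the stripped text of every element matching css_selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(css_selector)]


titles = preview_selector(SAMPLE_HTML, ".product-title")
print(titles)
```

An empty list from a check like this usually means the selector does not match the markup, which is cheaper to discover here than during a crew run.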

Example

The following example demonstrates how to use the ScrapeElementFromWebsiteTool to extract specific elements from a website:

Code

from crewai import Agent, Task, Crew
from crewai_tools import ScrapeElementFromWebsiteTool

# Initialize the tool
scrape_tool = ScrapeElementFromWebsiteTool()

# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)

# Example task to extract headlines from a news website
scrape_task = Task(
    description="Extract the main headlines from the CNN homepage. Use the CSS selector '.headline' to target the headline elements.",
    expected_output="A list of the main headlines from CNN.",
    agent=web_scraper_agent,
)

# Create and run the crew
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()

You can also initialize the tool with predefined parameters:

Code

# Initialize the tool with predefined parameters
scrape_tool = ScrapeElementFromWebsiteTool(
    website_url="https://www.example.com",
    css_element=".main-content"
)

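Parameters

The ScrapeElementFromWebsiteTool accepts the following parameters during initialization:

website_url: Optional. The URL of the website to scrape. If provided during initialization, the agent does not need to specify it when using the tool.
css_element: Optional. The CSS selector for the elements to extract. If provided during initialization, the agent does not need to specify it when using the tool.
cookies: Optional. A dictionary of cookies to send with the request. This can be useful for websites that require authentication.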
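
Since `cookies` is a plain dictionary, one way to build it is from a raw `Cookie` header string copied out of a browser session. The sketch below uses only the Python standard library. The `cookies_from_header` helper and the cookie names and values are hypothetical.

```python
from http.cookies import SimpleCookie


def cookies_from_header(raw_header: str) -> dict[str, str]:
    """Parse a 'name=value; name2=value2' header string into a plain dict."""
    jar = SimpleCookie()
    jar.load(raw_header)
    return {name: morsel.value for name, morsel in jar.items()}


# Hypothetical cookie values, for illustration only.
cookies = cookies_from_header("session_id=abc123; theme=dark")
print(cookies)

# The resulting dict has the shape the `cookies` parameter expects, e.g.:
# ScrapeElementFromWebsiteTool(website_url="https://www.example.com", cookies=cookies)
```

Session cookies are credentials, so it is generally safer to load them from the environment or a secrets store than to hard-code them.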

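Usage

When using the ScrapeElementFromWebsiteTool with an agent, the agent needs to provide the following parameters (unless they were specified during initialization):

website_url: The URL of the website to scrape.
css_element: The CSS selector for the elements to extract.

The tool returns the text content of all elements matching the CSS selector, joined by newlines.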
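
Because the result is a single newline-joined string, downstream code that wants one item per match has to split it again. The following is a small sketch of that step. The `split_tool_output` helper and the sample string are hypothetical, and the approach assumes no matched element contains a line break of its own, since such an element would be split into several items.

```python
def split_tool_output(raw: str) -> list[str]:
    """Split newline-joined tool output into non-empty, stripped items."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


# Hypothetical output for a selector that matched two elements.
sample_output = "First story\n\n  Second story \n"
items = split_tool_output(sample_output)
print(items)
```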

Code

# Example of using the tool with an agent
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific elements from websites",
    backstory="An expert in web scraping who can extract targeted content using CSS selectors.",
    tools=[scrape_tool],
    verbose=True,
)

# Create a task for the agent to extract specific elements
extract_task = Task(
    description="""
    Extract all product titles from the featured products section on example.com.
    Use the CSS selector '.product-title' to target the title elements.
    """,
    expected_output="A list of product titles from the website",
    agent=web_scraper_agent,
)

# Run the task through a crew
crew = Crew(agents=[web_scraper_agent], tasks=[extract_task])
result = crew.kickoff()

Implementation Details

The ScrapeElementFromWebsiteTool uses the requests library to fetch the web page and BeautifulSoup to parse the HTML and extract the specified elements.

Code

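# Imports the excerpt below relies on
from typing import Any

import requests
from bs4 import BeautifulSoup
from crewai.tools import BaseTool


class ScrapeElementFromWebsiteTool(BaseTool):
    name: str = "Read a website content"
    description: str = "A tool that can be used to read a website content."

    # Implementation details...
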
    def _run(self, **kwargs: Any) -> Any:
        website_url = kwargs.get("website_url", self.website_url)
        css_element = kwargs.get("css_element", self.css_element)
        page = requests.get(
            website_url,
            headers=self.headers,
            cookies=self.cookies if self.cookies else {},
        )
        parsed = BeautifulSoup(page.content, "html.parser")
        elements = parsed.select(css_element)
        return "\n".join([element.get_text() for element in elements])
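
The `kwargs.get(name, default)` lookups in the excerpt mean a value supplied at call time takes precedence over one supplied at initialization. The sketch below isolates that lookup pattern in a hypothetical stand-in class, `ArgumentDefaults`. It is not the library's code and only mirrors the two lines shown in the excerpt.

```python
from typing import Any, Optional


class ArgumentDefaults:
    """Hypothetical stand-in for a tool that stores init-time defaults."""

    def __init__(
        self,
        website_url: Optional[str] = None,
        css_element: Optional[str] = None,
    ) -> None:
        self.website_url = website_url
        self.css_element = css_element

    def resolve(self, **kwargs: Any) -> tuple[Optional[str], Optional[str]]:
        # Same pattern as the excerpt: call-time value first, init-time value as fallback.
        website_url = kwargs.get("website_url", self.website_url)
        css_element = kwargs.get("css_element", self.css_element)
        return website_url, css_element


defaults = ArgumentDefaults(
    website_url="https://www.example.com",
    css_element=".main-content",
)
from_init = defaults.resolve()
overridden = defaults.resolve(css_element=".product-title")
explicit_none = defaults.resolve(css_element=None)
print(from_init, overridden, explicit_none)
```

Note that `dict.get` falls back to the default only when the key is absent, so in this sketch an explicit `None` passed at call time replaces the init-time value rather than deferring to it.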

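Conclusion

The ScrapeElementFromWebsiteTool extracts specific elements from websites using CSS selectors. Because agents can target only the content they need, scraping tasks become more efficient and focused. The tool is particularly useful for data extraction, content monitoring, and research tasks that require specific information from web pages.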
