使用多模态智能体

CrewAI 支持可以处理文本和图像等非文本内容的多模态智能体。本指南将向您展示如何在智能体中启用和使用多模态功能。

启用多模态功能

要创建一个多模态智能体，只需在初始化智能体时将 multimodal 参数设置为 True

from crewai import Agent

agent = Agent(
    role="Image Analyst",
    goal="Analyze and extract insights from images",
    backstory="An expert in visual content interpretation with years of experience in image analysis",
    multimodal=True  # This enables multimodal capabilities
)

当您设置 multimodal=True 时，智能体将自动配置处理非文本内容所需的工具，包括 AddImageTool。

处理图像

多模态智能体预配置了 AddImageTool，使其能够处理图像。您无需手动添加此工具 - 在您启用多模态功能时它会自动包含。

以下是展示如何使用多模态智能体分析图像的完整示例

from crewai import Agent, Task, Crew

# Create a multimodal agent
image_analyst = Agent(
    role="Product Analyst",
    goal="Analyze product images and provide detailed descriptions",
    backstory="Expert in visual product analysis with deep knowledge of design and features",
    multimodal=True
)

# Create a task for image analysis
task = Task(
    description="Analyze the product image at https://example.com/product.jpg and provide a detailed description",
    expected_output="A detailed description of the product image",
    agent=image_analyst
)

# Create and run the crew
crew = Crew(
    agents=[image_analyst],
    tasks=[task]
)

result = crew.kickoff()

结合上下文的高级用法

在为多模态智能体创建任务时，您可以提供额外的上下文或关于图像的具体问题。任务描述可以包含您希望智能体关注的特定方面

from crewai import Agent, Task, Crew

# Create a multimodal agent for detailed analysis
expert_analyst = Agent(
    role="Visual Quality Inspector",
    goal="Perform detailed quality analysis of product images",
    backstory="Senior quality control expert with expertise in visual inspection",
    multimodal=True  # AddImageTool is automatically included
)

# Create a task with specific analysis requirements
inspection_task = Task(
    description="""
    Analyze the product image at https://example.com/product.jpg with focus on:
    1. Quality of materials
    2. Manufacturing defects
    3. Compliance with standards
    Provide a detailed report highlighting any issues found.
    """,
    expected_output="A detailed report highlighting any issues found",
    agent=expert_analyst
)

# Create and run the crew
crew = Crew(
    agents=[expert_analyst],
    tasks=[inspection_task]
)

result = crew.kickoff()

工具详情

在使用多模态智能体时，AddImageTool 会自动配置以下架构

class AddImageToolSchema:
    image_url: str  # Required: The URL or path of the image to process
    action: Optional[str] = None  # Optional: Additional context or specific questions about the image

多模态智能体将通过其内置工具自动处理图像，使其能够

通过 URL 或本地文件路径访问图像
处理图像内容，可选择提供上下文或具体问题
根据视觉信息和任务要求提供分析和见解

最佳实践

在使用多模态智能体时，请记住以下最佳实践

图像访问
- 确保您的图像可通过智能体可访问的 URL 访问
- 对于本地图像，考虑临时托管它们或使用绝对文件路径
- 在运行任务前验证图像 URL 是否有效且可访问
任务描述
- 具体说明您希望智能体分析图像的哪些方面
- 在任务描述中包含清晰的问题或要求
- 考虑使用可选的 action 参数进行重点分析
资源管理
- 图像处理可能比纯文本任务需要更多的计算资源
- 某些语言模型可能需要对图像数据进行 base64 编码
- 考虑对多张图像进行批量处理以优化性能
环境设置
- 验证您的环境是否具备图像处理所需的依赖项
- 确保您的语言模型支持多模态功能
- 首先使用小图像进行测试以验证您的设置
错误处理
- 实现适当的错误处理以应对图像加载失败
- 制定图像处理失败时的回退策略
- 监控并记录图像处理操作以便调试

连接到任何 LLM 重放最新团队启动的任务

本页内容

使用多模态智能体
启用多模态功能
处理图像
结合上下文的高级用法
工具详情
最佳实践

开始入门

指南

核心概念

工具

智能体监控与可观测性

学习

遥测

使用多模态智能体

使用多模态智能体

启用多模态功能

处理图像

结合上下文的高级用法

工具详情

最佳实践

开始入门

指南

核心概念

工具

智能体监控与可观测性

学习

遥测

​使用多模态智能体

​启用多模态功能

​处理图像

​结合上下文的高级用法

​工具详情

​最佳实践

使用多模态智能体

启用多模态功能

处理图像

结合上下文的高级用法

工具详情

最佳实践