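Using Multimodal Agents - CrewAI Framework

Using Multimodal Agents

CrewAI supports multimodal agents, which can process text together with non-text content such as images. This guide shows how to enable and use an agent's multimodal capabilities.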

Enabling Multimodal Capabilities

To create a multimodal agent, set the multimodal parameter to True when you initialize the agent:

from crewai import Agent

agent = Agent(
    role="Image Analyst",
    goal="Analyze and extract insights from images",
    backstory="An expert in visual content interpretation with years of experience in image analysis",
    multimodal=True  # This enables multimodal capabilities
)

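When you set multimodal=True, the agent is automatically configured with the tools it needs to handle non-text content, including AddImageTool.

Working with Images

A multimodal agent comes preconfigured with AddImageTool, which lets it process images. You do not need to add this tool manually; it is included automatically when multimodal capabilities are enabled. The following complete example shows how to use a multimodal agent to analyze an image: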

from crewai import Agent, Task, Crew

# Create a multimodal agent
image_analyst = Agent(
    role="Product Analyst",
    goal="Analyze product images and provide detailed descriptions",
    backstory="Expert in visual product analysis with deep knowledge of design and features",
    multimodal=True
)

# Create a task for image analysis
task = Task(
    description="Analyze the product image at https://example.com/product.jpg and provide a detailed description",
    expected_output="A detailed description of the product image",
    agent=image_analyst
)

# Create and run the crew
crew = Crew(
    agents=[image_analyst],
    tasks=[task]
)

result = crew.kickoff()

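Advanced Usage with Context

When you create a task for a multimodal agent, you can supply additional context or specific questions about the image. The task description can name the particular aspects you want the agent to focus on: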

from crewai import Agent, Task, Crew

# Create a multimodal agent for detailed analysis
expert_analyst = Agent(
    role="Visual Quality Inspector",
    goal="Perform detailed quality analysis of product images",
    backstory="Senior quality control expert with expertise in visual inspection",
    multimodal=True  # AddImageTool is automatically included
)

# Create a task with specific analysis requirements
inspection_task = Task(
    description="""
    Analyze the product image at https://example.com/product.jpg with focus on:
    1. Quality of materials
    2. Manufacturing defects
    3. Compliance with standards
    Provide a detailed report highlighting any issues found.
    """,
    expected_output="A detailed report highlighting any issues found",
    agent=expert_analyst
)

# Create and run the crew
crew = Crew(
    agents=[expert_analyst],
    tasks=[inspection_task]
)

result = crew.kickoff()
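
When the same inspection runs against many images, the focus list in a task description like the one above can be assembled programmatically. The following is a minimal sketch under stated assumptions: build_inspection_description is a hypothetical helper, not part of CrewAI, and it only produces the string you would pass as description to Task.

```python
from typing import Sequence


def build_inspection_description(image_ref: str, focus_points: Sequence[str]) -> str:
    """Return a task description that names the image and the aspects to inspect.

    Hypothetical helper for illustration; it is not part of CrewAI.
    """
    if not focus_points:
        raise ValueError("Provide at least one focus point")
    lines = [f"Analyze the product image at {image_ref} with focus on:"]
    # Number the focus points so the agent can address each one in its report.
    lines += [f"{i}. {point}" for i, point in enumerate(focus_points, start=1)]
    lines.append("Provide a detailed report highlighting any issues found.")
    return "\n".join(lines)


description = build_inspection_description(
    "https://example.com/product.jpg",
    ["Quality of materials", "Manufacturing defects", "Compliance with standards"],
)
print(description)
```

Keeping the focus points in a plain list makes it easy to reuse one checklist across tasks while changing only the image reference.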

Tool Details

When you use a multimodal agent, AddImageTool is configured automatically with the following schema:

from typing import Optional

class AddImageToolSchema:
    image_url: str  # Required: The URL or path of the image to process
    action: Optional[str] = None  # Optional: Additional context or specific questions about the image

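Through its built-in tool, the multimodal agent handles images automatically, which lets it:

- Access images through a URL or a local file path
- Process image content, optionally with context or specific questions
- Provide analysis and insights based on the visual information and the task requirements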
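
Because the image reference can be either a URL or a file path, it can help to normalize it before placing it in a task description. A minimal sketch, assuming a hypothetical helper named resolve_image_ref (not part of CrewAI) that leaves http(s) URLs untouched and turns local paths into absolute paths:

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_image_ref(ref: str) -> str:
    """Return http(s) URLs unchanged and local paths as absolute paths.

    Hypothetical helper for illustration; it is not part of CrewAI.
    """
    scheme = urlparse(ref).scheme.lower()
    if scheme in ("http", "https"):
        return ref
    # Anything else is treated as a local path and must point to an existing file.
    path = Path(ref).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")
    return str(path)


print(resolve_image_ref("https://example.com/product.jpg"))
```

Failing early on a missing local file surfaces the problem before the crew starts, rather than partway through a task.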

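Best Practices

Keep the following best practices in mind when working with multimodal agents:

Image access
- Make sure your images are reachable at a URL the agent can access
- For local images, consider hosting them temporarily or using absolute file paths
- Verify that image URLs are valid and accessible before running a task
Task description
- State explicitly which aspects of the image you want the agent to analyze
- Include clear questions or requirements in the task description
- Consider using the optional action parameter for focused analysis
Resource management
- Image processing may require more compute resources than text-only tasks
- Some language models may require image data to be Base64-encoded
- Consider batching multiple images to optimize performance
Environment setup
- Verify that your environment has the dependencies required for image processing
- Make sure your language model supports multimodal input
- Test with small images first to validate your setup
Error handling
- Implement appropriate error handling for image loading failures
- Plan a fallback strategy for when image processing fails
- Monitor and log image processing operations to help with debugging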
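
The Base64 and error-handling points above can be combined in a small preflight step. The sketch below is an illustration under stated assumptions, not CrewAI functionality: image_to_data_url is a hypothetical helper that encodes a local image as a Base64 data URL, logs any failure, and returns None so the caller can choose a fallback. Whether a given model or tool accepts a data URL is something to verify for your own setup.

```python
import base64
import logging
import mimetypes
from pathlib import Path
from typing import Optional

logger = logging.getLogger("image_preflight")


def image_to_data_url(path: str) -> Optional[str]:
    """Encode a local image as a Base64 data URL, or return None on failure.

    Hypothetical helper for illustration; it is not part of CrewAI.
    """
    file_path = Path(path)
    mime_type, _ = mimetypes.guess_type(file_path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        logger.warning("Skipping %s: not a recognized image type", path)
        return None
    try:
        data = file_path.read_bytes()
    except OSError as exc:
        # Log the failure and let the caller decide on a fallback.
        logger.error("Could not read %s: %s", path, exc)
        return None
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


data_url = image_to_data_url("product.png")
if data_url is None:
    print("Image unavailable; falling back to a text-only task description")
```

Returning None instead of raising keeps the fallback decision with the caller, which can skip the image, retry, or run a text-only task.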
