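Patronus AI Evaluation

Overview

Patronus AI provides comprehensive evaluation and monitoring capabilities for CrewAI agents, letting you assess model outputs, agent behavior, and overall system performance. With this integration you can implement continuous evaluation workflows that help maintain quality and reliability in production.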

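Key Features

Automated evaluation: Evaluate agent outputs and behavior in real time
Custom criteria: Define evaluation criteria tailored to your use case
Performance monitoring: Track agent performance metrics over time
Quality assurance: Ensure consistent output quality across different scenarios
Safety and compliance: Monitor for potential issues and policy violations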

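Evaluation Tools

Patronus provides three main evaluation tools for different use cases:

PatronusEvalTool: Lets the agent select the most appropriate evaluator and criteria for the evaluation task.
PatronusPredefinedCriteriaEvalTool: Uses a predefined evaluator and criteria specified by the user.
PatronusLocalEvaluatorTool: Uses a custom function evaluator defined by the user.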

Installation

To use these tools, install the Patronus package:

uv add patronus

You also need to set your Patronus API key as an environment variable:

export PATRONUS_API_KEY="your_patronus_api_key"
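
Because the tools rely on this environment variable, it can be useful to fail early with a clear message when it is missing. The following is a minimal sketch, not part of the Patronus or CrewAI API: the helper name `require_patronus_api_key` is hypothetical, and only the variable name `PATRONUS_API_KEY` comes from this page.

```python
import os


def require_patronus_api_key(env=None):
    """Return the Patronus API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    This helper is hypothetical and only reads the PATRONUS_API_KEY variable.
    """
    env = os.environ if env is None else env
    key = env.get("PATRONUS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "PATRONUS_API_KEY is not set; export it before creating a Patronus tool."
        )
    return key
```

Calling `require_patronus_api_key()` once at startup, before any Patronus tool is created, surfaces a missing key before the crew starts running.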

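Steps to Get Started

To use the Patronus evaluation tools effectively, follow these steps:

Install Patronus: Install the Patronus package using the command above.
Set the API key: Set your Patronus API key as an environment variable.
Choose the right tool: Select the Patronus evaluation tool that fits your needs.
Configure the tool: Configure the tool with the necessary parameters.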

Examples

Using PatronusEvalTool

The following example shows how to use PatronusEvalTool, which lets the agent select the most appropriate evaluator and criteria:

Code

from crewai import Agent, Task, Crew
from crewai_tools import PatronusEvalTool

# Initialize the tool
patronus_eval_tool = PatronusEvalTool()

# Define an agent that uses the tool
coding_agent = Agent(
    role="Coding Agent",
    goal="Generate high quality code and verify that the output is code",
    backstory="An experienced coder who can generate high quality python code.",
    tools=[patronus_eval_tool],
    verbose=True,
)

# Example task to generate and evaluate code
generate_code_task = Task(
    description="Create a simple program to generate the first N numbers in the Fibonacci sequence. Select the most appropriate evaluator and criteria for evaluating your output.",
    expected_output="Program that generates the first N numbers in the Fibonacci sequence.",
    agent=coding_agent,
)

# Create and run the crew
crew = Crew(agents=[coding_agent], tasks=[generate_code_task])
result = crew.kickoff()

Using PatronusPredefinedCriteriaEvalTool

The following example shows how to use PatronusPredefinedCriteriaEvalTool, which uses a predefined evaluator and criteria:

Code

from crewai import Agent, Task, Crew
from crewai_tools import PatronusPredefinedCriteriaEvalTool

# Initialize the tool with predefined criteria
patronus_eval_tool = PatronusPredefinedCriteriaEvalTool(
    evaluators=[{"evaluator": "judge", "criteria": "contains-code"}]
)

# Define an agent that uses the tool
coding_agent = Agent(
    role="Coding Agent",
    goal="Generate high quality code",
    backstory="An experienced coder who can generate high quality python code.",
    tools=[patronus_eval_tool],
    verbose=True,
)

# Example task to generate code
generate_code_task = Task(
    description="Create a simple program to generate the first N numbers in the Fibonacci sequence.",
    expected_output="Program that generates the first N numbers in the Fibonacci sequence.",
    agent=coding_agent,
)

# Create and run the crew
crew = Crew(agents=[coding_agent], tasks=[generate_code_task])
result = crew.kickoff()

Using PatronusLocalEvaluatorTool

The following example shows how to use PatronusLocalEvaluatorTool, which uses a custom function evaluator:

Code

from crewai import Agent, Task, Crew
from crewai_tools import PatronusLocalEvaluatorTool
from patronus import Client, EvaluationResult
import random

# Initialize the Patronus client
client = Client()

# Register a custom evaluator
@client.register_local_evaluator("random_evaluator")
def random_evaluator(**kwargs):
    score = random.random()
    return EvaluationResult(
        score_raw=score,
        pass_=score >= 0.5,
        explanation="example explanation",
    )

# Initialize the tool with the custom evaluator
patronus_eval_tool = PatronusLocalEvaluatorTool(
    patronus_client=client,
    evaluator="random_evaluator",
    evaluated_model_gold_answer="example label",
)

# Define an agent that uses the tool
coding_agent = Agent(
    role="Coding Agent",
    goal="Generate high quality code",
    backstory="An experienced coder who can generate high quality python code.",
    tools=[patronus_eval_tool],
    verbose=True,
)

# Example task to generate code
generate_code_task = Task(
    description="Create a simple program to generate the first N numbers in the Fibonacci sequence.",
    expected_output="Program that generates the first N numbers in the Fibonacci sequence.",
    agent=coding_agent,
)

# Create and run the crew
crew = Crew(agents=[coding_agent], tasks=[generate_code_task])
result = crew.kickoff()
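
The random evaluator above only demonstrates the registration mechanics. As a sketch of what a deterministic scoring function might look like, the following heuristic checks whether a text appears to contain Python code. The function name, the patterns, and the threshold are all hypothetical choices made for illustration; they are not part of the Patronus SDK.

```python
import re

# Hypothetical heuristic: line-level patterns that suggest Python code.
_CODE_PATTERNS = [
    r"^\s*def\s+\w+\s*\(",                     # function definition
    r"^\s*class\s+\w+",                        # class definition
    r"^\s*(from\s+\w[\w.]*\s+)?import\s+\w",   # import statement
    r"^\s*return\b",                           # return statement
]


def score_contains_code(text, threshold=0.5):
    """Return (score, passed): score is the fraction of patterns found in `text`."""
    hits = sum(
        1 for pattern in _CODE_PATTERNS
        if re.search(pattern, text, flags=re.MULTILINE)
    )
    score = hits / len(_CODE_PATTERNS)
    return score, score >= threshold


# Inside a registered evaluator, the two values could fill score_raw and pass_.
# Assumption: the evaluated text reaches the evaluator as a keyword argument
# named evaluated_model_output; check the Patronus SDK before relying on this.
```

A heuristic like this is crude, but it makes the evaluator's result reproducible, which is easier to debug than a random score.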

Parameters

PatronusEvalTool

PatronusEvalTool requires no parameters at initialization. It automatically fetches the available evaluators and criteria from the Patronus API.

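PatronusPredefinedCriteriaEvalTool

PatronusPredefinedCriteriaEvalTool accepts the following parameter at initialization:

evaluators: Required. A list of dictionaries specifying the evaluators and criteria to use. For example: [{"evaluator": "judge", "criteria": "contains-code"}].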
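
To catch a malformed evaluators list before the tool is constructed, a small pre-check can help. This is a sketch under assumptions: the helper `validate_evaluators` is hypothetical, and the required keys ("evaluator" and "criteria") are inferred from the example above rather than from a published schema. The tool may perform its own validation as well.

```python
def validate_evaluators(evaluators):
    """Check that `evaluators` is a non-empty list of dicts, each with
    non-empty string values for 'evaluator' and 'criteria'.

    Returns the list unchanged so the call can be inlined.
    """
    if not isinstance(evaluators, list) or not evaluators:
        raise ValueError("evaluators must be a non-empty list")
    for index, item in enumerate(evaluators):
        if not isinstance(item, dict):
            raise ValueError(f"evaluators[{index}] must be a dict")
        for key in ("evaluator", "criteria"):
            value = item.get(key)
            if not isinstance(value, str) or not value:
                raise ValueError(
                    f"evaluators[{index}] needs a non-empty string for '{key}'"
                )
    return evaluators
```

The validated list can then be passed as the evaluators argument, for example validate_evaluators([{"evaluator": "judge", "criteria": "contains-code"}]).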

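PatronusLocalEvaluatorTool

PatronusLocalEvaluatorTool accepts the following parameters at initialization:

patronus_client: Required. The Patronus client instance.
evaluator: Optional. The name of the registered local evaluator to use. Defaults to an empty string.
evaluated_model_gold_answer: Optional. The gold answer to use for evaluation. Defaults to an empty string.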

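Usage

When you use a Patronus evaluation tool, you provide the model input, output, and context, and the tool returns the evaluation results from the Patronus API. For PatronusEvalTool and PatronusPredefinedCriteriaEvalTool, the following parameters are required when calling the tool:

evaluated_model_input: The agent's task description, in plain text.
evaluated_model_output: The agent's output for the task.
evaluated_model_retrieved_context: The agent's context.

PatronusLocalEvaluatorTool requires the same parameters, but the evaluator and gold answer are specified at initialization.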
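
The shape of these call-time arguments can be sketched as a plain dictionary. In practice the agent supplies them when it invokes the tool; the helper `build_eval_arguments` below is hypothetical and only illustrates the three parameter names listed above.

```python
def build_eval_arguments(task_description, task_output, retrieved_context=""):
    """Assemble the three call-time arguments named in this section.

    Hypothetical helper: it only builds a dict and does not call any tool.
    """
    return {
        "evaluated_model_input": task_description,
        "evaluated_model_output": task_output,
        "evaluated_model_retrieved_context": retrieved_context,
    }


arguments = build_eval_arguments(
    task_description="Create a simple program to generate the first N Fibonacci numbers.",
    task_output="def fib(n): ...",
)
```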

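Conclusion

The Patronus evaluation tools provide a powerful way to evaluate and score model inputs and outputs using the Patronus AI platform. By enabling agents to evaluate their own outputs or those of other agents, these tools can help improve the quality and reliability of CrewAI workflows.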
