`Patronus Evaluation Tools`

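Description

The Patronus evaluation tools let CrewAI agents evaluate and score model inputs and outputs using the Patronus AI platform. They offer different levels of control over the evaluation process: the agent can choose the most appropriate evaluator and criteria itself, you can fix the evaluator and criteria in advance, or you can supply a custom local evaluator.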

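There are three main Patronus evaluation tools:

PatronusEvalTool: Lets the agent select the most appropriate evaluator and criteria for the evaluation task.
PatronusPredefinedCriteriaEvalTool: Uses a predefined evaluator and criteria specified by the user.
PatronusLocalEvaluatorTool: Uses a custom function evaluator defined by the user.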

Installation

To use these tools, you need to install the Patronus package:

uv add patronus

You also need to set your Patronus API key as an environment variable:

export PATRONUS_API_KEY="your_patronus_api_key"
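Because the tools read the key from the environment, a missing key tends to surface only once a tool is first used. The following is a minimal sketch of an early check you could run before building a crew. The helper name `require_patronus_api_key` is hypothetical and is not part of Patronus or CrewAI; only the variable name `PATRONUS_API_KEY` comes from this page.

```python
import os


def require_patronus_api_key(env=None):
    """Return the Patronus API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict when testing.
    """
    env = os.environ if env is None else env
    key = env.get("PATRONUS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "PATRONUS_API_KEY is not set; export it before creating the tools."
        )
    return key
```

Calling `require_patronus_api_key()` at the top of a script fails fast with a readable message instead of a later authentication error.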

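Getting Started

To use the Patronus evaluation tools effectively, follow these steps:

Install Patronus: Install the Patronus package using the command above.
Set the API key: Set your Patronus API key as an environment variable.
Choose the right tool: Pick the Patronus evaluation tool that matches your needs.
Configure the tool: Configure the tool with the parameters it requires.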

Examples

Using PatronusEvalTool

The following example shows how to use PatronusEvalTool, which lets the agent select the most appropriate evaluator and criteria:

Code
from crewai import Agent, Task, Crew
from crewai_tools import PatronusEvalTool

# Initialize the tool
patronus_eval_tool = PatronusEvalTool()

# Define an agent that uses the tool
coding_agent = Agent(
    role="Coding Agent",
    goal="Generate high quality code and verify that the output is code",
    backstory="An experienced coder who can generate high quality python code.",
    tools=[patronus_eval_tool],
    verbose=True,
)

# Example task to generate and evaluate code
generate_code_task = Task(
    description="Create a simple program to generate the first N numbers in the Fibonacci sequence. Select the most appropriate evaluator and criteria for evaluating your output.",
    expected_output="Program that generates the first N numbers in the Fibonacci sequence.",
    agent=coding_agent,
)

# Create and run the crew
crew = Crew(agents=[coding_agent], tasks=[generate_code_task])
result = crew.kickoff()

Using PatronusPredefinedCriteriaEvalTool

The following example shows how to use PatronusPredefinedCriteriaEvalTool, which uses a predefined evaluator and criteria:

Code
from crewai import Agent, Task, Crew
from crewai_tools import PatronusPredefinedCriteriaEvalTool

# Initialize the tool with predefined criteria
patronus_eval_tool = PatronusPredefinedCriteriaEvalTool(
    evaluators=[{"evaluator": "judge", "criteria": "contains-code"}]
)

# Define an agent that uses the tool
coding_agent = Agent(
    role="Coding Agent",
    goal="Generate high quality code",
    backstory="An experienced coder who can generate high quality python code.",
    tools=[patronus_eval_tool],
    verbose=True,
)

# Example task to generate code
generate_code_task = Task(
    description="Create a simple program to generate the first N numbers in the Fibonacci sequence.",
    expected_output="Program that generates the first N numbers in the Fibonacci sequence.",
    agent=coding_agent,
)

# Create and run the crew
crew = Crew(agents=[coding_agent], tasks=[generate_code_task])
result = crew.kickoff()

Using PatronusLocalEvaluatorTool

The following example shows how to use PatronusLocalEvaluatorTool, which uses a custom function evaluator:

Code
from crewai import Agent, Task, Crew
from crewai_tools import PatronusLocalEvaluatorTool
from patronus import Client, EvaluationResult
import random

# Initialize the Patronus client
client = Client()

# Register a custom evaluator
@client.register_local_evaluator("random_evaluator")
def random_evaluator(**kwargs):
    score = random.random()
    return EvaluationResult(
        score_raw=score,
        pass_=score >= 0.5,
        explanation="example explanation",
    )

# Initialize the tool with the custom evaluator
patronus_eval_tool = PatronusLocalEvaluatorTool(
    patronus_client=client,
    evaluator="random_evaluator",
    evaluated_model_gold_answer="example label",
)

# Define an agent that uses the tool
coding_agent = Agent(
    role="Coding Agent",
    goal="Generate high quality code",
    backstory="An experienced coder who can generate high quality python code.",
    tools=[patronus_eval_tool],
    verbose=True,
)

# Example task to generate code
generate_code_task = Task(
    description="Create a simple program to generate the first N numbers in the Fibonacci sequence.",
    expected_output="Program that generates the first N numbers in the Fibonacci sequence.",
    agent=coding_agent,
)

# Create and run the crew
crew = Crew(agents=[coding_agent], tasks=[generate_code_task])
result = crew.kickoff()

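Parameters

PatronusEvalTool

PatronusEvalTool requires no parameters at initialization. It automatically fetches the available evaluators and criteria from the Patronus API.

PatronusPredefinedCriteriaEvalTool

PatronusPredefinedCriteriaEvalTool accepts the following parameter at initialization:

evaluators: Required. A list of dictionaries, each naming an evaluator and the criteria to use. For example: [{"evaluator": "judge", "criteria": "contains-code"}].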
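Since `evaluators` is a plain list of dictionaries, a typo in a key name is easy to miss. Below is a minimal sketch of a shape check you could run before passing the list to the tool. The function `validate_evaluators` is hypothetical and only checks the structure shown in the example above. It does not verify that the evaluator or criteria names exist on the Patronus platform.

```python
def validate_evaluators(evaluators):
    """Check that `evaluators` is a non-empty list of dicts, each with
    non-empty string values under the keys 'evaluator' and 'criteria'.

    Hypothetical helper for illustration; returns the list unchanged.
    """
    if not isinstance(evaluators, list) or not evaluators:
        raise ValueError("evaluators must be a non-empty list")
    for index, entry in enumerate(evaluators):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a dict")
        for key in ("evaluator", "criteria"):
            value = entry.get(key)
            if not isinstance(value, str) or not value:
                raise ValueError(
                    f"entry {index} is missing a non-empty '{key}' string"
                )
    return evaluators
```

For instance, `validate_evaluators([{"evaluator": "judge", "criteria": "contains-code"}])` returns the list unchanged, while an entry that misspells `criteria` raises a `ValueError`.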

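PatronusLocalEvaluatorTool

PatronusLocalEvaluatorTool accepts the following parameters at initialization:

patronus_client: Required. The Patronus client instance.
evaluator: Optional. The name of the registered local evaluator to use. Defaults to an empty string.
evaluated_model_gold_answer: Optional. The gold answer to evaluate against. Defaults to an empty string.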
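The example on this page registers an evaluator that returns a random score, which is handy for wiring things up but says nothing about the output. As a rough sketch of what deterministic scoring logic might look like, the function below checks whether the output contains a Python function definition. Two assumptions apply: `SketchResult` is a stand-in dataclass that only mirrors the `score_raw`, `pass_`, and `explanation` fields used in the example, and the keyword name `evaluated_model_output` is assumed to be how the output reaches the evaluator (the example itself only accepts `**kwargs`).

```python
from dataclasses import dataclass


@dataclass
class SketchResult:
    """Stand-in for the result object in the example above (assumption)."""
    score_raw: float
    pass_: bool
    explanation: str


def contains_function_definition(evaluated_model_output="", **kwargs):
    """Score 1.0 if the output contains a `def ` statement, else 0.0.

    The keyword name `evaluated_model_output` is an assumption; any other
    keyword arguments are accepted and ignored.
    """
    found = "def " in evaluated_model_output
    return SketchResult(
        score_raw=1.0 if found else 0.0,
        pass_=found,
        explanation="found a function definition" if found
        else "no function definition found",
    )
```

To use logic like this with the tool, you would place it inside a function registered on the client, as the random evaluator is in the example, and return the library's own result type rather than the stand-in.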

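Usage

When using the Patronus evaluation tools, you provide the model input, output, and context, and the tool returns the evaluation results from the Patronus API.

For PatronusEvalTool and PatronusPredefinedCriteriaEvalTool, the following parameters are required when the tool is called:

evaluated_model_input: The agent's task description, in plain text.
evaluated_model_output: The output of the agent's task.
evaluated_model_retrieved_context: The agent's context.

PatronusLocalEvaluatorTool takes the same parameters, but the evaluator and the gold answer are specified at initialization.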
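In a crew, the agent normally fills in these call-time arguments itself. To make their shape concrete, here is a minimal sketch that assembles them into a dictionary. The helper `build_eval_arguments` is hypothetical; only the three key names come from the list above, and treating the context as an optional string is an assumption.

```python
def build_eval_arguments(task_description, task_output, context=""):
    """Assemble the three call-time arguments named in the Usage section.

    Hypothetical helper for illustration; `context` defaults to an empty
    string, which is an assumption rather than documented behavior.
    """
    return {
        "evaluated_model_input": task_description,
        "evaluated_model_output": task_output,
        "evaluated_model_retrieved_context": context,
    }


# Example: arguments describing the Fibonacci task used on this page.
arguments = build_eval_arguments(
    "Create a simple program to generate the first N numbers in the Fibonacci sequence.",
    "def fib(n): ...",
)
```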

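Conclusion

The Patronus evaluation tools give CrewAI agents a way to evaluate and score model inputs and outputs using the Patronus AI platform. By letting agents evaluate their own output or the output of other agents, these tools can help improve the quality and reliability of CrewAI workflows.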
