feat: Evaluate agents with GenAI Model Eval #1555
Conversation
Hi @inardini, Gemini is currently reviewing this pull request. In the meantime, here's a summary for you and other reviewers to quickly grasp the changes and their purpose:
This pull request adds support for evaluating agents with the Vertex AI Gen AI Evaluation service. The primary changes are three new Jupyter notebooks:

- gemini/evaluation/evaluating_crewai_agent.ipynb: Demonstrates how to evaluate a CrewAI agent using Vertex AI Gen AI Evaluation. It covers building a local agent, preparing an evaluation dataset, and running evaluations for single tool usage, trajectory, and response generation.
- gemini/evaluation/evaluating_langgraph_agent.ipynb: Parallels the CrewAI notebook in structure and content, adapting the same evaluation process to a LangGraph agent.
- gemini/reasoning-engine/evaluating_crewai_agent_reasoning_engine_customized_template.ipynb: Shows how to evaluate a CrewAI agent deployed on Vertex AI Reasoning Engine using a customized template. It covers building, deploying, and evaluating the agent, mirroring the evaluation techniques from the other notebooks within the Reasoning Engine context, and also demonstrates a custom evaluation metric.
All three notebooks provide detailed instructions, code examples, and visualizations to aid in understanding and replicating the evaluation process. They appear well-structured and comprehensive.
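To make the evaluation pattern the notebooks follow concrete, here is a minimal sketch assuming the `EvalTask` and `PointwiseMetric` APIs from `vertexai.preview.evaluation`. The dataset rows, the computed trajectory metric names, the experiment name, and the custom metric's criteria text are illustrative assumptions, not code copied from this PR.

```python
# Minimal sketch: scoring a pre-computed agent trajectory and response with the
# Vertex AI Gen AI Evaluation service. Dataset values and metric names are
# illustrative; verify them against the notebooks and the SDK docs.
import pandas as pd
import vertexai
from vertexai.preview.evaluation import EvalTask, PointwiseMetric

vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

# Each row pairs a prompt with the agent's predicted tool-call trajectory,
# a reference trajectory, and the agent's final response.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Get the product details for smartphone"],
        "predicted_trajectory": [
            [{"tool_name": "get_product_details", "tool_input": {"product_name": "smartphone"}}]
        ],
        "reference_trajectory": [
            [{"tool_name": "get_product_details", "tool_input": {"product_name": "smartphone"}}]
        ],
        "response": ["A smartphone with 128 GB of storage and a 6.1-inch display."],
    }
)

# A custom model-based metric, echoing the customized-template notebook's idea;
# the criteria text here is purely illustrative.
response_follows_trajectory = PointwiseMetric(
    metric="response_follows_trajectory",
    metric_prompt_template=(
        "Evaluate whether the agent's response is consistent with the tool calls it made.\n\n"
        "Prompt: {prompt}\nTrajectory: {predicted_trajectory}\nResponse: {response}\n\n"
        "Return a score of 1 if the response is consistent, 0 otherwise."
    ),
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        "trajectory_exact_match",     # computed metric: trajectories must match exactly
        "trajectory_precision",       # computed metric: share of predicted calls found in reference
        response_follows_trajectory,  # custom model-based metric defined above
    ],
    experiment="agent-eval-demo",     # hypothetical experiment name
)

result = eval_task.evaluate()
print(result.summary_metrics)
```

The computed trajectory metrics compare the predicted tool-call sequence against the reference, while the custom PointwiseMetric asks a judge model to rate the response. That is the general shape of the single-tool-usage, trajectory, and response evaluations described above.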
Here's a little haiku to lighten the mood:
Code flows like a stream,
Tests run, a gentle cascade,
Bugs vanish in peace.
This pull request introduces notebooks demonstrating agent evaluation using the Vertex AI Gen AI Evaluation service. The notebooks cover various agent frameworks (CrewAI, LangGraph) and deployment scenarios (local, Reasoning Engine). The structure is generally clear, with good explanations of the evaluation process and metrics. The use of helper functions for visualization and data handling is a plus, making the notebooks more user-friendly. The inclusion of Bring Your Own Dataset (BYOD) examples adds valuable flexibility.
However, there are areas for improvement, particularly in code consistency, error handling, and adherence to best practices. Some sections could benefit from more detailed explanations or examples, especially when introducing new concepts or tools. Additionally, there are opportunities to improve code efficiency and robustness. Addressing these points will enhance the overall quality and educational value of the notebooks.
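The notebooks also cover the complementary flow of building a local agent and letting the evaluation service run it per prompt, rather than scoring pre-computed outputs. A rough sketch of that flow follows; the wrapper function is hypothetical, and the `runnable` parameter, metric strings, and result attributes are assumptions based on the Gen AI Evaluation agent-evaluation docs rather than code from this PR.

```python
# Rough sketch: evaluating a live agent by passing a callable to EvalTask.
# Assumes vertexai.init(...) has already been called as in the earlier sketch.
import pandas as pd
from vertexai.preview.evaluation import EvalTask


def run_my_agent(prompt: str) -> dict:
    """Hypothetical wrapper around a CrewAI or LangGraph agent.

    Returns the final response plus the parsed tool-call trajectory in the
    format the evaluation service expects.
    """
    # ... invoke the agent and parse its tool calls here ...
    return {
        "response": "A smartphone with 128 GB of storage.",
        "predicted_trajectory": [
            {"tool_name": "get_product_details", "tool_input": {"product_name": "smartphone"}}
        ],
    }


# Only prompts and reference trajectories are supplied up front; responses and
# predicted trajectories come from the agent at evaluation time.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Get the product details for smartphone"],
        "reference_trajectory": [
            [{"tool_name": "get_product_details", "tool_input": {"product_name": "smartphone"}}]
        ],
    }
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["trajectory_in_order_match", "trajectory_recall"],  # assumed computed metric names
    experiment="agent-eval-live-demo",  # hypothetical experiment name
)

# The callable is invoked once per row; its outputs are merged into the dataset
# before the metrics are computed.
result = eval_task.evaluate(runnable=run_my_agent)
print(result.metrics_table.head())
```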
Description
This PR adds notebooks demonstrating the new Gen AI Evaluation service for agent evaluation.
Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- Follow the CONTRIBUTING Guide.
- Ensure your account is listed in CODEOWNERS for the file(s).
- Make sure your code lints (run `nox -s format` from the repository root to format).