feat: Evaluate agents with GenAI Model Eval #1555
Conversation
Hi @inardini, Gemini is currently reviewing this pull request. In the meantime, here's a summary for you and other reviewers to quickly grasp the changes and their purpose:
This pull request adds support for evaluating agents with the Vertex AI Gen AI Evaluation service. The primary changes are three new Jupyter notebooks:

- gemini/evaluation/evaluating_crewai_agent.ipynb: Demonstrates how to evaluate a CrewAI agent using Vertex AI Gen AI Evaluation. It covers building a local agent, preparing an evaluation dataset, and running evaluations for single tool usage, trajectory, and response generation.
- gemini/evaluation/evaluating_langgraph_agent.ipynb: Parallels the CrewAI notebook in structure and content, adapting the same evaluation process to a LangGraph agent.
- gemini/reasoning-engine/evaluating_crewai_agent_reasoning_engine_customized_template.ipynb: Shows how to evaluate a CrewAI agent deployed on Vertex AI Reasoning Engine using a customized template. It covers building, deploying, and evaluating the agent, mirroring the evaluation techniques from the other notebooks within the Reasoning Engine context, and also demonstrates a custom evaluation metric.
All three notebooks provide detailed instructions, code examples, and visualizations to aid in understanding and replicating the evaluation process. They appear well-structured and comprehensive.
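To make the evaluation pattern the notebooks follow concrete, here is a minimal sketch assuming the `EvalTask` and `PointwiseMetric` APIs from `vertexai.preview.evaluation`. The dataset rows, the computed trajectory metric names, the experiment name, and the custom metric's criteria text are illustrative assumptions, not code copied from this PR.

```python
# Minimal sketch: scoring a pre-computed agent trajectory and response with the
# Vertex AI Gen AI Evaluation service. Dataset values and metric names are
# illustrative; verify them against the notebooks and the SDK docs.
import pandas as pd
import vertexai
from vertexai.preview.evaluation import EvalTask, PointwiseMetric

vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

# Each row pairs a prompt with the agent's predicted tool-call trajectory,
# a reference trajectory, and the agent's final response.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Get the product details for smartphone"],
        "predicted_trajectory": [
            [{"tool_name": "get_product_details", "tool_input": {"product_name": "smartphone"}}]
        ],
        "reference_trajectory": [
            [{"tool_name": "get_product_details", "tool_input": {"product_name": "smartphone"}}]
        ],
        "response": ["A smartphone with 128 GB of storage and a 6.1-inch display."],
    }
)

# A custom model-based metric, echoing the customized-template notebook's idea;
# the criteria text here is purely illustrative.
response_follows_trajectory = PointwiseMetric(
    metric="response_follows_trajectory",
    metric_prompt_template=(
        "Evaluate whether the agent's response is consistent with the tool calls it made.\n\n"
        "Prompt: {prompt}\nTrajectory: {predicted_trajectory}\nResponse: {response}\n\n"
        "Return a score of 1 if the response is consistent, 0 otherwise."
    ),
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        "trajectory_exact_match",     # computed metric: trajectories must match exactly
        "trajectory_precision",       # computed metric: share of predicted calls found in reference
        response_follows_trajectory,  # custom model-based metric defined above
    ],
    experiment="agent-eval-demo",     # hypothetical experiment name
)

result = eval_task.evaluate()
print(result.summary_metrics)
```

The computed trajectory metrics compare the predicted tool-call sequence against the reference, while the custom PointwiseMetric asks a judge model to rate the response. That is the general shape of the single-tool-usage, trajectory, and response evaluations described above.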
Here's a little haiku to lighten the mood:
Code flows like a stream,
Tests run, a gentle cascade,
Bugs vanish in peace.
This pull request introduces notebooks demonstrating agent evaluation using the Vertex AI Gen AI Evaluation service. The notebooks cover various agent frameworks (CrewAI, LangGraph) and deployment scenarios (local, Reasoning Engine). The structure is generally clear, with good explanations of the evaluation process and metrics. The use of helper functions for visualization and data handling is a plus, making the notebooks more user-friendly. The inclusion of Bring Your Own Dataset (BYOD) examples adds valuable flexibility.
However, there are areas for improvement, particularly in code consistency, error handling, and adherence to best practices. Some sections could benefit from more detailed explanations or examples, especially when introducing new concepts or tools. Additionally, there are opportunities to improve code efficiency and robustness. Addressing these points will enhance the overall quality and educational value of the notebooks.
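The notebooks also cover the complementary flow of building a local agent and letting the evaluation service run it per prompt, rather than scoring pre-computed outputs. A rough sketch of that flow follows; the wrapper function is hypothetical, and the `runnable` parameter, metric strings, and result attributes are assumptions based on the Gen AI Evaluation agent-evaluation docs rather than code from this PR.

```python
# Rough sketch: evaluating a live agent by passing a callable to EvalTask.
# Assumes vertexai.init(...) has already been called as in the earlier sketch.
import pandas as pd
from vertexai.preview.evaluation import EvalTask


def run_my_agent(prompt: str) -> dict:
    """Hypothetical wrapper around a CrewAI or LangGraph agent.

    Returns the final response plus the parsed tool-call trajectory in the
    format the evaluation service expects.
    """
    # ... invoke the agent and parse its tool calls here ...
    return {
        "response": "A smartphone with 128 GB of storage.",
        "predicted_trajectory": [
            {"tool_name": "get_product_details", "tool_input": {"product_name": "smartphone"}}
        ],
    }


# Only prompts and reference trajectories are supplied up front; responses and
# predicted trajectories come from the agent at evaluation time.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Get the product details for smartphone"],
        "reference_trajectory": [
            [{"tool_name": "get_product_details", "tool_input": {"product_name": "smartphone"}}]
        ],
    }
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["trajectory_in_order_match", "trajectory_recall"],  # assumed computed metric names
    experiment="agent-eval-live-demo",  # hypothetical experiment name
)

# The callable is invoked once per row; its outputs are merged into the dataset
# before the metrics are computed.
result = eval_task.evaluate(runnable=run_my_agent)
print(result.metrics_table.head())
```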
Description
This PR adds notebooks demonstrating the new Gen AI Evaluation service for agent evaluation.
Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- Follow the CONTRIBUTING Guide.
- Ensure your account is listed in CODEOWNERS for the file(s).
- Make sure your code lints (run `nox -s format` from the repository root to format).