Releases: GoogleCloudPlatform/dfcx-scrapi
v1.13.1
v1.13.0
Breaking Changes
Note that 2 methods from the Sessions class have been deprecated:
- `Sessions.preset_parameters`
- `Sessions.run_conversation`
For each of these methods, you can use `Sessions.detect_intent` instead, which is fully backwards compatible.
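As a minimal migration sketch (the agent ID and utterance are placeholders, and the `(agent_id, session_id, text)` calling pattern is assumed from the library's session helpers), a turn that previously went through `Sessions.run_conversation` can be sent like this:

```python
from dfcx_scrapi.core.sessions import Sessions

# Placeholder agent resource name for illustration only.
agent_id = "projects/your-project/locations/us-central1/agents/11111-2222-33333-44444"

sessions = Sessions()

# Build a session ID for the target agent, then send a single user utterance.
session_id = sessions.build_session_id(agent_id)
response = sessions.detect_intent(agent_id, session_id, "I'd like to check on my order status")
print(response)
```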
New Features
Agent Tasks Generator
This is a specialized tool that allows a user to evaluate any arbitrary agent to determine what the Agent is capable of accomplishing from a task perspective.
This is a pre-release feature that will be accompanied by more automated testing features in the future.
For now, you can use this as a way to analyze arbitrary Agents to see if they are set up to perform the tasks you believe you configured them to do.
```python
from dfcx_scrapi.tools.agent_task_generator import AgentTaskGenerator

# Analyze the agent and return a summary of the tasks it is configured to perform.
atg = AgentTaskGenerator(agent_id=agent_id)
atg.get_agent_tasks()
```
Output
```
{'tasks': [{'name': 'Greeting and Intent Understanding',
   'description': 'The agent greets the user and attempts to understand their intent. It can provide basic information like translations or virtual money, and direct the user to appropriate tools or flows based on their request.'},
  {'name': 'Product and Company Information Retrieval',
   'description': 'The agent can access a data store containing information from the YETI website to answer user queries about YETI products and the company.'},
  {'name': 'Trip Planning Assistance',
   'description': 'The agent can collect basic information from the user to assist with trip planning, including destination, travel dates, and preferences. It can then pass this information to a separate flow for further processing.'}]}
```
Evaluation Dataset from Conversation History
You can now quickly create a dataset in the Evaluations format using pre-selected conversations from the Conversation History in your Agent.
Simply select the list of `conversation_ids` that you want and pass it to the `Evals.create_dataset_from_conv_ids` method, which returns a Pandas DataFrame.
This can be saved as a CSV, Google Sheet, or used locally to run Evals on your Agent.
```python
from dfcx_scrapi.core.conversation_history import ConversationHistory
from dfcx_scrapi.tools.evaluations import Evaluations

ch = ConversationHistory()
evals = Evaluations(agent_id=agent_id)

# Gather recent conversations and build an eval dataset from the first 5.
all_convos = ch.list_conversations(agent_id)
convo_ids = [convo.name for convo in all_convos[:5]]
evals.create_dataset_from_conv_ids(convo_ids)
```
Output
| | eval_id | action_id | action_type | action_input | action_input_parameters | tool_action | notes |
|---|---|---|---|---|---|---|---|
| 0 | 001 | 1 | User Utterance | what items do you have for dogs? | | | |
| 1 | 001 | 2 | Tool Invocation | yeti-website | {'requestBody': {'query': 'what items do you h... | yeti-website | |
| 2 | 001 | 3 | Agent Response | YETI offers dog bowls and dog beds. The Boomer... | | | |
| 3 | 002 | 1 | User Utterance | who is the ceo? | | | |
| 4 | 002 | 2 | Tool Invocation | yeti-website | {'requestBody': {'query': 'who is the ceo?'}} | yeti-website | |
| 5 | 002 | 3 | Agent Response | The CEO of YETI is Matt Reintjes. | | | |
| 6 | 003 | 1 | User Utterance | I want to speak to an operator | | | |
| 7 | 003 | 2 | Agent Response | Just a moment while I connect you... | | | |
| 8 | 004 | 1 | User Utterance | where is yeti hq at? | | | |
| 9 | 004 | 2 | Tool Invocation | yeti-website | {'requestBody': {'query': 'where is yeti hq at... | yeti-website | |
| 10 | 004 | 3 | Agent Response | YETI's headquarters is located in Austin, Texa... | | | |
| 11 | 005 | 1 | User Utterance | what is the smallest cup I can buy? | | | |
| 12 | 005 | 2 | Tool Invocation | yeti-website | {'requestBody': {'query': 'what is the smalles... | yeti-website | |
| 13 | 005 | 3 | Agent Response | The smallest cup you can buy is the 4oz cup. I... | | | |
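As a quick follow-on sketch (assuming the standard pandas `to_csv` API and the `run_query_and_eval` method shown in the v1.12.0 notes below; the filename is a placeholder), the returned DataFrame can be persisted or evaluated directly:

```python
# Build the dataset from the selected conversation IDs.
dataset_df = evals.create_dataset_from_conv_ids(convo_ids)

# Save it locally as a CSV for later use or review in Sheets.
dataset_df.to_csv("eval_dataset.csv", index=False)

# Or run an evaluation on it right away (you may need to load it through
# DataLoader first, as in the v1.12.0 example further down).
eval_results = evals.run_query_and_eval(dataset_df)
```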
CICD Workflow Example
We've added an example CICD workflow for anyone who is curious to see how this kind of pipeline could be set up using SCRAPI.
Fair warning, it's very involved! 😄
However, it provides some good pointers on how you can set up these types of complex pipelines using this library.
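As one illustration of the kind of step such a pipeline might include (a hedged sketch, not the workflow from the example itself), SCRAPI's `Agents` class can export an agent to GCS so a later stage can restore it into a higher environment; the project, bucket, and agent IDs below are placeholders:

```python
from dfcx_scrapi.core.agents import Agents

# Placeholder resource names for illustration only.
DEV_AGENT_ID = "projects/dev-project/locations/us-central1/agents/1111-2222"
GCS_BUCKET_URI = "gs://my-agent-artifacts/dev_agent_export"

agents = Agents()

# Export the dev agent to GCS; a later pipeline stage could restore this
# artifact into a UAT or prod project before running automated tests.
lro = agents.export_agent(agent_id=DEV_AGENT_ID, gcs_bucket_uri=GCS_BUCKET_URI)
print(lro)
```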
Enhancements
- Added support for `environment_id` when calling `Sessions.build_session_id`, which allows you to use `Sessions.detect_intent` with a `session_id` that includes an Environment (see the sketch after this list)
- Added `language_code` support throughout the Evaluations class
- Added support for setting BigQuery logging and interaction settings
- Added support for new lint rules in `ruff`
- Updated some out-of-date example notebooks and fixed broken links
- Added support for Session Parameters in Evaluations
Bug Fix
- Fixed several issues in Evaluations where dataframe prep and parsing were failing
What's Changed
- Feat/agent tasks generator by @kmaphoenix in #259
- feat: Implement Python requests-based get/set for BigQuery Interaction Logging Settings by @justin-oos in #258
- Feat/bq update sdk by @kmaphoenix in #263
- Fill Golden Template from Conv_Ids by @gmchueh in #261
- Fix/update lint rules by @kmaphoenix in #264
- Clean example notebooks (bot builder / vertex ai / google sheets) by @ethanknights in #256
- Adding session parameters to evaluations function by @AAMEHROTRA1230 in #262
- Feature/dfcxcicd by @sridharvikram in #239
- fix: add lang_code support for DataLoader by @kmaphoenix in #265
- fix/evaluations multiple tool pairing and empty utterance pairing by @gmchueh in #267
- Update google doc reference in nlu_evaluation_testing.ipynb by @ethanknights in #266
- Fix/session id with env by @kmaphoenix in #271
New Contributors
- @justin-oos made their first contribution in #258
- @gmchueh made their first contribution in #261
- @ethanknights made their first contribution in #256
- @AAMEHROTRA1230 made their first contribution in #262
- @sridharvikram made their first contribution in #239
Full Changelog: 1.12.5...1.13.0
v1.12.5
v1.12.4
v1.12.3
What's Changed
- Feature/conversation rebase by @kmaphoenix in #240
- Fix/default creds inheritance by @kmaphoenix in #246
- Prevent IndexError in collect_playbook_responses when not in playbook by @SeanScripts in #244
- feat: add support for flow invoke; clean up creds passing in evals by @kmaphoenix in #248
- Fix/optional tool call metrics evals by @kmaphoenix in #250
- Fix/support lang code conversation by @kmaphoenix in #251
Full Changelog: 1.12.2...1.12.3
v1.12.2
Enhancements
- Added support for `language_code` on all applicable methods in the Flows class (see the sketch after this list)
- Added support for `parameters` when using the Datastore Evaluations class and notebook
- Added support for Playbook Versions
- New notebook to check the status of datastores and search for Datastore IDs, Doc IDs, and URLs
- Added helper methods for Search to make listing URLs / doc IDs / documents much easier for users
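A rough sketch of the `language_code` support on the Flows class (which methods accept it and the exact keyword placement are assumptions based on the note above; the agent ID and locale are placeholders):

```python
from dfcx_scrapi.core.flows import Flows

flows = Flows()

# Placeholder agent resource name.
agent_id = "projects/your-project/locations/us-central1/agents/1111-2222"

# List flows with language-specific content, e.g. French fulfillment messages.
all_flows = flows.list_flows(agent_id, language_code="fr")
for flow in all_flows:
    print(flow.display_name)
```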
Bug Fix
- Fixed a bug in the CopyUtil class that was causing the `create_entity_type` method to fail
- Fixed a bug in Dataframe Functions which was causing scopes to not be inherited properly
- Fixed the new Vertex Agents Evals notebook links for GitHub and GCP Workbench launching to point to the correct location
What's Changed
- fix: add support for language_code on applicable methods by @kmaphoenix in #222
- fix: update copy_util to resolve bug issue 192 by @my3sons in #205
- Feat/parameter support datastore evals by @kmaphoenix in #225
- feat: add support for playbook versions by @kmaphoenix in #226
- Fix/scopes dataframe functions by @kmaphoenix in #228
- Update vertex_agents_evals.ipynb by @YuncongZhou in #231
- Feature/datastoreindexurls by @agutta in #235
- Feat/add vais search methods by @kmaphoenix in #237
- chore: update notebook to use latest scrapi code by @kmaphoenix in #238
New Contributors
- @my3sons made their first contribution in #205
- @YuncongZhou made their first contribution in #231
- @agutta made their first contribution in #235
Full Changelog: 1.12.1...1.12.2
v1.12.1
Bug
- Patch to require `google-cloud-aiplatform` as part of the setuptools
- The lack of `google-cloud-aiplatform` in setuptools was causing import errors in some classes that rely on `vertexai` as an import
Full Changelog: 1.12.0...1.12.1
v1.12.0
New Features
Evaluations are here! 🎉
What are Evaluations? 📐 📈
We know that building an Agent is only part of the journey.
Understanding how that Agent responds to real-world queries is a key indicator of how it will perform in Production.
Running evaluations, or "evals", allows Agent developers to quickly identify "losses", or areas of opportunity for improving Agent design.
Evals can provide answers to questions like:
- What is the current performance baseline for my Agent?
- How is my Agent performing after the most recent changes?
- If I switch to a new LLM, how does that change my Agent's performance?
Evaluation Toolsets in SCRAPI 🛠️🐍
For this latest release, we have included 2 specific Eval setups for developers to use with Agent Builder and Dialogflow CX Agents.
These are offered as two distinct evaluation toolsets for a few reasons:
- They support different build architectures in DFCX vs. Agent Builder
- They support different metrics based on the task you are trying to evaluate
- They support different tool calling setups: Native DataStores vs. arbitrary custom tools
Metrics by Toolset 📏
The following metrics are currently supported for each toolset.
Additional metrics will be added over time to support various other evaluation needs.
- DataStore Evaluations
  - Url Match
  - Context Recall
  - Faithfulness
  - Answer Correctness
  - RougeL
- Multi-Turn, Multi-Agent w/ Tool Calling Evaluations
  - Semantic Similarity
  - Exact Match Tool Quality
Getting Started with Evaluations 🏁
- Start by choosing your Eval toolset based on the Agent architecture you are evaluating
- Build an Evaluation Dataset. You can find detailed information about the dataset formats in each of the toolset instructions
- Run your evals!
Example Eval Setup for Multi-Turn, Multi-Agent w/ Tools
```python
import pandas as pd

from dfcx_scrapi.tools.evaluations import Evaluations
from dfcx_scrapi.tools.evaluations import DataLoader

data = DataLoader()

# Build a small eval dataset following the required input schema.
INPUT_SCHEMA_REQUIRED_COLUMNS = ['eval_id', 'action_id', 'action_type', 'action_input', 'action_input_parameters', 'tool_action', 'notes']
sample_df = pd.DataFrame(columns=INPUT_SCHEMA_REQUIRED_COLUMNS)

sample_df.loc[0] = ["travel-ai-001", 1, "User Utterance", "Paris", "", "", ""]
sample_df.loc[1] = ["travel-ai-001", 2, "Playbook Invocation", "Travel Inspiration", "", "", ""]
sample_df.loc[2] = ["travel-ai-001", 3, "Agent Response", "Paris is a beautiful city! Here are a few things you might enjoy doing there:\n\nVisit the Eiffel Tower\nTake a walk along the Champs-Élysées\nVisit the Louvre Museum\nSee the Arc de Triomphe\nTake a boat ride on the Seine River", "", "", ""]

# Load / validate the dataset, then run the queries and evaluate the results.
sample_df = data.from_dataframe(sample_df)

agent_id = "projects/your-project/locations/us-central1/agents/11111-2222-33333-44444"  # Example Agent

evals = Evaluations(agent_id, metrics=["response_similarity", "tool_call_quality"])
eval_results = evals.run_query_and_eval(sample_df.head(10))
```
What's Changed
- Feat/evaluations by @kmaphoenix in #217
- Feat/evals notebook by @kmaphoenix in #218
Full Changelog: 1.11.2...1.12.0
v1.11.2
What's Changed
- fix: improve markdown line handling by @kmaphoenix in #213
- Add build_search_engine_proto() to Engines by @rantman in #215
New Contributors
- @rantman made their first contribution in #215
Full Changelog: 1.11.1...1.11.2
v1.11.1
What's Changed
- FR - allow users to pass endUserMetadata as an optional in detect_intent and autoeval colab by @jkshj21 in #210
- FR-186 - Export results into multiple mode by @Naveenkm13 in #209
- Add creds to the constructors by @MRyderOC in #204
- Feat/playbook instructions parsing by @kmaphoenix in #212
New Contributors
- @Naveenkm13 made their first contribution in #209
Full Changelog: 1.11.0...1.11.1