Releases: GoogleCloudPlatform/dfcx-scrapi

v1.13.1

17 Dec 03:30

Bug Fix

  • Fix typo in environment client usage in Sessions class

Full Changelog: 1.13.0...1.13.1

v1.13.0

17 Dec 03:17

Breaking Changes

Note that two methods from the Sessions class have been deprecated:

  • Sessions.preset_parameters
  • Sessions.run_conversation

For each of these methods, you can use Sessions.detect_intent instead, which is fully backwards compatible.
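
For example, a single conversation turn that previously went through Sessions.run_conversation can be sent with Sessions.detect_intent directly. A minimal sketch, assuming the keyword arguments shown here (names such as text may vary slightly by version):

from dfcx_scrapi.core.sessions import Sessions

s = Sessions()

agent_id = "projects/your-project/locations/us-central1/agents/11111-2222-33333-44444" # Example Agent
session_id = s.build_session_id(agent_id)

# Single-turn request; replaces Sessions.run_conversation / Sessions.preset_parameters usage.
response = s.detect_intent(agent_id=agent_id, session_id=session_id, text="I'd like to plan a trip to Paris")
print(response)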

New Features

Agent Tasks Generator

This is a specialized tool that lets a user evaluate any arbitrary Agent to determine what it is capable of accomplishing from a task perspective.
This is a pre-release feature that will be accompanied by more automated testing features in the future.

For now, you can use this as a way to analyze arbitrary Agents to see if they are set up to perform the tasks you believe you configured them to do.

from dfcx_scrapi.tools.agent_task_generator import AgentTaskGenerator

# Analyze the Agent and return a summary of the tasks it can perform.
atg = AgentTaskGenerator(agent_id=agent_id)
atg.get_agent_tasks()

Output

{'tasks': [{'name': 'Greeting and Intent Understanding',
   'description': 'The agent greets the user and attempts to understand their intent. It can provide basic information like translations or virtual money, and direct the user to appropriate tools or flows based on their request.'},
  {'name': 'Product and Company Information Retrieval',
   'description': 'The agent can access a data store containing information from the YETI website to answer user queries about YETI products and the company.'},
  {'name': 'Trip Planning Assistance',
   'description': 'The agent can collect basic information from the user to assist with trip planning, including destination, travel dates, and preferences. It can then pass this information to a separate flow for further processing.'}]}

Evaluation Dataset from Conversation History

You can now quickly create an Evaluation dataset from pre-selected conversations in your Agent's Conversation History.
Simply select the list of conversation_ids that you want and pass it to the Evaluations.create_dataset_from_conv_ids method, which returns a Pandas DataFrame.

This can be saved as a CSV, Google Sheet, or used locally to run Evals on your Agent.

from dfcx_scrapi.core.conversation_history import ConversationHistory
from dfcx_scrapi.tools.evaluations import Evaluations

ch = ConversationHistory()
evals = Evaluations(agent_id=agent_id)

# Pull recent conversations from the Agent's Conversation History,
# then build an eval dataset from the first five conversation IDs.
all_convos = ch.list_conversations(agent_id)
convo_ids = [convo.name for convo in all_convos[:5]]
evals.create_dataset_from_conv_ids(convo_ids)

Output

| eval_id | action_id | action_type | action_input | action_input_parameters | tool_action | notes |
|---|---|---|---|---|---|---|
| 001 | 1 | User Utterance | what items do you have for dogs? | | | |
| 001 | 2 | Tool Invocation | yeti-website | {'requestBody': {'query': 'what items do you h... | yeti-website | |
| 001 | 3 | Agent Response | YETI offers dog bowls and dog beds. The Boomer... | | | |
| 002 | 1 | User Utterance | who is the ceo? | | | |
| 002 | 2 | Tool Invocation | yeti-website | {'requestBody': {'query': 'who is the ceo?'}} | yeti-website | |
| 002 | 3 | Agent Response | The CEO of YETI is Matt Reintjes. | | | |
| 003 | 1 | User Utterance | I want to speak to an operator | | | |
| 003 | 2 | Agent Response | Just a moment while I connect you... | | | |
| 004 | 1 | User Utterance | where is yeti hq at? | | | |
| 004 | 2 | Tool Invocation | yeti-website | {'requestBody': {'query': 'where is yeti hq at... | yeti-website | |
| 004 | 3 | Agent Response | YETI's headquarters is located in Austin, Texa... | | | |
| 005 | 1 | User Utterance | what is the smallest cup I can buy? | | | |
| 005 | 2 | Tool Invocation | yeti-website | {'requestBody': {'query': 'what is the smalles... | yeti-website | |
| 005 | 3 | Agent Response | The smallest cup you can buy is the 4oz cup. I... | | | |
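
For example, the returned DataFrame can be persisted with standard pandas I/O before running evals (a minimal sketch; the filename is illustrative):

# Build the dataset and save it locally as a CSV for later eval runs.
dataset = evals.create_dataset_from_conv_ids(convo_ids)
dataset.to_csv("eval_dataset.csv", index=False)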

CICD Workflow Example

We've added an example CICD workflow for anyone who is curious to see how one could be set up using SCRAPI.
Fair warning, it's very involved! 😄
However, it provides some good pointers on how you can set up these types of complex pipelines using this library.

Enhancements

  • Added support for environment_id when calling Sessions.build_session_id, which allows you to use Sessions.detect_intent with a session_id that includes an Environment (see the sketch after this list)
  • Added language_code support throughout the Evaluations class
  • Added support for setting BigQuery logging and interaction settings
  • Added support for new lint rules in ruff
  • Updated some out-of-date example notebooks and fixed broken links
  • Added support for Session Parameters in Evaluations
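
As a sketch of the environment_id enhancement (the Environment ID value below is hypothetical, and the keyword argument name follows the wording of the bullet above):

from dfcx_scrapi.core.sessions import Sessions

s = Sessions()

agent_id = "projects/your-project/locations/us-central1/agents/11111-2222-33333-44444" # Example Agent
environment_id = "my-environment-1234" # Hypothetical Environment ID

# The resulting session_id is scoped to the Environment, so detect_intent runs against it.
session_id = s.build_session_id(agent_id, environment_id=environment_id)
response = s.detect_intent(agent_id=agent_id, session_id=session_id, text="hello")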

Bug Fix

  • Fixed several issues in Evaluations where dataframe prep and parsing were failing

What's Changed

New Contributors

Full Changelog: 1.12.5...1.13.0

v1.12.5

29 Oct 16:58

What's Changed

Full Changelog: 1.12.4...1.12.5

v1.12.4

10 Oct 03:50

What's Changed

Full Changelog: 1.12.3...1.12.4

v1.12.3

09 Oct 22:06

What's Changed

Full Changelog: 1.12.2...1.12.3

v1.12.2

27 Sep 19:56

Enhancements

  • Added support for language_code on all applicable methods in the Flows class (see the sketch after this list)
  • Added support for parameters when using the Datastore Evaluations class and notebook
  • Added support for Playbook Versions
  • New notebook to check status of datastores and search for Datastore IDs, Doc IDs, and URLs
  • Added helper methods for Search to make listing URLs, doc IDs, and documents much easier for users
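
A hedged sketch of the language_code support mentioned above, assuming get_flow is among the applicable methods and the keyword argument is language_code:

from dfcx_scrapi.core.flows import Flows

flows = Flows()

flow_id = "projects/your-project/locations/us-central1/agents/11111-2222-33333-44444/flows/00000000-0000-0000-0000-000000000000" # Example Flow

# Retrieve the Flow resource in a specific language.
flow = flows.get_flow(flow_id, language_code="fr")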

Bug Fix

  • Fixed bug in CopyUtil class that was causing the create_entity_type method to fail
  • Fixed a bug in Dataframe Functions which was causing scopes to not be inherited properly
  • Fixed the new Vertex Agents Evals notebook links so that the GitHub and GCP Workbench launch buttons point to the correct location

What's Changed

New Contributors

Full Changelog: 1.12.1...1.12.2

v1.12.1

23 Aug 11:21

Bug

  • Patch to require google-cloud-aiplatform as part of the setuptools dependencies
  • The missing google-cloud-aiplatform dependency was causing import errors in some classes that rely on vertexai as an import

Full Changelog: 1.12.0...1.12.1

v1.12.0

21 Aug 15:05

New Features

Evaluations are here! 🎉

What are Evaluations? 📐 📈

We know that building an Agent is only part of the journey.
Understanding how that Agent responds to real-world queries is a key indicator of how it will perform in Production.
Running evaluations, or "evals", allows Agent developers to quickly identify "losses", or areas of opportunity for improving Agent design.

Evals can provide answers to questions like:

  • What is the current performance baseline for my Agent?
  • How is my Agent performing after the most recent changes?
  • If I switch to a new LLM, how does that change my Agent's performance?

Evaluation Toolsets in SCRAPI 🛠️🐍

For this latest release, we have included two specific Eval setups for developers to use with Agent Builder and Dialogflow CX Agents.

  1. DataStore Evaluations
  2. Multi-turn, Multi-Agent w/ Tool Calling Evaluations

These are offered as two distinct evaluation toolsets for a few reasons:

  • They support different build architectures in DFCX vs. Agent Builder
  • They support different metrics based on the task you are trying to evaluate
  • They support different tool calling setups: Native DataStores vs. arbitrary custom tools

Metrics by Toolset 📏

The following metrics are currently supported for each toolset.
Additional metrics will be added over time to support various other evaluation needs.

  • DataStore Evaluations
    • Url Match
    • Context Recall
    • Faithfulness
    • Answer Correctness
    • RougeL
  • Multi-Turn, Multi-Agent w/ Tool Calling Evaluations
    • Semantic Similarity
    • Exact Match Tool Quality

Getting Started with Evaluations 🏁

  1. Start by choosing your Eval toolset based on the Agent architecture you are evaluating
  2. Build an Evaluation Dataset. You can find detailed information about the dataset formats in each of the toolset instructions
  3. Run your evals!

Example Eval Setup for Multi-Turn, Multi-Agent w/ Tools

import pandas as pd
from dfcx_scrapi.tools.evaluations import Evaluations
from dfcx_scrapi.tools.evaluations import DataLoader

data = DataLoader()

# Required input schema for the eval dataset.
INPUT_SCHEMA_REQUIRED_COLUMNS = ['eval_id', 'action_id', 'action_type', 'action_input', 'action_input_parameters', 'tool_action', 'notes']

sample_df = pd.DataFrame(columns=INPUT_SCHEMA_REQUIRED_COLUMNS)

# A single multi-turn eval: user utterance, expected Playbook invocation, and expected Agent response.
sample_df.loc[0] = ["travel-ai-001", 1, "User Utterance", "Paris", "", "", ""]
sample_df.loc[1] = ["travel-ai-001", 2, "Playbook Invocation", "Travel Inspiration", "", "", ""]
sample_df.loc[2] = ["travel-ai-001", 3, "Agent Response", "Paris is a beautiful city! Here are a few things you might enjoy doing there:\n\nVisit the Eiffel Tower\nTake a walk along the Champs-Élysées\nVisit the Louvre Museum\nSee the Arc de Triomphe\nTake a boat ride on the Seine River", "", "", ""]

# Validate the dataset, then run the queries against the Agent and score the results.
sample_df = data.from_dataframe(sample_df)
agent_id = "projects/your-project/locations/us-central1/agents/11111-2222-33333-44444" # Example Agent
evals = Evaluations(agent_id, metrics=["response_similarity", "tool_call_quality"])
eval_results = evals.run_query_and_eval(sample_df.head(10))

What's Changed

Full Changelog: 1.11.2...1.12.0

v1.11.2

09 Aug 16:45

What's Changed

New Contributors

Full Changelog: 1.11.1...1.11.2

v1.11.1

31 Jul 18:45

What's Changed

  • FR - allow users to pass endUserMetadata as an optional in detect_intent and autoeval colab by @jkshj21 in #210
  • FR-186 - Export results into multiple mode by @Naveenkm13 in #209
  • Add creds to the constructors by @MRyderOC in #204
  • Feat/playbook instructions parsing by @kmaphoenix in #212

New Contributors

Full Changelog: 1.11.0...1.11.1