From 3e6644cc713e352fe3d0df51fb54f1b1eb649701 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 17 Sep 2024 12:19:16 -0500
Subject: [PATCH 01/10] chore(deps): bump the pip group across 1 directory with 4 updates (#1108)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Updates the requirements on [aiohttp](https://github.com/aio-libs/aiohttp), [cryptography](https://github.com/pyca/cryptography), [nltk](https://github.com/nltk/nltk) and [streamlit](https://github.com/streamlit/streamlit) to permit the latest version.

Updates `aiohttp` to 3.10.5
Release notes

Sourced from aiohttp's releases.

3.10.5

Bug fixes

Miscellaneous internal changes


Changelog

Sourced from aiohttp's changelog.

3.10.5 (2024-08-19)

Bug fixes

Miscellaneous internal changes


3.10.4 (2024-08-17)

Bug fixes

... (truncated)

Commits

Updates `cryptography` to 43.0.1
Changelog

Sourced from cryptography's changelog.

43.0.1 - 2024-09-03


* Updated Windows, macOS, and Linux wheels to be compiled with OpenSSL 3.3.2.


43.0.0 - 2024-07-20

... (truncated)

Commits

Updates `nltk` to 3.9.1
Changelog

Sourced from nltk's changelog.

Version 3.9.1 2024-08-19

Version 3.9 2024-08-18

Thanks to the following contributors to 3.8.2: Tom Aarsen, Cat Lee Ball, Veralara Bernhard, Carlos Brandt, Konstantin Chernyshev, Michael Higgins, Eric Kafe, Vivek Kalyan, David Lukes, Rob Malouf, purificant, Alex Rudnick, Liling Tan, Akihiro Yamazaki.

Version 3.8.1 2023-01-02

Thanks to the following contributors to 3.8.1: Francis Bond, John Vandenberg, Tom Aarsen

Version 3.8 2022-12-12

... (truncated)

Commits

Updates `streamlit` to 1.38.0
Release notes

Sourced from streamlit's releases.

1.38.0

What's Changed

Breaking Changes 🛠

New Features 🎉

Bug Fixes 🐛

Other Changes

... (truncated)

Commits
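A quick way to verify a grouped bump like this one after installation — a minimal sketch, not part of the PR; the package names and pinned versions are the four from this update, the script itself is illustrative and uses only the standard library:

```python
# Illustrative post-upgrade check: compare installed versions against the
# pins this PR introduces.
from importlib import metadata

EXPECTED = {
    "aiohttp": "3.10.5",
    "cryptography": "43.0.1",
    "nltk": "3.9.1",
    "streamlit": "1.38.0",
}

for package, expected in EXPECTED.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "OK" if installed == expected else f"MISMATCH (expected {expected})"
    print(f"{package} {installed}: {status}")
```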

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/GoogleCloudPlatform/generative-ai/network/alerts).
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Holt Skinner <13262395+holtskinner@users.noreply.github.com>
---
 gemini/sample-apps/llamaindex-rag/pyproject.toml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gemini/sample-apps/llamaindex-rag/pyproject.toml b/gemini/sample-apps/llamaindex-rag/pyproject.toml
index ee988d25bf3..317ee5d197f 100644
--- a/gemini/sample-apps/llamaindex-rag/pyproject.toml
+++ b/gemini/sample-apps/llamaindex-rag/pyproject.toml
@@ -17,7 +17,7 @@ package-mode = false
 python = "^3.10"
 pyyaml = "6.0.1"
 llama-index = "0.10.58"
-aiohttp = "3.10.2"
+aiohttp = "3.10.5"
 aiosignal = "1.3.1"
 altair = "5.3.0"
 annotated-types = "0.7.0"
@@ -154,7 +154,7 @@ multiprocess = "0.70.16"
 mypy-extensions = "1.0.0"
 nest-asyncio = "1.6.0"
 networkx = "3.3"
-nltk = "3.9"
+nltk = "3.9.1"
 numpy = "1.26.4"
 oauthlib = "3.2.2"
 omegaconf = "2.3.0"
@@ -228,7 +228,7 @@ sqlalchemy = "2.0.31"
 st-annotated-text = "4.0.1"
 st-theme = "1.2.3"
 starlette = "0.37.2"
-streamlit = "1.37.0"
+streamlit = "1.38.0"
 streamlit-camera-input-live = "0.2.0"
 streamlit-card = "1.0.2"
 streamlit-embedcode = "0.1.2"

From 8fffcd789ec1e7ba1257b98906974faece36d142 Mon Sep 17 00:00:00 2001
From: Nishant Nayak
Date: Tue, 17 Sep 2024 15:46:46 -0400
Subject: [PATCH 02/10] chore: update 404 link in vector-search-quickstart.ipynb (#1113)

---
 embeddings/vector-search-quickstart.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/embeddings/vector-search-quickstart.ipynb b/embeddings/vector-search-quickstart.ipynb
index 28073bb3db7..904c638ce72 100644
--- a/embeddings/vector-search-quickstart.ipynb
+++ b/embeddings/vector-search-quickstart.ipynb
@@ -326,7 +326,7 @@
     "\n",
     "The text embeddings represent the meaning of the clothing product names. In this tutorial, we will use Vector Search for completing a [semantic search](https://en.wikipedia.org/wiki/Semantic_search) of the items. This sample code can be used as a basis for other simple recommendation system where you can quickly find \"other products similar to this one\".\n",
     "\n",
-    "To learn more about how to create the embeddings from the data on a BigQuery table and store them in a JSON file, see [Getting Started with Text Embeddings + Vertex AI Vector Search](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/vector-search/intro-textemb-vectorsearch.ipynb). "
+    "To learn more about how to create the embeddings from the data on a BigQuery table and store them in a JSON file, see [Getting Started with Text Embeddings + Vertex AI Vector Search](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro-textemb-vectorsearch.ipynb). "
    ]
   },
  {
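The notebook hunk above describes semantic search over product-name embeddings. As a minimal sketch of that idea — not code from the notebook; the vectors and item names here are invented stand-ins for embeddings that the real tutorial gets from a Vertex AI embedding model and serves through Vector Search — a brute-force cosine-similarity lookup looks like:

```python
# Nearest-neighbor search in miniature: rank items by cosine similarity
# between a query embedding and a matrix of item embeddings.
import numpy as np

item_names = ["denim jacket", "wool sweater", "running shoes"]
item_vecs = np.random.default_rng(0).normal(size=(3, 8))  # stand-in embeddings
query_vec = item_vecs[0] + 0.1  # stand-in query embedding

def cosine_top_k(query: np.ndarray, matrix: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k rows of `matrix` most similar to `query`."""
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k].tolist()

for idx in cosine_top_k(query_vec, item_vecs):
    print(item_names[idx])
```

At production scale the brute-force scan is replaced by an approximate nearest-neighbor index, which is what Vector Search provides.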
" ] }, { From 31c746be391bed9cda88ea7525d7783472c9f088 Mon Sep 17 00:00:00 2001 From: Mend Renovate Date: Wed, 18 Sep 2024 05:26:54 +0200 Subject: [PATCH 03/10] chore(deps): update dependency vite to v5.4.6 [security] (#1115) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This PR contains the following updates: | Package | Change | Age | Adoption | Passing | Confidence | |---|---|---|---|---|---| | [vite](https://vitejs.dev) ([source](https://redirect.github.com/vitejs/vite/tree/HEAD/packages/vite)) | [`5.4.3` -> `5.4.6`](https://renovatebot.com/diffs/npm/vite/5.4.3/5.4.6) | [![age](https://developer.mend.io/api/mc/badges/age/npm/vite/5.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/vite/5.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/vite/5.4.3/5.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/vite/5.4.3/5.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/) | --- > [!WARNING] > Some dependencies could not be looked up. Check the warning logs for more information. ### GitHub Vulnerability Alerts #### [CVE-2024-45811](https://redirect.github.com/vitejs/vite/security/advisories/GHSA-9cwx-2883-4wfx) ### Summary The contents of arbitrary files can be returned to the browser. ### Details `@fs` denies access to files outside of Vite serving allow list. Adding `?import&raw` to the URL bypasses this limitation and returns the file content if it exists. ### PoC ```sh $ npm create vite@latest $ cd vite-project/ $ npm install $ npm run dev $ echo "top secret content" > /tmp/secret.txt # expected behaviour $ curl "http://localhost:5173/@​fs/tmp/secret.txt"

403 Restricted

The request url "/tmp/secret.txt" is outside of Vite serving allow list. # security bypassed $ curl "http://localhost:5173/@​fs/tmp/secret.txt?import&raw" export default "top secret content\n" //# sourceMappingURL=data:application/json;base64,eyJ2... ``` --- ### Release Notes
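The advisory's PoC uses curl; the same check can be scripted. This is a hedged sketch, not part of the advisory: the URL and file path are the advisory's example values, and it assumes a dev server is running on localhost:5173.

```python
# Sketch of the advisory's curl probe: a patched Vite dev server should
# refuse both requests; a vulnerable one serves the second (?import&raw).
import urllib.error
import urllib.request

BASE = "http://localhost:5173/@fs/tmp/secret.txt"  # advisory's example path

for url in (BASE, BASE + "?import&raw"):
    try:
        with urllib.request.urlopen(url) as resp:
            print(f"{url} -> HTTP {resp.status}: possible file disclosure")
    except urllib.error.HTTPError as err:
        print(f"{url} -> HTTP {err.code}: access denied as expected")
    except urllib.error.URLError as err:
        print(f"{url} -> no dev server reachable: {err.reason}")
```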

### Release Notes

vitejs/vite (vite)

#### [`v5.4.6`](https://redirect.github.com/vitejs/vite/releases/tag/v5.4.6)

[Compare Source](https://redirect.github.com/vitejs/vite/compare/v5.4.5...v5.4.6)

Please refer to [CHANGELOG.md](https://redirect.github.com/vitejs/vite/blob/v5.4.6/packages/vite/CHANGELOG.md) for details.

#### [`v5.4.5`](https://redirect.github.com/vitejs/vite/releases/tag/v5.4.5)

[Compare Source](https://redirect.github.com/vitejs/vite/compare/v5.4.4...v5.4.5)

Please refer to [CHANGELOG.md](https://redirect.github.com/vitejs/vite/blob/v5.4.5/packages/vite/CHANGELOG.md) for details.

#### [`v5.4.4`](https://redirect.github.com/vitejs/vite/releases/tag/v5.4.4)

[Compare Source](https://redirect.github.com/vitejs/vite/compare/v5.4.3...v5.4.4)

Please refer to [CHANGELOG.md](https://redirect.github.com/vitejs/vite/blob/v5.4.4/packages/vite/CHANGELOG.md) for details.
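The package-lock.json hunk further below swaps vite's `version`, `resolved` URL, and `integrity` hash in one step. As a hedged sketch of how such a hash can be recomputed for comparison — npm `integrity` fields are SRI strings, i.e. a base64-encoded SHA-512 of the tarball; the URL is the one from the hunk, and the script needs network access:

```python
# Recompute an npm SRI integrity string ("sha512-<base64 digest>") for a
# tarball so it can be compared against the value in package-lock.json.
import base64
import hashlib
import urllib.request

url = "https://registry.npmjs.org/vite/-/vite-5.4.6.tgz"  # from the lockfile hunk

with urllib.request.urlopen(url) as resp:
    digest = hashlib.sha512(resp.read()).digest()

print("sha512-" + base64.b64encode(digest).decode("ascii"))
```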
--- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/GoogleCloudPlatform/generative-ai). Co-authored-by: Kristopher Overholt --- conversation/chat-app/package-lock.json | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/conversation/chat-app/package-lock.json b/conversation/chat-app/package-lock.json index eb501b55311..c36f273ab0a 100644 --- a/conversation/chat-app/package-lock.json +++ b/conversation/chat-app/package-lock.json @@ -2776,9 +2776,9 @@ "dev": true }, "node_modules/vite": { - "version": "5.4.3", - "resolved": "https://registry.npmjs.org/vite/-/vite-5.4.3.tgz", - "integrity": "sha512-IH+nl64eq9lJjFqU+/yrRnrHPVTlgy42/+IzbOdaFDVlyLgI/wDlf+FCobXLX1cT0X5+7LMyH1mIy2xJdLfo8Q==", + "version": "5.4.6", + "resolved": "https://registry.npmjs.org/vite/-/vite-5.4.6.tgz", + "integrity": "sha512-IeL5f8OO5nylsgzd9tq4qD2QqI0k2CQLGrWD0rCN0EQJZpBK5vJAx0I+GDkMOXxQX/OfFHMuLIx6ddAxGX/k+Q==", "dev": true, "license": "MIT", "dependencies": { From 9fb19e71e493415f9ea9352913f7235001d38d82 Mon Sep 17 00:00:00 2001 From: Holt Skinner <13262395+holtskinner@users.noreply.github.com> Date: Wed, 18 Sep 2024 10:22:27 -0500 Subject: [PATCH 04/10] ci: Clear "expect.txt" for check-spelling (#1130) --- .github/actions/spelling/expect.txt | 914 ---------------------------- 1 file changed, 914 deletions(-) diff --git a/.github/actions/spelling/expect.txt b/.github/actions/spelling/expect.txt index 862dc503f24..e69de29bb2d 100644 --- a/.github/actions/spelling/expect.txt +++ b/.github/actions/spelling/expect.txt @@ -1,914 +0,0 @@ -aaae -aaf -aaxis -abcc -acb -accura -acf -Adedeji -Adidas -ADMA -advanc -agentic -AGG -AGs -ainsi -aip -aiplatform -akka -Akkaoui -Aktu -alcuna -allowi -alloydb -AlphaFold -Amarilli -analysisremote -Aniston -anonymization -antiword -APAC -apges -applehelp -appuser -Arborio -Arsan -artifactregistry -Artsakh -arxiv -asarray -ASF -assurent -astype -asyncmock -Atticus -automerge -autopush -autorater -autosummary -autosxs -autotuning -autres -autrui -backticks -baco -Baggins -Barclays -barmode -barpolar -baxis -bbc -bbf -bboxes -bcdfd -beginnen -beim -bella -bellow -belov'd -bfa -Biden -bigframes -bigquery -bigqueryconnection -bigquerystorage -bigserial -Bigtable -bioenergy -Bitcoin -bleiben -blogposts -blogs -bornes -Borregas -boulier -Boyz -bpd -bqdf -bqml -branchess -bucketname -Buckleys -Buffay -butta -Caldara -CALIPSO -carbonara -Carlessian -catus -caxis -ccbf -cccbd -cctemplate -cdad -cdc -cdk -ceb -cec -celle -certifi -ces -cet -Ceux -chaque -chatbots -chatbox -Chawla -CHECKOV -Cheeseman -Chicxulub -chipset -Chocolat -choosi -chromadb -Cinemark -ckpt -clearn -CLIs -cloudapis -cloudbuild -cloudfunction -cloudkms -cloudresourcemanager -cloudrun -cloudsql -cloudveil -cmap -codebases -codechat -codefile -codelab -CODEOWNERS -coderag -codey -colab -Collider -Colm -coloraxis -colorbar -colorway -colwidth -conching -concourir -consiste -consts -continute -Contly -contraint -cookin -cosa -coupable -coveragerc -cpet -crowdsourcing -crudele -csa -cse -CUAD -cultura -currentcolor -CVD -CVS -cygpath -d'adorarvi 
-d'une -dans -danza -darkgrid -dataform -dataframe -datapoints -dbln -dcfd -DCG -ddc -ddl -debian -deconflict -defb -Deleece -demeurent -demouser -dente -Depatmint -descgen -devhelp -devrel -DHH -Dialogflow -Diarization -dicesti -diese -diesen -Digala -directlt -direnv -discoveryengine -disperar -Disturbia -Dload -dlp -docai -DOCDB -docfx -dockerpush -docstore -doctrees -documentai -doivent -Donya -Dreesen -driv -dropna -DRV -dtype -Durafast -Durmus -DVDs -Dzor -ebbb -ecommerce -EDB -edfc -Edunov -effici -EHR -EIP -ekg -Elimende -elinks -elles -emb -embeddings -embvs -EMEA -EMFU -EMNLP -emplois -Emul -emulsif -encanta -endblock -endedness -endlocal -eneration -enterpriseknowledgegraph -enterprisesearch -envrc -epath -Epc -Eragon -erally -erlang -erty -erwinh -ESG -Esin -essen -etags -etf -etils -euch -EUR -evals -eventarc -exercer -expl -faiss -fanciulla -faqs -fastapi -fcc -fda -fewshot -FFL -fiero -figsize -fillmode -fillna -finall -firebaserc -firestore -Fishburne -fixmycar -fixmycarbackend -flac -Flipkart -flowbite -fmeasure -FMLA -foco -followups -Folmer -footwell -forno -freind -froma -fromarray -fromiter -FSD -fss -fulltext -fullurl -fullwidth -functiona -functiondef -furter -fuzzywuzzy -Gatace -gbq -gcf -gcloud -gcp -gcs -GDC -geht -genai -genappbuilder -Genomics -GenTwo -genwealth -geocoded -getconn -getexif -gidiyor -Giordani -Gisting -gitleaks -gitpython -gke -glusr -Godzilla -Gonggong -Googl -googleapiclient -googlecloud -gpg -gpt -gpu -gradio -gradlew -gramm -grammer -gridcolor -grocerybot -grpcio -gserviceaccount -gsm -gspread -gsutil -guanciale -gunicorn -hadolint -Hamers -hashicorp -hashtag -Haumea -hdlr -heatmap -heatmapgl -HEPASKY -hexsha -Hickson -hnsw -hoffe -Hogwarts -Holog -holtskinner -HOMEDRIVE -HOMEPATH -hommes -hovermode -htmlhelp -htmlhintrc -Hubmann -HZN -iban -idk -IFI -ifidx -iloc -ils -imagegeneration -imageno -imagesearch -imagetext -imdb -immagine -imshow -Inagella -inbox -indexvalue -individu -ingre -ingredie -inputtext -instru -instruc -Inte -intersphinx -invo -Iosif -ipykernel -ipynb -ipywidgets -IRAs -isq -italiana -ITDMs -iterrows -ivf -ivfflat -ixed -J'aime -javac -JAVACMD -Jax -JBEAP -jdk -Jedi -jegadesh -JHome -jiwer -jpa -jre -jsonify -jumpstart -junitxml -jupyter -jusqu -kaggle -Kalamang -Kamradt -kann -Keanu -Keown -keras -Keyb -Keychain -KFBI -kgs -KHTML -kickstart -Knative -KPIs -KSA -Kudrow -l'anglais -l'exercice -Ladhak -lakecolor -Lalit -landcolor -langchain -Lasst -lastrequest -lastresponse -LCEL -lego -Legrenzi -lengh -leur -Leute -lexer -ligh -linalg -linecolor -linestyle -linting -Liquicap -listdir -llm -logits -Logrus -loguru -loi -lolcat -lon -LOOKBACK -Lottry -LPH -lsb -LSum -lxml -Mager -magika -mai -Maillard -Makemake -mapbox -marb -maskmode -matchingengine -mavenrc -mbsdk -mdc -mec -mediterraneansea -membres -meme -Memorystore -Mercari -metadatas -metageneration -MFU -MICOA -millisecs -Mindf -miniforge -Minuetto -mio -mmr -Molaison -morire -moto -Mpa -mpe -mpld -mrag -mtu -multimodalembedding -multitool -mvn -mvnw -myaccount -mydb -mydomain -myprojectid -myvertexdatastoreid -n'ordonne -naissent -Nakoso -nanswer -Narnia -nas -naturels -nazione -nbconvert -nbformat -nbqa -nce -ncols -ndarray -NDCG -neces -netif -networkmanagement -Neue -nio -nlp -nltk -nobserved -nodularis -Nominatim -norigin -noticeab -noxfile -nrows -ntheory -nuisibles -nuit -nvidia -Nyquil -ocr -ODb -OLAP -oliv -olleh -openai -openfda -Orbelians -ordre -orgpolicy -ori -originalname -oslogin -osm -OTCH -owlbot -pagemap -Pakeman -paleo -pancetta -Paolini -Paquete 
-parcoords -Pati -payslip -paystub -PDEs -pdfminer -pdfplumber -pdfs -peines -personne -petabytes -peut -peuvent -pgadmin -PGDATABASE -PGHOST -PGPORT -PGUSER -pgvector -photorealistic -Pichai -pii -pincodes -pixmap -pkl -plac -playlists -plc -plotly -PLOTLYENV -plpgsql -pls -plt -poissons -posargs -posso -postgres -postgresql -pourvu -pouvoir -prcntg -preds -prepari -prerel -pretrained -prewritten -proactively -Procfile -programar -PROJECTBASEDIR -projectid -proname -protobuf -psa -pstotext -psychographics -Pullum -puni -punisse -pyasn -Pydanitc -pydantic -pydub -pymupdf -pyopenssl -pypdf -pyplot -pytesseract -PYTHONUNBUFFERED -pytorch -pyupgrade -qna -QPM -qthelp -qu'elle -Quaoar -qubit -questa -Qwiklab -ragdemos -raggio -rarian -ratelimit -receieve -recommonmark -regexes -Reimagining -rekenrek -rembg -remoting -REPOURL -reprompt -requestz -reranking -resourced -resourcemanager -resul -Reza -ribeye -ricc -riccardo -RLHF -Roboto -Ruchika -runjdwp -RYDE -Sahel -saisi -Sauron -Sca -scattercarpet -scattergeo -scattergl -scattermapbox -scatterpolar -scatterpolargl -scatterternary -Schlenoff -Schwimmer -sco -screencast -screenshots -seaborn -seatback -Sebben -seby -secretmanager -Sedna -SEK -Selam -selfie -selon -sentenc -seo -seperate -serait -serializinghtml -serviceaccount -servicedirectory -servicenetworking -serviceusage -setlocal -sft -Shklovsky -shortdesc -showlakes -showland -showor -showtime -Shubham -siglap -simage -Siri -sittin -Skaffold -sklearn -sku -slf -smartphone -Smaug -SNB -SNE -snowfakery -sociales -soit -solutionbuilder -sono -sont -Speci -sphinxcontrib -spirito -Sprachen -sprechen -springframework -sqlalchemy -sqlfluff -ssd -ssml -ssn -stackoverflow -stakeholders -stcore -stext -Stic -STIX -stp -streamlit -stru -stt -stylelintrc -Subworkflow -summ -Sundar -Superstore -synthtool -systemtest -sytem -Syunik -TABLESPACE -tagline -tailwindcss -Tast -templatefile -temurin -tensorboard -tensorflow -terraform -testutils -textembedding -texting -textno -texttospeech -tfhub -tftpl -thelook -THREDED -thres -thsoe -tiangolo -tiendrons -tiktoken -timechart -timecode -TLDR -tobytes -Tolkien -tomat -Tomoko -topk -toself -tous -toute -tpu -tqdm -tran -Tribbiani -trustedtester -tsne -tsv -tts -typehints -UBS -UDFs -UEFA -und -Undeploying -undst -unigram -unrtf -Unsplash -uomo -Urs -usebackq -usecases -Utik -uvicorn -vais -Vayots -VDF -vectoral -vectorsearch -vectorstore -vedi -Vergin -Verilog -vertexai -vertexdatastoreid -viai -viewcode -Vodafone -vous -vpcaccess -VQA -VSC -vtpm -vulnz -wdir -webclient -webinar -webpage -websites -Wehn -weil -welcom -Wellcare -werkzeug -wikilingua -wikipediaapi -wil -Willibald -wip -WORKDIR -wth -xaxes -xaxis -xdg -Xferd -xlabel -xsi -Xsrf -xsum -xticks -xxxxxxx -xxxxxxxx -xxxxxxxxxx -yaxes -yaxis -yeux -ylabel -yourselfers -youtube -ytd -yticks -zakarid -zaxis -zdq -Zom -Zootopia -Zscaler -Zuercher From 5ee7d80028a6a13b75efdafec25838f1fdb783d3 Mon Sep 17 00:00:00 2001 From: Kartik Chaudhary Date: Wed, 18 Sep 2024 21:40:28 +0530 Subject: [PATCH 05/10] feat: add moviepy, fontdict to allow.txt (#1131) --- .github/actions/spelling/allow.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt index 92842866971..6dc93eb8c97 100644 --- a/.github/actions/spelling/allow.txt +++ b/.github/actions/spelling/allow.txt @@ -343,6 +343,7 @@ ffi figsize fillmode flac +fontdict forno freedraw freopen @@ -441,6 +442,7 @@ metadatas mgrs miranda morty +moviepy mpn mrr nbconvert From 
cbc49cc90875655dcd26647835edfcb41c6b2caf Mon Sep 17 00:00:00 2001 From: Holt Skinner <13262395+holtskinner@users.noreply.github.com> Date: Wed, 18 Sep 2024 11:18:32 -0500 Subject: [PATCH 06/10] ci: Add Autoflake to `nox -s format` (#1100) # Description Ran autoflake and pyupgrade --------- Co-authored-by: Owl Bot --- .github/actions/spelling/allow.txt | 19 ++++++ .github/actions/spelling/excludes.txt | 2 + .../workflows/issue_assigner/assign_issue.py | 2 +- .github/workflows/update_notebook_links.py | 5 +- gemini/function-calling/sql-talk-app/app.py | 4 +- .../app/pages_utils/downloads.py | 2 +- .../app/pages_utils/edit_image.py | 2 +- .../app/pages_utils/imagen.py | 3 +- .../app/pages_utils/insights.py | 3 +- .../cloud_functions/gemini_call/main.py | 6 +- .../cloud_functions/imagen_call/main.py | 4 +- .../cloud_functions/text_embedding/main.py | 4 +- .../pages/3_Graph_Visualization.py | 2 +- .../fixmycar/frontend/streamlit-backend.py | 2 +- .../gemini-streamlit-cloudrun/app.py | 5 +- .../function-scripts/process-pdf/main.py | 17 +++-- .../update-search-index/main.py | 7 +-- ...0\237\227\204\357\270\217 Data Sources.py" | 2 +- .../photo-discovery/ag-web/app/app.py | 7 +-- .../utils/intro_multimodal_rag_utils.py | 63 ++++++++++--------- .../document-qa/utils/matching_engine.py | 59 ++++++++--------- .../utils/matching_engine_utils.py | 5 +- noxfile.py | 49 +++++++++------ owlbot.py | 2 +- search/cloud-function/python/main.py | 8 +-- .../test_integration_vertex_search_client.py | 2 +- .../python/vertex_ai_search_client.py | 28 ++++----- search/web-app/ekg_utils.py | 12 ++-- search/web-app/genappbuilder_utils.py | 33 +++++----- 29 files changed, 190 insertions(+), 169 deletions(-) diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt index 6dc93eb8c97..f9346345d34 100644 --- a/.github/actions/spelling/allow.txt +++ b/.github/actions/spelling/allow.txt @@ -107,6 +107,7 @@ Jang Jedi Joji KNN +KPIs Kaelen Kaggle Kamradt @@ -248,6 +249,7 @@ Womens XXE Yuzuru Zijin +Zom Zscaler Zuercher aadd @@ -261,6 +263,7 @@ afrom agentic ainit ainvoke +aip airlume alloydb antiword @@ -271,6 +274,7 @@ arXiv aretrieve arun astype +autoflake autogen automl autoptr @@ -306,6 +310,7 @@ colwidth constexpr corpuses csa +cse cupertino dask dataframe @@ -320,6 +325,7 @@ deskmates dino diy docai +docstore dpi draig drinkware @@ -328,7 +334,9 @@ dsl dtypes dwmapi ecommerce +ekg elous +emb embs emojis ename @@ -337,17 +345,20 @@ etf eur evals faiss +fastapi fect fewshot ffi figsize fillmode +firestore flac fontdict forno freedraw freopen fromarray +fromiter fts fulltext funtion @@ -395,6 +406,7 @@ idk idks idxs iloc +imageno imdb imshow iostream @@ -406,6 +418,7 @@ itables iterrows jegadesh jetbrains +jsonify jupyter kaggle kenleejr @@ -466,6 +479,7 @@ onesies osx owlbot oxml +pagemap paleo pancetta pantarba @@ -497,6 +511,7 @@ projectid protobuf pstotext pubspec +putdata pvc pyautogen pybind @@ -552,6 +567,7 @@ sxs tagline tencel termcolor +textno tfhub tfidf tgz @@ -559,14 +575,17 @@ thelook tiktoken timechart titlebar +tobytes toself tqdm tritan ubuntu +undst unigram unrtf upsell urandom +usecases username usernames uvb diff --git a/.github/actions/spelling/excludes.txt b/.github/actions/spelling/excludes.txt index 02d685a97ac..b551f438713 100644 --- a/.github/actions/spelling/excludes.txt +++ b/.github/actions/spelling/excludes.txt @@ -107,3 +107,5 @@ ignore$ ^\Qowlbot.py\E$ ^\Qsearch/bulk-question-answering/bulk_question_answering_output.tsv\E$ 
^\Q.github/workflows/issue_assigner/assign_issue.py\E$ +^\Qnoxfile.py\E$ +^\owlbot.py\E$ diff --git a/.github/workflows/issue_assigner/assign_issue.py b/.github/workflows/issue_assigner/assign_issue.py index e489f31e4d8..3360d20be7b 100644 --- a/.github/workflows/issue_assigner/assign_issue.py +++ b/.github/workflows/issue_assigner/assign_issue.py @@ -12,7 +12,7 @@ def get_issue_number(event_path: str) -> int: """Retrieves the issue number from GitHub event data.""" # Load event data - with open(event_path, "r", encoding="utf-8") as f: + with open(event_path, encoding="utf-8") as f: event_data = json.load(f) # Determine the issue number based on the event diff --git a/.github/workflows/update_notebook_links.py b/.github/workflows/update_notebook_links.py index b8f75e2357d..c9d7852f976 100644 --- a/.github/workflows/update_notebook_links.py +++ b/.github/workflows/update_notebook_links.py @@ -2,7 +2,6 @@ import os import sys -from typing import Tuple import urllib.parse import nbformat @@ -21,7 +20,7 @@ def fix_markdown_links( cell_source: str, relative_notebook_path: str -) -> Tuple[str, bool]: +) -> tuple[str, bool]: """Fixes links in a markdown cell and returns the updated source.""" new_lines = [] changes_made = False @@ -58,7 +57,7 @@ def fix_markdown_links( def fix_links_in_notebook(notebook_path: str) -> int: """Fixes specific types of links in a Jupyter notebook.""" - with open(notebook_path, "r", encoding="utf-8") as f: + with open(notebook_path, encoding="utf-8") as f: notebook = nbformat.read(f, as_version=4) relative_notebook_path = os.path.relpath(notebook_path, start=os.getcwd()).lower() diff --git a/gemini/function-calling/sql-talk-app/app.py b/gemini/function-calling/sql-talk-app/app.py index b1be8651967..c7ecda501b0 100644 --- a/gemini/function-calling/sql-talk-app/app.py +++ b/gemini/function-calling/sql-talk-app/app.py @@ -115,7 +115,7 @@ for message in st.session_state.messages: with st.chat_message(message["role"]): - st.markdown(message["content"].replace("$", "\$")) # noqa: W605 + st.markdown(message["content"].replace("$", r"\$")) # noqa: W605 try: with st.expander("Function calls, parameters, and responses"): st.markdown(message["backend_details"]) @@ -257,7 +257,7 @@ full_response = response.text with message_placeholder.container(): - st.markdown(full_response.replace("$", "\$")) # noqa: W605 + st.markdown(full_response.replace("$", r"\$")) # noqa: W605 with st.expander("Function calls, parameters, and responses:"): st.markdown(backend_details) diff --git a/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/downloads.py b/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/downloads.py index 3eb03b88b41..66f62c6ffa4 100644 --- a/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/downloads.py +++ b/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/downloads.py @@ -90,7 +90,7 @@ def download_button(object_to_download: bytes, download_filename: str) -> str: b64 = base64.b64encode(zip_content).decode() # Read the HTML template file - with open("app/download_link.html", "r", encoding="utf8") as f: + with open("app/download_link.html", encoding="utf8") as f: html_template = f.read() # Replace placeholders in the HTML template diff --git a/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/edit_image.py b/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/edit_image.py index 86a2ceacfa7..e37a5640ff5 100644 --- 
a/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/edit_image.py +++ b/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/edit_image.py @@ -121,7 +121,7 @@ def handle_image_upload() -> None: filename = "uploaded_image0.png" image.save(filename) st.session_state.start_editing = True - except (IOError, PIL.UnidentifiedImageError) as e: + except (OSError, PIL.UnidentifiedImageError) as e: st.error(f"Error opening image: {e}") diff --git a/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/imagen.py b/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/imagen.py index 61e40db36a4..e5d9de6181c 100644 --- a/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/imagen.py +++ b/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/imagen.py @@ -10,7 +10,6 @@ import json import logging import os -from typing import Optional from PIL import Image import aiohttp as cloud_function_call @@ -87,7 +86,7 @@ def image_generation( images[0].save(location=f"{filename}.png", include_generation_parameters=False) -async def parallel_image_generation(prompt: str, col: int) -> Optional[Image.Image]: +async def parallel_image_generation(prompt: str, col: int) -> Image.Image | None: """ Executes parallel generation of images through Imagen. diff --git a/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/insights.py b/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/insights.py index 9419d1a6b9e..e94af3c98e9 100644 --- a/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/insights.py +++ b/gemini/sample-apps/accelerating_product_innovation/app/pages_utils/insights.py @@ -11,7 +11,6 @@ import json import os import re -from typing import Optional from app.pages_utils.embedding_model import embedding_model_with_backoff from app.pages_utils.get_llm_response import generate_gemini @@ -77,7 +76,7 @@ def get_suggestions(state_key: str) -> None: st.session_state[state_key] = extract_bullet_points(gen_suggestions) -def get_stored_embeddings_as_df() -> Optional[pd.DataFrame]: +def get_stored_embeddings_as_df() -> pd.DataFrame | None: """Retrieves and processes stored embeddings from cloud storage. Returns: diff --git a/gemini/sample-apps/accelerating_product_innovation/cloud_functions/gemini_call/main.py b/gemini/sample-apps/accelerating_product_innovation/cloud_functions/gemini_call/main.py index 901c3edd092..b1224677278 100644 --- a/gemini/sample-apps/accelerating_product_innovation/cloud_functions/gemini_call/main.py +++ b/gemini/sample-apps/accelerating_product_innovation/cloud_functions/gemini_call/main.py @@ -3,7 +3,7 @@ """ import os -from typing import Any, Dict, Tuple, Union +from typing import Any from dotenv import load_dotenv import functions_framework @@ -41,7 +41,7 @@ def generate_text(prompt: str) -> str: @functions_framework.http -def get_llm_response(request: Any) -> Union[Dict, Tuple]: +def get_llm_response(request: Any) -> dict | tuple: """HTTP Cloud Function that generates text using the Gemini-Pro model. Args: @@ -53,7 +53,7 @@ def get_llm_response(request: Any) -> Union[Dict, Tuple]: Response object using `make_response` . 
""" - request_json: Dict = request.get_json(silent=True) + request_json: dict = request.get_json(silent=True) if not request_json or "text_prompt" not in request_json: return {"error": "Request body must contain 'text_prompt' field."}, 400 diff --git a/gemini/sample-apps/accelerating_product_innovation/cloud_functions/imagen_call/main.py b/gemini/sample-apps/accelerating_product_innovation/cloud_functions/imagen_call/main.py index decf23478b5..a04df1badce 100644 --- a/gemini/sample-apps/accelerating_product_innovation/cloud_functions/imagen_call/main.py +++ b/gemini/sample-apps/accelerating_product_innovation/cloud_functions/imagen_call/main.py @@ -3,7 +3,7 @@ """ import os -from typing import Any, Dict +from typing import Any from dotenv import load_dotenv import functions_framework @@ -46,5 +46,5 @@ def get_images(request: Any) -> bytes: Returns: Response: A Flask Response object containing the generated image. """ - request_json: Dict = request.get_json(silent=True) + request_json: dict = request.get_json(silent=True) return image_generation(request_json["img_prompt"]) diff --git a/gemini/sample-apps/accelerating_product_innovation/cloud_functions/text_embedding/main.py b/gemini/sample-apps/accelerating_product_innovation/cloud_functions/text_embedding/main.py index f46216b4dff..2c9de8ee79b 100644 --- a/gemini/sample-apps/accelerating_product_innovation/cloud_functions/text_embedding/main.py +++ b/gemini/sample-apps/accelerating_product_innovation/cloud_functions/text_embedding/main.py @@ -4,7 +4,7 @@ import json import os -from typing import Any, List +from typing import Any from dotenv import load_dotenv import functions_framework @@ -20,7 +20,7 @@ embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003") -def get_embeddings(instances: list[str]) -> List[List[float]]: +def get_embeddings(instances: list[str]) -> list[list[float]]: """ Generates embeddings for given text. 
diff --git a/gemini/sample-apps/finance-advisor-spanner/pages/3_Graph_Visualization.py b/gemini/sample-apps/finance-advisor-spanner/pages/3_Graph_Visualization.py index a9b48f724b3..7e28ea4f50d 100644 --- a/gemini/sample-apps/finance-advisor-spanner/pages/3_Graph_Visualization.py +++ b/gemini/sample-apps/finance-advisor-spanner/pages/3_Graph_Visualization.py @@ -13,7 +13,7 @@ ) graph_viz.generate_graph() -with open("graph_viz.html", "r", encoding="utf-8") as html_file: +with open("graph_viz.html", encoding="utf-8") as html_file: source_code = html_file.read() components.html(source_code, height=950, width=900) diff --git a/gemini/sample-apps/fixmycar/frontend/streamlit-backend.py b/gemini/sample-apps/fixmycar/frontend/streamlit-backend.py index c7220f2c15d..810164eb562 100644 --- a/gemini/sample-apps/fixmycar/frontend/streamlit-backend.py +++ b/gemini/sample-apps/fixmycar/frontend/streamlit-backend.py @@ -12,7 +12,7 @@ def get_chat_response(user_prompt: str, messages: []) -> str: request = {"prompt": user_prompt} response = requests.post(backend_url + "/chat", json=request) if response.status_code != 200: - raise Exception("Bad response from backend: {}".format(response.text)) + raise Exception(f"Bad response from backend: {response.text}") return response.json()["response"] diff --git a/gemini/sample-apps/gemini-streamlit-cloudrun/app.py b/gemini/sample-apps/gemini-streamlit-cloudrun/app.py index 39925ef5af5..8c20fe6ad48 100644 --- a/gemini/sample-apps/gemini-streamlit-cloudrun/app.py +++ b/gemini/sample-apps/gemini-streamlit-cloudrun/app.py @@ -4,7 +4,6 @@ """ import os -from typing import List, Tuple, Union import streamlit as st import vertexai @@ -23,14 +22,14 @@ @st.cache_resource -def load_models() -> Tuple[GenerativeModel, GenerativeModel]: +def load_models() -> tuple[GenerativeModel, GenerativeModel]: """Load Gemini 1.5 Flash and Pro models.""" return GenerativeModel("gemini-1.5-flash"), GenerativeModel("gemini-1.5-pro") def get_gemini_response( model: GenerativeModel, - contents: Union[str, List], + contents: str | list, generation_config: GenerationConfig = GenerationConfig( temperature=0.1, max_output_tokens=2048 ), diff --git a/gemini/sample-apps/genwealth/function-scripts/process-pdf/main.py b/gemini/sample-apps/genwealth/function-scripts/process-pdf/main.py index 2ce653e792e..3bdefbc6f3b 100644 --- a/gemini/sample-apps/genwealth/function-scripts/process-pdf/main.py +++ b/gemini/sample-apps/genwealth/function-scripts/process-pdf/main.py @@ -3,7 +3,6 @@ import os from pathlib import Path import re -from typing import List, Optional import uuid import functions_framework @@ -23,13 +22,13 @@ def batch_process_documents( location: str, processor_id: str, gcs_output_uri: str, - processor_version_id: Optional[str] = None, - gcs_input_uri: Optional[str] = None, - input_mime_type: Optional[str] = None, - gcs_input_prefix: Optional[str] = None, - field_mask: Optional[str] = None, + processor_version_id: str | None = None, + gcs_input_uri: str | None = None, + input_mime_type: str | None = None, + gcs_input_prefix: str | None = None, + field_mask: str | None = None, timeout: int = 400, -) -> List[storage.Blob]: +) -> list[storage.Blob]: """Function to batch process documents""" # You must set the `api_endpoint` if you use a location other than "us". 
opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com") @@ -298,8 +297,6 @@ def process_pdf(cloud_event): ticker = Path(source_file).stem publisher = pubsub_v1.PublisherClient() topic_name = f"projects/{project_id}/topics/{project_id}-doc-ready" - future = publisher.publish( - topic_name, bytes(f"{ticker}".encode("utf-8")), spam="done" - ) + future = publisher.publish(topic_name, bytes(f"{ticker}".encode()), spam="done") future.result() print("Sent message to pubsub") diff --git a/gemini/sample-apps/genwealth/function-scripts/update-search-index/main.py b/gemini/sample-apps/genwealth/function-scripts/update-search-index/main.py index a041f517776..54492ce1f09 100644 --- a/gemini/sample-apps/genwealth/function-scripts/update-search-index/main.py +++ b/gemini/sample-apps/genwealth/function-scripts/update-search-index/main.py @@ -1,7 +1,6 @@ """Function to update the Vertex AI Search and Conversion index""" import os -from typing import Optional import functions_framework from google.api_core.client_options import ClientOptions @@ -12,9 +11,9 @@ def import_documents_sample( project_id: str, location: str, data_store_id: str, - gcs_uri: Optional[str] = None, - bigquery_dataset: Optional[str] = None, - bigquery_table: Optional[str] = None, + gcs_uri: str | None = None, + bigquery_dataset: str | None = None, + bigquery_table: str | None = None, ) -> str: """Function to import documents""" # For more information, refer to: diff --git "a/gemini/sample-apps/llamaindex-rag/ui/pages/1_\360\237\227\204\357\270\217 Data Sources.py" "b/gemini/sample-apps/llamaindex-rag/ui/pages/1_\360\237\227\204\357\270\217 Data Sources.py" index 9808a587f67..1a05fe87e30 100644 --- "a/gemini/sample-apps/llamaindex-rag/ui/pages/1_\360\237\227\204\357\270\217 Data Sources.py" +++ "b/gemini/sample-apps/llamaindex-rag/ui/pages/1_\360\237\227\204\357\270\217 Data Sources.py" @@ -77,7 +77,7 @@ def update_index( }, ) if response.status_code == 200: - st.success(f"Updated data source(s) successfully!") + st.success("Updated data source(s) successfully!") else: st.error("Error updating index.") diff --git a/gemini/sample-apps/photo-discovery/ag-web/app/app.py b/gemini/sample-apps/photo-discovery/ag-web/app/app.py index 4336f4a64fc..9cad5e75154 100644 --- a/gemini/sample-apps/photo-discovery/ag-web/app/app.py +++ b/gemini/sample-apps/photo-discovery/ag-web/app/app.py @@ -14,9 +14,6 @@ import json import os -import re - -import requests # # Reasoning Engine @@ -45,14 +42,12 @@ SEARCH_ENGINE_ID = "" -search_client_options = ClientOptions(api_endpoint=f"us-discoveryengine.googleapis.com") +search_client_options = ClientOptions(api_endpoint="us-discoveryengine.googleapis.com") search_client = discoveryengine.SearchServiceClient( client_options=search_client_options ) search_serving_config = f"projects/{PROJECT_ID}/locations/us/collections/default_collection/dataStores/{SEARCH_ENGINE_ID}/servingConfigs/default_search:search" -import json - def search_gms(search_query, rows): # build a search request diff --git a/gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py b/gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py index 926a2b76afe..42d99dff211 100644 --- a/gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py +++ b/gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py @@ -1,7 +1,8 @@ +from collections.abc import Iterable import glob import os import time -from typing import Any, Dict, 
Iterable, List, Optional, Tuple, Union +from typing import Any from IPython.display import display import PIL @@ -30,7 +31,7 @@ def get_text_embedding_from_text_embedding_model( text: str, - return_array: Optional[bool] = False, + return_array: bool | None = False, ) -> list: """ Generates a numerical text embedding from a provided text input using a text embedding model. @@ -58,8 +59,8 @@ def get_text_embedding_from_text_embedding_model( def get_image_embedding_from_multimodal_embedding_model( image_uri: str, embedding_size: int = 512, - text: Optional[str] = None, - return_array: Optional[bool] = False, + text: str | None = None, + return_array: bool | None = False, ) -> list: """Extracts an image embedding from a multimodal embedding model. The function can optionally utilize contextual text to refine the embedding. @@ -129,7 +130,7 @@ def get_text_overlapping_chunk( return chunked_text_dict -def get_page_text_embedding(text_data: Union[dict, str]) -> dict: +def get_page_text_embedding(text_data: dict | str) -> dict: """ * Generates embeddings for each text chunk using a specified embedding model. * Takes a dictionary of text chunks and an embedding size as input. @@ -219,7 +220,7 @@ def get_image_for_gemini( image_save_dir: str, file_name: str, page_num: int, -) -> Tuple[Image, str]: +) -> tuple[Image, str]: """ Extracts an image from a PDF document, converts it to JPEG format, saves it to a specified directory, and loads it as a PIL Image Object. @@ -260,12 +261,12 @@ def get_image_for_gemini( def get_gemini_response( generative_multimodal_model, - model_input: List[str], + model_input: list[str], stream: bool = True, - generation_config: Optional[GenerationConfig] = GenerationConfig( - temperature=0.2, max_output_tokens=2048 - ), - safety_settings: Optional[dict] = { + generation_config: GenerationConfig + | None = GenerationConfig(temperature=0.2, max_output_tokens=2048), + safety_settings: dict + | None = { HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE, HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE, HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE, @@ -306,7 +307,7 @@ def get_gemini_response( def get_text_metadata_df( - filename: str, text_metadata: Dict[Union[int, str], Dict] + filename: str, text_metadata: dict[int | str, dict] ) -> pd.DataFrame: """ This function takes a filename and a text metadata dictionary as input, @@ -322,11 +323,11 @@ def get_text_metadata_df( A Pandas DataFrame with the extracted text, chunk text, and chunk embeddings for each page. """ - final_data_text: List[Dict] = [] + final_data_text: list[dict] = [] for key, values in text_metadata.items(): for chunk_number, chunk_text in values["chunked_text_dict"].items(): - data: Dict = {} + data: dict = {} data["file_name"] = filename data["page_num"] = int(key) + 1 data["text"] = values["text"] @@ -345,7 +346,7 @@ def get_text_metadata_df( def get_image_metadata_df( - filename: str, image_metadata: Dict[Union[int, str], Dict] + filename: str, image_metadata: dict[int | str, dict] ) -> pd.DataFrame: """ This function takes a filename and an image metadata dictionary as input, @@ -361,10 +362,10 @@ def get_image_metadata_df( A Pandas DataFrame with the extracted image path, image description, and image embeddings for each image. 
""" - final_data_image: List[Dict] = [] + final_data_image: list[dict] = [] for key, values in image_metadata.items(): for _, image_values in values.items(): - data: Dict = {} + data: dict = {} data["file_name"] = filename data["page_num"] = int(key) + 1 data["img_num"] = int(image_values["img_num"]) @@ -392,10 +393,10 @@ def get_document_metadata( image_save_dir: str, image_description_prompt: str, embedding_size: int = 128, - generation_config: Optional[GenerationConfig] = GenerationConfig( - temperature=0.2, max_output_tokens=2048 - ), - safety_settings: Optional[dict] = { + generation_config: GenerationConfig + | None = GenerationConfig(temperature=0.2, max_output_tokens=2048), + safety_settings: dict + | None = { HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE, HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE, HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE, @@ -403,7 +404,7 @@ def get_document_metadata( }, add_sleep_after_page: bool = False, sleep_time_after_page: int = 2, -) -> Tuple[pd.DataFrame, pd.DataFrame]: +) -> tuple[pd.DataFrame, pd.DataFrame]: """ This function takes a PDF path, an image save directory, an image description prompt, an embedding size, and a text embedding text limit as input. @@ -435,8 +436,8 @@ def get_document_metadata( file_name = pdf_path.split("/")[-1] - text_metadata: Dict[Union[int, str], Dict] = {} - image_metadata: Dict[Union[int, str], Dict] = {} + text_metadata: dict[int | str, dict] = {} + image_metadata: dict[int | str, dict] = {} for page_num, page in enumerate(doc): print(f"Processing page: {page_num + 1}") @@ -582,7 +583,7 @@ def get_cosine_score( def print_text_to_image_citation( - final_images: Dict[int, Dict[str, Any]], print_top: bool = True + final_images: dict[int, dict[str, Any]], print_top: bool = True ) -> None: """ Prints a formatted citation for each matched image in a dictionary. @@ -633,7 +634,7 @@ def print_text_to_image_citation( def print_text_to_text_citation( - final_text: Dict[int, Dict[str, Any]], + final_text: dict[int, dict[str, Any]], print_top: bool = True, chunk_text: bool = True, ) -> None: @@ -694,7 +695,7 @@ def get_similar_image_from_query( image_emb: bool = True, top_n: int = 3, embedding_size: int = 128, -) -> Dict[int, Dict[str, Any]]: +) -> dict[int, dict[str, Any]]: """ Finds the top N most similar images from a metadata DataFrame based on a text query or an image query. @@ -737,7 +738,7 @@ def get_similar_image_from_query( top_n_cosine_values = cosine_scores.nlargest(top_n).values.tolist() # Create a dictionary to store matched images and their information - final_images: Dict[int, Dict[str, Any]] = {} + final_images: dict[int, dict[str, Any]] = {} for matched_imageno, indexvalue in enumerate(top_n_cosine_scores): # Create a sub-dictionary for each matched image @@ -798,7 +799,7 @@ def get_similar_text_from_query( top_n: int = 3, chunk_text: bool = True, print_citation: bool = False, -) -> Dict[int, Dict[str, Any]]: +) -> dict[int, dict[str, Any]]: """ Finds the top N most similar text passages from a metadata DataFrame based on a text query. 
@@ -838,7 +839,7 @@ def get_similar_text_from_query( top_n_scores = cosine_scores.nlargest(top_n).values.tolist() # Create a dictionary to store matched text and their information - final_text: Dict[int, Dict[str, Any]] = {} + final_text: dict[int, dict[str, Any]] = {} for matched_textno, index in enumerate(top_n_indices): # Create a sub-dictionary for each matched text @@ -879,7 +880,7 @@ def get_similar_text_from_query( def display_images( - images: Iterable[Union[str, PIL.Image.Image]], resize_ratio: float = 0.5 + images: Iterable[str | PIL.Image.Image], resize_ratio: float = 0.5 ) -> None: """ Displays a series of images provided as paths or PIL Image objects. diff --git a/language/use-cases/document-qa/utils/matching_engine.py b/language/use-cases/document-qa/utils/matching_engine.py index cc32826b35c..00ff34838ee 100644 --- a/language/use-cases/document-qa/utils/matching_engine.py +++ b/language/use-cases/document-qa/utils/matching_engine.py @@ -2,9 +2,10 @@ from __future__ import annotations +from collections.abc import Iterable import json import logging -from typing import Any, Iterable, List, Optional, Type +from typing import Any import uuid import google.auth @@ -47,7 +48,7 @@ def __init__( index_client: aiplatform_v1.IndexServiceClient, index_endpoint_client: aiplatform_v1.IndexEndpointServiceClient, gcs_bucket_name: str, - credentials: Credentials = None, + credentials: Credentials | None = None, ): """Vertex AI Matching Engine implementation of the vector store. @@ -106,9 +107,9 @@ def _validate_google_libraries_installation(self) -> None: def add_texts( self, texts: Iterable[str], - metadatas: Optional[Iterable[dict]], + metadatas: Iterable[dict] | None, **kwargs: Any, - ) -> List[str]: + ) -> list[str]: """Run more texts through the embeddings and add to the vectorstore. Args: @@ -169,7 +170,7 @@ def _upload_to_gcs(self, data: str, gcs_location: str) -> None: def get_matches( self, - embeddings: List[str], + embeddings: list[str], n_matches: int, index_endpoint: MatchingEngineIndexEndpoint, filters: dict, @@ -214,7 +215,7 @@ def similarity_search( search_distance: float = 0.65, filters={}, **kwargs: Any, - ) -> List[Document]: + ) -> list[Document]: """Return docs most similar to query. Args: @@ -314,12 +315,12 @@ def _download_from_gcs(self, gcs_location: str) -> str: @classmethod def from_texts( - cls: Type["MatchingEngine"], - texts: List[str], + cls: type[MatchingEngine], + texts: list[str], embedding: Embeddings, - metadatas: Optional[List[dict]] = None, + metadatas: list[dict] | None = None, **kwargs: Any, - ) -> "MatchingEngine": + ) -> MatchingEngine: """Use from components instead.""" raise NotImplementedError( "This method is not implemented. Instead, you should initialize the class" @@ -329,12 +330,12 @@ def from_texts( @classmethod def from_documents( - cls: Type["MatchingEngine"], - documents: List[str], + cls: type[MatchingEngine], + documents: list[str], embedding: Embeddings, - metadatas: Optional[List[dict]] = None, + metadatas: list[dict] | None = None, **kwargs: Any, - ) -> "MatchingEngine": + ) -> MatchingEngine: """Use from components instead.""" raise NotImplementedError( "This method is not implemented. 
Instead, you should initialize the class" @@ -344,15 +345,15 @@ def from_documents( @classmethod def from_components( - cls: Type["MatchingEngine"], + cls: type[MatchingEngine], project_id: str, region: str, gcs_bucket_name: str, index_id: str, endpoint_id: str, - credentials_path: Optional[str] = None, - embedding: Optional[Embeddings] = None, - ) -> "MatchingEngine": + credentials_path: str | None = None, + embedding: Embeddings | None = None, + ) -> MatchingEngine: """Takes the object creation out of the constructor. Args: @@ -427,8 +428,8 @@ def _validate_gcs_bucket(cls, gcs_bucket_name: str) -> str: @classmethod def _create_credentials_from_file( - cls, json_credentials_path: Optional[str] - ) -> Optional[Credentials]: + cls, json_credentials_path: str | None + ) -> Credentials | None: """Creates credentials for Google Cloud. Args: @@ -452,7 +453,7 @@ def _create_credentials_from_file( @classmethod def _create_index_by_id( - cls, index_id: str, project_id: str, region: str, credentials: "Credentials" + cls, index_id: str, project_id: str, region: str, credentials: Credentials ) -> MatchingEngineIndex: """Creates a MatchingEngineIndex object by id. @@ -472,7 +473,7 @@ def _create_index_by_id( @classmethod def _create_endpoint_by_id( - cls, endpoint_id: str, project_id: str, region: str, credentials: "Credentials" + cls, endpoint_id: str, project_id: str, region: str, credentials: Credentials ) -> MatchingEngineIndexEndpoint: """Creates a MatchingEngineIndexEndpoint object by id. @@ -498,8 +499,8 @@ def _create_endpoint_by_id( @classmethod def _get_gcs_client( - cls, credentials: "Credentials", project_id: str - ) -> "storage.Client": + cls, credentials: Credentials, project_id: str + ) -> storage.Client: """Lazily creates a GCS client. Returns: @@ -512,8 +513,8 @@ def _get_gcs_client( @classmethod def _get_index_client( - cls, project_id: str, region: str, credentials: "Credentials" - ) -> "storage.Client": + cls, project_id: str, region: str, credentials: Credentials + ) -> storage.Client: """Lazily creates a Matching Engine Index client. Returns: @@ -530,8 +531,8 @@ def _get_index_client( @classmethod def _get_index_endpoint_client( - cls, project_id: str, region: str, credentials: "Credentials" - ) -> "storage.Client": + cls, project_id: str, region: str, credentials: Credentials + ) -> storage.Client: """Lazily creates a Matching Engine Index Endpoint client. Returns: @@ -552,7 +553,7 @@ def _init_aiplatform( project_id: str, region: str, gcs_bucket_name: str, - credentials: "Credentials", + credentials: Credentials, ) -> None: """Configures the aiplatform library. 
diff --git a/language/use-cases/document-qa/utils/matching_engine_utils.py b/language/use-cases/document-qa/utils/matching_engine_utils.py index 6e5f5385dab..b1478c0b4fa 100644 --- a/language/use-cases/document-qa/utils/matching_engine_utils.py +++ b/language/use-cases/document-qa/utils/matching_engine_utils.py @@ -2,7 +2,6 @@ from datetime import datetime import logging import time -from typing import Optional from google.api_core.client_options import ClientOptions from google.cloud import aiplatform_v1 as aipv1 @@ -18,7 +17,7 @@ def __init__( project_id: str, region: str, index_name: str, - index_endpoint_name: Optional[str] = None, + index_endpoint_name: str | None = None, ): self.project_id = project_id self.region = region @@ -167,7 +166,7 @@ def deploy_index( min_replica_count: int = 2, max_replica_count: int = 10, public_endpoint_enabled: bool = True, - network: Optional[str] = None, + network: str | None = None, ): try: # Get index if exists diff --git a/noxfile.py b/noxfile.py index beaf82f205d..1ef53e1cea3 100644 --- a/noxfile.py +++ b/noxfile.py @@ -18,13 +18,11 @@ # Generated by synthtool. DO NOT EDIT! -from __future__ import absolute_import import os import pathlib import re import shutil -from typing import Dict, List import warnings import nox @@ -36,7 +34,7 @@ DEFAULT_PYTHON_VERSION = "3.10" -UNIT_TEST_PYTHON_VERSIONS: List[str] = ["3.10", "3.11", "3.12"] +UNIT_TEST_PYTHON_VERSIONS: list[str] = ["3.10", "3.11", "3.12"] UNIT_TEST_STANDARD_DEPENDENCIES = [ "mock", "asyncmock", @@ -44,23 +42,23 @@ "pytest-cov", "pytest-asyncio", ] -UNIT_TEST_EXTERNAL_DEPENDENCIES: List[str] = [] -UNIT_TEST_LOCAL_DEPENDENCIES: List[str] = [] -UNIT_TEST_DEPENDENCIES: List[str] = [] -UNIT_TEST_EXTRAS: List[str] = [] -UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {} - -SYSTEM_TEST_PYTHON_VERSIONS: List[str] = ["3.8"] -SYSTEM_TEST_STANDARD_DEPENDENCIES: List[str] = [ +UNIT_TEST_EXTERNAL_DEPENDENCIES: list[str] = [] +UNIT_TEST_LOCAL_DEPENDENCIES: list[str] = [] +UNIT_TEST_DEPENDENCIES: list[str] = [] +UNIT_TEST_EXTRAS: list[str] = [] +UNIT_TEST_EXTRAS_BY_PYTHON: dict[str, list[str]] = {} + +SYSTEM_TEST_PYTHON_VERSIONS: list[str] = ["3.8"] +SYSTEM_TEST_STANDARD_DEPENDENCIES: list[str] = [ "mock", "pytest", "google-cloud-testutils", ] -SYSTEM_TEST_EXTERNAL_DEPENDENCIES: List[str] = [] -SYSTEM_TEST_LOCAL_DEPENDENCIES: List[str] = [] -SYSTEM_TEST_DEPENDENCIES: List[str] = [] -SYSTEM_TEST_EXTRAS: List[str] = [] -SYSTEM_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {} +SYSTEM_TEST_EXTERNAL_DEPENDENCIES: list[str] = [] +SYSTEM_TEST_LOCAL_DEPENDENCIES: list[str] = [] +SYSTEM_TEST_DEPENDENCIES: list[str] = [] +SYSTEM_TEST_EXTRAS: list[str] = [] +SYSTEM_TEST_EXTRAS_BY_PYTHON: dict[str, list[str]] = {} CURRENT_DIRECTORY = pathlib.Path(__file__).parent.absolute() @@ -112,9 +110,22 @@ def format(session): Run isort to sort imports. Then run black to format code to uniform standard. """ - session.install(BLACK_VERSION, ISORT_VERSION) + session.install(BLACK_VERSION, ISORT_VERSION, "autoflake", "ruff") # Use the --fss option to sort imports using strict alphabetical order. 
# See https://pycqa.github.io/isort/docs/configuration/options.html#force-sort-within-sections + session.run( + "autoflake", + "-i", + "-r", + "--remove-all-unused-imports", + *LINT_PATHS, + ) + session.run( + "ruff", + "check", + "--fix-only", + *LINT_PATHS, + ) session.run( "isort", "--fss", @@ -150,7 +161,9 @@ def format_notebooks(session): session.run( "nbqa", "pyupgrade", "--exit-zero-even-if-changed", "--py310-plus", *LINT_PATHS ) - session.run("nbqa", "autoflake", "-i", "--remove-all-unused-imports", *LINT_PATHS) + session.run( + "nbqa", "autoflake", "-i", "--remove-all-unused-imports", "-r", *LINT_PATHS + ) session.run( "nbqa", "isort", diff --git a/owlbot.py b/owlbot.py index 52984db5144..9f1ff224bb5 100644 --- a/owlbot.py +++ b/owlbot.py @@ -39,7 +39,7 @@ # Sort Spelling Allowlist spelling_allow_file = ".github/actions/spelling/allow.txt" -with open(spelling_allow_file, "r", encoding="utf-8") as file: +with open(spelling_allow_file, encoding="utf-8") as file: unique_words = sorted(set(file)) with open(spelling_allow_file, "w", encoding="utf-8") as file: diff --git a/search/cloud-function/python/main.py b/search/cloud-function/python/main.py index f3c1c19fd9b..a0180d8d464 100644 --- a/search/cloud-function/python/main.py +++ b/search/cloud-function/python/main.py @@ -23,7 +23,7 @@ """ import os -from typing import Any, Dict, Tuple +from typing import Any from flask import Flask, Request, jsonify, request import functions_framework @@ -53,7 +53,7 @@ @functions_framework.http -def vertex_ai_search(http_request: Request) -> Tuple[Any, int, Dict[str, str]]: +def vertex_ai_search(http_request: Request) -> tuple[Any, int, dict[str, str]]: """ Handle HTTP requests for Vertex AI Search. @@ -88,7 +88,7 @@ def vertex_ai_search(http_request: Request) -> Tuple[Any, int, Dict[str, str]]: def create_error_response( message: str, status_code: int - ) -> Tuple[Any, int, Dict[str, str]]: + ) -> tuple[Any, int, dict[str, str]]: """Standardize the error responses with common headers.""" return (jsonify({"error": message}), status_code, headers) @@ -119,7 +119,7 @@ def create_error_response( app = Flask(__name__) @app.route("/", methods=["POST"]) - def index() -> Tuple[Any, int, Dict[str, str]]: + def index() -> tuple[Any, int, dict[str, str]]: """ Flask route for handling POST requests when running locally. diff --git a/search/cloud-function/python/test_integration_vertex_search_client.py b/search/cloud-function/python/test_integration_vertex_search_client.py index 3ddfb106133..b6587d4aab3 100644 --- a/search/cloud-function/python/test_integration_vertex_search_client.py +++ b/search/cloud-function/python/test_integration_vertex_search_client.py @@ -19,8 +19,8 @@ environment variables and access to the Vertex AI Search service. 
""" +from collections.abc import Generator import os -from typing import Generator import pytest from vertex_ai_search_client import VertexAISearchClient, VertexAISearchConfig diff --git a/search/cloud-function/python/vertex_ai_search_client.py b/search/cloud-function/python/vertex_ai_search_client.py index a4a4e82cbe9..720cef64e69 100644 --- a/search/cloud-function/python/vertex_ai_search_client.py +++ b/search/cloud-function/python/vertex_ai_search_client.py @@ -35,7 +35,7 @@ import html import json import re -from typing import Any, Dict, List, Literal, Union +from typing import Any, Literal from google.api_core.client_options import ClientOptions from google.cloud import discoveryengine_v1alpha as discoveryengine @@ -61,9 +61,9 @@ class VertexAISearchConfig: project_id: str location: str data_store_id: str - engine_data_type: Union[EngineDataTypeStr, str] - engine_chunk_type: Union[EngineChunkTypeStr, str] - summary_type: Union[SummaryTypeStr, str] + engine_data_type: EngineDataTypeStr | str + engine_chunk_type: EngineChunkTypeStr | str + summary_type: SummaryTypeStr | str def __post_init__(self) -> None: """Validate and convert string inputs to appropriate types.""" @@ -85,7 +85,7 @@ def _validate_enum(value: str, enum_type: Any, default: str) -> str: print(f"Warning: Invalid value '{value}'. Using default: '{default}'") return default - def to_dict(self) -> Dict[str, str]: + def to_dict(self) -> dict[str, str]: """Convert the config to a dictionary.""" return { "project_id": self.project_id, @@ -144,7 +144,7 @@ def _get_serving_config(self) -> str: serving_config="default_config", ) - def search(self, query: str, page_size: int = 10) -> Dict[str, Any]: + def search(self, query: str, page_size: int = 10) -> dict[str, Any]: """ Perform a search query using Vertex AI Search. @@ -218,7 +218,7 @@ def build_search_request( ), ) - def map_search_pager_to_dict(self, pager: SearchPager) -> Dict[str, Any]: + def map_search_pager_to_dict(self, pager: SearchPager) -> dict[str, Any]: """ Maps a SearchPager to a dictionary structure, iterativly requesting results. @@ -230,7 +230,7 @@ def map_search_pager_to_dict(self, pager: SearchPager) -> Dict[str, Any]: Returns: Dict[str, Any]: A dictionary containing the search results and metadata. """ - output: Dict[str, Any] = { + output: dict[str, Any] = { "results": [ SearchResponse.SearchResult.to_dict(result) for result in pager ], @@ -267,7 +267,7 @@ def map_search_pager_to_dict(self, pager: SearchPager) -> Dict[str, Any]: return output - def simplify_search_results(self, response: Dict[str, Any]) -> Dict[str, Any]: + def simplify_search_results(self, response: dict[str, Any]) -> dict[str, Any]: """ Simplify the search results by parsing documents and chunks. @@ -290,7 +290,7 @@ def simplify_search_results(self, response: Dict[str, Any]) -> Dict[str, Any]: response["simplified_results"] = simplified_results return response - def _parse_document_result(self, document: Dict[str, Any]) -> Dict[str, Any]: + def _parse_document_result(self, document: dict[str, Any]) -> dict[str, Any]: """ Parse a single document result from the search response. 
@@ -317,7 +317,7 @@ def _parse_document_result(self, document: Dict[str, Any]) -> Dict[str, Any]: json_data = {} metadata.update(json_data) - result: Dict[str, Any] = {"metadata": metadata} + result: dict[str, Any] = {"metadata": metadata} if self.config.engine_data_type == "STRUCTURED": structured_data = ( @@ -337,7 +337,7 @@ def _parse_document_result(self, document: Dict[str, Any]) -> Dict[str, Any]: return result - def _parse_segments(self, segments: List[Dict[str, Any]]) -> str: + def _parse_segments(self, segments: list[dict[str, Any]]) -> str: """ Parse extractive segments from a single document of search results. @@ -361,7 +361,7 @@ def _parse_segments(self, segments: List[Dict[str, Any]]) -> str: for segment in parsed_segments ) - def _parse_snippets(self, snippets: List[Dict[str, Any]]) -> str: + def _parse_snippets(self, snippets: list[dict[str, Any]]) -> str: """ Parse snippets from a single document of search results. @@ -377,7 +377,7 @@ def _parse_snippets(self, snippets: List[Dict[str, Any]]) -> str: if snippet.get("snippetStatus") == "SUCCESS" ) - def _parse_chunk_result(self, chunk: Dict[str, Any]) -> Dict[str, Any]: + def _parse_chunk_result(self, chunk: dict[str, Any]) -> dict[str, Any]: """ Parse a single chunk result from the search response. diff --git a/search/web-app/ekg_utils.py b/search/web-app/ekg_utils.py index 1b823a52444..374500c8a3c 100644 --- a/search/web-app/ekg_utils.py +++ b/search/web-app/ekg_utils.py @@ -13,8 +13,8 @@ # limitations under the License. """Enterprise Knowledge Graph Utilities""" +from collections.abc import Sequence import json -from typing import List, Optional, Sequence, Tuple from google.cloud import enterpriseknowledgegraph as ekg @@ -26,10 +26,10 @@ def search_public_kg( project_id: str, location: str, search_query: str, - languages: Optional[Sequence[str]] = None, - types: Optional[Sequence[str]] = None, - limit: Optional[int] = None, -) -> Tuple: + languages: Sequence[str] | None = None, + types: Sequence[str] | None = None, + limit: int | None = None, +) -> tuple: """ Make API Request to Public Knowledge Graph. 
""" @@ -58,7 +58,7 @@ def search_public_kg( return entities, request_url, request_json, response_json -def get_entities(response: ekg.SearchPublicKgResponse) -> List: +def get_entities(response: ekg.SearchPublicKgResponse) -> list: """ Extract Entities from Knowledge Graph Response """ diff --git a/search/web-app/genappbuilder_utils.py b/search/web-app/genappbuilder_utils.py index 6ddec4837d0..d43b89ba192 100644 --- a/search/web-app/genappbuilder_utils.py +++ b/search/web-app/genappbuilder_utils.py @@ -14,7 +14,6 @@ """Vertex AI Search Utilities""" from os.path import basename -from typing import Dict, List, Optional, Tuple from google.cloud import discoveryengine_v1alpha as discoveryengine @@ -25,7 +24,7 @@ def list_documents( project_id: str, location: str, datastore_id: str, -) -> List[Dict[str, str]]: +) -> list[dict[str, str]]: client = discoveryengine.DocumentServiceClient() parent = client.branch_path( @@ -48,15 +47,15 @@ def list_documents( def search_enterprise_search( project_id: str, location: str, - data_store_id: Optional[str] = None, - engine_id: Optional[str] = None, + data_store_id: str | None = None, + engine_id: str | None = None, page_size: int = 50, - search_query: Optional[str] = None, - image_bytes: Optional[bytes] = None, - params: Optional[Dict] = None, - summary_model: Optional[str] = None, - summary_preamble: Optional[str] = None, -) -> Tuple[List[Dict[str, str | List]], str, str, str, str]: + search_query: str | None = None, + image_bytes: bytes | None = None, + params: dict | None = None, + summary_model: str | None = None, + summary_preamble: str | None = None, +) -> tuple[list[dict[str, str | list]], str, str, str, str]: if bool(search_query) == bool(image_bytes): raise ValueError("Cannot provide both search_query and image_bytes") @@ -157,14 +156,14 @@ def search_enterprise_search( def get_enterprise_search_results( response: discoveryengine.SearchResponse, -) -> List[Dict[str, str | List]]: +) -> list[dict[str, str | list]]: """ Extract Results from Enterprise Search Response """ ROBOT = "https://www.google.com/images/errors/robot.png" - def get_thumbnail_image(data: Dict) -> str: + def get_thumbnail_image(data: dict) -> str: cse_thumbnail = data.get("pagemap", {}).get("cse_thumbnail") image_link = data.get("image", {}).get("thumbnailLink") @@ -174,7 +173,7 @@ def get_thumbnail_image(data: Dict) -> str: return image_link return ROBOT - def get_formatted_link(data: Dict) -> str: + def get_formatted_link(data: dict) -> str: html_formatted_url = data.get("htmlFormattedUrl") image_context_link = data.get("image", {}).get("contextLink") link = data.get("link") @@ -220,9 +219,9 @@ def recommend_personalize( datastore_id: str, serving_config_id: str, document_id: str, - user_pseudo_id: Optional[str] = "xxxxxxxxxxx", - attribution_token: Optional[str] = None, -) -> Tuple: + user_pseudo_id: str | None = "xxxxxxxxxxx", + attribution_token: str | None = None, +) -> tuple: # Create a client client = discoveryengine.RecommendationServiceClient() @@ -271,7 +270,7 @@ def get_storage_link(uri: str) -> str: def get_personalize_results( response: discoveryengine.RecommendResponse, -) -> List[Dict]: +) -> list[dict]: """ Extract Results from Personalize Response """ From 74ef55b64c33a088e7c75eec75552d52d08dac35 Mon Sep 17 00:00:00 2001 From: nhootan <103317089+nhootan@users.noreply.github.com> Date: Wed, 18 Sep 2024 13:30:57 -0400 Subject: [PATCH 07/10] feat: Adding the initial version of Vertex Prompt Optimizer UI Notebook. 
(#1099) # Description Adding the first version of Vertex AI Prompt Optimizer UI Notebook. --------- Co-authored-by: hootan Co-authored-by: Owl Bot Co-authored-by: Holt Skinner <13262395+holtskinner@users.noreply.github.com> --- .github/CODEOWNERS | 1 + .github/actions/spelling/allow.txt | 3 + .../vertex_ai_prompt_optimizer_ui.ipynb | 953 ++++++++++++++++++ 3 files changed, 957 insertions(+) create mode 100644 gemini/prompts/prompt_optimizer/vertex_ai_prompt_optimizer_ui.ipynb diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 0b6764fb268..05e6c60cf79 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -28,6 +28,7 @@ /generative-ai/language/grounding @koverholt @holtskinner @GoogleCloudPlatform/generative-ai-devrel /generative-ai/language/orchestration/langchain @kweinmeister @RajeshThallam @GoogleCloudPlatform/generative-ai-devrel /generative-ai/language/prompts @polong-lin @GoogleCloudPlatform/generative-ai-devrel +/generative-ai/language/prompts/prompt_optimizer @nhootan @inardini @GoogleCloudPlatform/generative-ai-devrel /generative-ai/language/sample-apps @rominirani @GoogleCloudPlatform/generative-ai-devrel /generative-ai/language/translation @holtskinner @GoogleCloudPlatform/generative-ai-devrel /generative-ai/language/tuning @erwinh85 @GoogleCloudPlatform/generative-ai-devrel diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt index f9346345d34..05f313ccb60 100644 --- a/.github/actions/spelling/allow.txt +++ b/.github/actions/spelling/allow.txt @@ -229,6 +229,7 @@ Unimicron Upserting Urs Uszkoreit +VAPO VFT VMs VOS @@ -272,6 +273,7 @@ apredict aquery arXiv aretrieve +argmax arun astype autoflake @@ -329,6 +331,7 @@ docstore dpi draig drinkware +dropdown dropna dsl dtypes diff --git a/gemini/prompts/prompt_optimizer/vertex_ai_prompt_optimizer_ui.ipynb b/gemini/prompts/prompt_optimizer/vertex_ai_prompt_optimizer_ui.ipynb new file mode 100644 index 00000000000..aacc919c5bd --- /dev/null +++ b/gemini/prompts/prompt_optimizer/vertex_ai_prompt_optimizer_ui.ipynb @@ -0,0 +1,953 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hlI1rYKa2IGx" + }, + "outputs": [], + "source": [ + "# Copyright 2024 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pHyuJTFr2IGx" + }, + "source": [ + "# Overview\n", + "Welcome to Vertex AI Prompt Optimizer (VAPO)! This Notebook showcases VAPO, a tool that iteratively optimizes prompts to suit a target model (e.g., `gemini-1.5-pro`) using target-specific metric(s).\n", + "\n", + "Key Use Cases:\n", + "\n", + "* Prompt Optimization: Enhance the quality of an initial prompt by refining its structure and content to match the target model's optimal input characteristics.\n", + "\n", + "* Prompt Translation: Adapt prompts optimized for one model to work effectively with a different target model." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tTtKHedrO1Rx" + }, + "source": [ + "# Step 0: Install packages and libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8-Zw72vFORz_" + }, + "outputs": [], + "source": [ + "! pip3 install -U google-cloud-aiplatform -q\n", + "\n", + "import datetime\n", + "import os\n", + "import time\n", + "\n", + "from IPython.display import HTML, display\n", + "from google.auth import default\n", + "from google.cloud import aiplatform, storage\n", + "from google.colab import auth, output\n", + "import gspread\n", + "import ipywidgets as widgets\n", + "import jinja2\n", + "from jinja2 import BaseLoader, Environment\n", + "import jinja2.meta\n", + "import pandas as pd\n", + "import tensorflow.io.gfile as gfile\n", + "\n", + "output.enable_custom_widget_manager()\n", + "from io import StringIO\n", + "import json\n", + "import re\n", + "\n", + "\n", + "def authenticate():\n", + " auth.authenticate_user()\n", + " creds, _ = default()\n", + " return gspread.authorize(creds)\n", + "\n", + "\n", + "def is_target_required_metric(eval_metric: str) -> bool:\n", + " return eval_metric in [\n", + " \"bleu\",\n", + " \"exact_match\",\n", + " \"question_answering_correctness\",\n", + " \"rouge_1\",\n", + " \"rouge_2\",\n", + " \"rouge_l\",\n", + " \"rouge_l_sum\",\n", + " \"tool_call_valid\",\n", + " \"tool_name_match\",\n", + " \"tool_parameter_key_match\",\n", + " \"tool_parameter_kv_match\",\n", + " ]\n", + "\n", + "\n", + "def is_run_target_required(eval_metric_types: list[str], source_model: str) -> bool:\n", + " if source_model:\n", + " return False\n", + "\n", + " label_required = False\n", + " for metric in eval_metric_types:\n", + " label_required = label_required or is_target_required_metric(metric)\n", + " return label_required\n", + "\n", + "\n", + "_TARGET_KEY = \"target\"\n", + "\n", + "\n", + "def validate_prompt_and_data(\n", + " template: str,\n", + " dataset_path: str,\n", + " placeholder_to_content: str,\n", + " label_enforced: bool,\n", + ") -> None:\n", + " \"\"\"Validates the prompt template and the dataset.\"\"\"\n", + " placeholder_to_content = json.loads(placeholder_to_content)\n", + " with gfile.GFile(dataset_path, \"r\") as f:\n", + " data = [json.loads(line) for line in f.readlines()]\n", + "\n", + " env = jinja2.Environment()\n", + " try:\n", + " parsed_content = env.parse(template)\n", + " except jinja2.exceptions.TemplateSyntaxError as e:\n", + " raise ValueError(f\"Invalid template: {template}\") from e\n", + "\n", + " template_variables = jinja2.meta.find_undeclared_variables(parsed_content)\n", + " extra_keys = set()\n", + " for ex in data:\n", + " ex.update(placeholder_to_content)\n", + " missing_keys = [key for key in template_variables if key not in ex]\n", + " extra_keys.update([key for key in ex if key not in template_variables])\n", + " if label_enforced:\n", + " if _TARGET_KEY not in ex:\n", + " raise ValueError(\n", + " f\"The example {ex} doesn't have a key corresponding to the target\"\n", + " f\" var: {_TARGET_KEY}\"\n", + " )\n", + " if not ex[_TARGET_KEY]:\n", + " raise ValueError(f\"The following example has an empty target: {ex}\")\n", + " if missing_keys:\n", + " raise ValueError(\n", + " f\"The example {ex} doesn't have a key corresponding to following\"\n", + " f\" template vars: {missing_keys}\"\n", + " )\n", + " if extra_keys:\n", + " raise Warning(\n", + " \"Warning: extra keys in the examples not used in the context/task\"\n", + " f\" template 
{extra_keys}\"\n", + "        )\n", + "\n", + "\n", + "def run_custom_job(\n", + "    display_name: str,\n", + "    container_uri: str,\n", + "    container_args: dict[str, str],\n", + ") -> aiplatform.CustomJob:\n", + "    \"\"\"A sample to create custom jobs.\"\"\"\n", + "    worker_pool_specs = [\n", + "        {\n", + "            \"replica_count\": 1,\n", + "            \"container_spec\": {\n", + "                \"image_uri\": container_uri,\n", + "                \"args\": [f\"--{k}={v}\" for k, v in container_args.items()],\n", + "            },\n", + "            \"machine_spec\": {\n", + "                \"machine_type\": \"n1-standard-4\",\n", + "            },\n", + "        }\n", + "    ]\n", + "\n", + "    custom_job = aiplatform.CustomJob(\n", + "        display_name=display_name,\n", + "        worker_pool_specs=worker_pool_specs,\n", + "    )\n", + "    custom_job.submit()\n", + "    return custom_job\n", + "\n", + "\n", + "def run_apd(config: dict[str, str], bucket_uri: str, display_name: str) -> aiplatform.CustomJob:\n", + "    \"\"\"A function to run the Vertex prompt optimizer.\"\"\"\n", + "    print(f\"\\n\\nJob display name: {display_name}\")\n", + "    version = \"preview_v1_0\"\n", + "    container_uri = \"us-docker.pkg.dev/vertex-ai-restricted/builtin-algorithm/apd\"\n", + "    config_path = f\"{bucket_uri}/{display_name}/input_config.json\"\n", + "\n", + "    with gfile.GFile(config_path, \"w\") as f:\n", + "        json.dump(config, f)\n", + "\n", + "    aiplatform.init(\n", + "        project=config[\"project\"],\n", + "        location=config[\"target_model_location\"],\n", + "        staging_bucket=f\"{bucket_uri}/{display_name}\",\n", + "    )\n", + "\n", + "    return run_custom_job(\n", + "        display_name=display_name,\n", + "        container_uri=f\"{container_uri}:{version}\",\n", + "        container_args={\"config\": config_path},\n", + "    )\n", + "\n", + "\n", + "def update_best_display(\n", + "    df: pd.DataFrame,\n", + "    textarea: widgets.Textarea,\n", + "    best_score_label: widgets.Label,\n", + "    eval_metric: str,\n", + ") -> None:\n", + "    \"\"\"Update the best prompt display.\"\"\"\n", + "\n", + "    df[\"score\"] = df[f\"metrics.{eval_metric}/mean\"]\n", + "\n", + "    best_template = df.loc[df[\"score\"].argmax(), \"prompt\"]\n", + "    best_score = df.loc[df[\"score\"].argmax(), \"score\"]\n", + "    original_score = df.loc[0, \"score\"]\n", + "\n", + "    def placeholder_llm():\n", + "        return \"{{llm()}}\"\n", + "\n", + "    env = Environment(loader=BaseLoader())\n", + "    env.globals[\"llm\"] = placeholder_llm\n", + "\n", + "    best_template = best_template.replace(\"store('answer', llm())\", \"llm()\")\n", + "    textarea.value = best_template\n", + "    improvement = best_score - original_score\n", + "    no_improvement_str = \"\\nNo better template is found yet.\" if not improvement else \"\"\n", + "    best_score_label.value = (\n", + "        f\"Score: {best_score}\" f\" Improvement: {improvement: .3f} {no_improvement_str}\"\n", + "    )\n", + "\n", + "\n", + "def generate_dataframe(filename: str) -> pd.DataFrame:\n", + "    \"\"\"Generates a pandas dataframe from a json file.\"\"\"\n", + "    if not gfile.exists(filename):\n", + "        return pd.DataFrame()\n", + "\n", + "    with gfile.GFile(filename, \"r\") as f:\n", + "        try:\n", + "            data = json.load(f)\n", + "        except json.JSONDecodeError:\n", + "            return pd.DataFrame()\n", + "    return pd.json_normalize(data)\n", + "\n", + "\n", + "def left_aligned_df_html(df: pd.DataFrame) -> HTML:\n", + "    \"\"\"Displays a Pandas DataFrame in Colab with left-aligned values.\"\"\"\n", + "\n", + "    # Convert to HTML table, but keep the HTML in a variable\n", + "    html_table = df.to_html(index=False, classes=\"left-aligned\")\n", + "\n", + "    # Add CSS styling to left-align table data cells and override default styles\n", + "    styled_html = f\"\"\"\n", + "    <style> .left-aligned th, .left-aligned td {{ text-align: left; }} </style>
\n", + " {html_table}\n", + " \"\"\"\n", + "\n", + " # Display the styled HTML table\n", + " return HTML(styled_html)\n", + "\n", + "\n", + "def extract_top_level_function_name(source_code: str) -> str | None:\n", + " match = re.search(r\"^def\\s+([a-zA-Z_]\\w*)\\s*\\(\", source_code, re.MULTILINE)\n", + " if match:\n", + " return match.group(1)\n", + " return None\n", + "\n", + "\n", + "class ProgressForm:\n", + " \"\"\"A class to display the progress of the optimization job.\"\"\"\n", + "\n", + " def __init__(self):\n", + " self.instruction_progress_bar = None\n", + " self.instruction_display = None\n", + " self.instruction_best = None\n", + " self.instruction_score = None\n", + "\n", + " self.demo_progress_bar = None\n", + " self.demo_display = None\n", + " self.demo_best = None\n", + " self.demo_score = None\n", + "\n", + " self.job_state_display = None\n", + "\n", + " self.instruction_df = None\n", + " self.demo_df = None\n", + "\n", + " self.started = False\n", + "\n", + " def init(self, params: dict[str, str]):\n", + " \"\"\"Initialize the progress form.\"\"\"\n", + " self.job_state_display = display(\n", + " HTML(\"Job State: Not Started!\"), display_id=True\n", + " )\n", + " self.status_display = display(HTML(\"\"), display_id=True)\n", + "\n", + " if params[\"optimization_mode\"] in [\"instruction\", \"instruction_and_demo\"]:\n", + " (\n", + " self.instruction_progress_bar,\n", + " self.instruction_display,\n", + " self.instruction_best,\n", + " self.instruction_score,\n", + " ) = self.create_progress_ui(\"Instruction\", params[\"num_steps\"])\n", + "\n", + " if params[\"optimization_mode\"] in [\"demonstration\", \"instruction_and_demo\"]:\n", + " (\n", + " self.demo_progress_bar,\n", + " self.demo_display,\n", + " self.demo_best,\n", + " self.demo_score,\n", + " ) = self.create_progress_ui(\n", + " \"Demonstration\", params[\"num_demo_set_candidates\"]\n", + " )\n", + "\n", + " eval_metric = \"composite_metric\"\n", + " if len(params[\"eval_metrics_types\"]) == 1:\n", + " eval_metric = params[\"eval_metrics_types\"][0]\n", + "\n", + " if eval_metric != \"composite_metric\" and \"custom_metric_source_code\" in params:\n", + " self.eval_metric = extract_top_level_function_name(\n", + " params[\"custom_metric_source_code\"]\n", + " )\n", + " else:\n", + " self.eval_metric = eval_metric\n", + "\n", + " self.output_path = params[\"output_path\"]\n", + " self.started = True\n", + "\n", + " def update_progress(\n", + " self,\n", + " progress_bar: widgets.IntProgress,\n", + " templates_file: str,\n", + " df: pd.DataFrame | None,\n", + " df_display: display,\n", + " best_textarea: widgets.Textarea,\n", + " best_score: widgets.Label,\n", + " eval_metric: str,\n", + " ):\n", + " \"\"\"Update the progress of the optimization job.\"\"\"\n", + "\n", + " def get_last_step(df: pd.DataFrame):\n", + " if df.empty:\n", + " return -1\n", + " return int(df[\"step\"].max())\n", + "\n", + " if progress_bar is None or df is None:\n", + " return pd.DataFrame()\n", + "\n", + " new_df = generate_dataframe(templates_file)\n", + "\n", + " last_step = get_last_step(df)\n", + " new_last_step = get_last_step(new_df)\n", + " if new_last_step > last_step:\n", + " df_display.update(left_aligned_df_html(new_df))\n", + " update_best_display(new_df, best_textarea, best_score, eval_metric)\n", + " progress_bar.value = progress_bar.value + new_last_step - last_step\n", + "\n", + " return new_df\n", + "\n", + " def create_progress_ui(\n", + " self, opt_mode: str, num_opt_steps: int\n", + " ) -> 
tuple[widgets.IntProgress, display, widgets.Textarea, widgets.Label]:\n", + "        \"\"\"Create the progress UI for a specific optimization mode.\"\"\"\n", + "        print(f\"\\n\\n{opt_mode} Optimization\")\n", + "        progress_bar = widgets.IntProgress(\n", + "            value=0, min=0, max=num_opt_steps, step=1, description=\"Progress\"\n", + "        )\n", + "        display(progress_bar)\n", + "        print(\"\\nGenerated Templates:\")\n", + "        templates_display = display(\"No template is evaluated yet!\", display_id=True)\n", + "\n", + "        print(\"\\nBest Template so far:\")\n", + "        best_textarea = widgets.Textarea(\n", + "            value=\"NA\",\n", + "            disabled=False,\n", + "            layout=widgets.Layout(width=\"80%\", height=\"150px\"),\n", + "        )\n", + "        display(best_textarea)\n", + "\n", + "        best_score = widgets.Label(value=\"Score: NA Improvement: NA\")\n", + "        display(best_score)\n", + "\n", + "        return progress_bar, templates_display, best_textarea, best_score\n", + "\n", + "    def monitor_progress(self, job: aiplatform.CustomJob, params: dict[str, str]):\n", + "        \"\"\"Monitor the progress of the optimization job.\"\"\"\n", + "        if not self.started:\n", + "            self.init(params)\n", + "\n", + "        self.job_state_display.update(HTML(f\"Job State: {job.state.name}\"))\n", + "\n", + "        # Initial display of the dataframe\n", + "        instruction_templates_file = f\"{self.output_path}/instruction/templates.json\"\n", + "        demo_templates_file = f\"{self.output_path}/demonstration/templates.json\"\n", + "\n", + "        if not job.done():\n", + "            self.instruction_df = self.update_progress(\n", + "                self.instruction_progress_bar,\n", + "                instruction_templates_file,\n", + "                self.instruction_df,\n", + "                self.instruction_display,\n", + "                self.instruction_best,\n", + "                self.instruction_score,\n", + "                self.eval_metric,\n", + "            )\n", + "            self.demo_df = self.update_progress(\n", + "                self.demo_progress_bar,\n", + "                demo_templates_file,\n", + "                self.demo_df,\n", + "                self.demo_display,\n", + "                self.demo_best,\n", + "                self.demo_score,\n", + "                self.eval_metric,\n", + "            )\n", + "            return True\n", + "\n", + "        if job.state.name != \"JOB_STATE_SUCCEEDED\":\n", + "            errors = [f\"Error: Job failed with error {job.error}.\"]\n", + "            for err_file in [\n", + "                f\"{self.output_path}/instruction/error.json\",\n", + "                f\"{self.output_path}/demonstration/error.json\",\n", + "            ]:\n", + "                if gfile.exists(err_file):\n", + "                    with gfile.GFile(err_file, \"r\") as f:\n", + "                        error_json = json.load(f)\n", + "                        errors.append(f\"Detailed error: {error_json}\")\n", + "                    errors.append(\n", + "                        f\"Please feel free to send {err_file} to the VAPO team to help\"\n", + "                        \" resolve the issue.\"\n", + "                    )\n", + "\n", + "            errors.append(\n", + "                \"All the templates found before failure can be found under\"\n", + "                f\" {self.output_path}\"\n", + "            )\n", + "            errors.append(\n", + "                \"Please consider rerunning to check whether the failure is transient.\"\n", + "            )\n", + "            err = \"\\n\".join(errors)\n", + "            self.status_display.update(HTML(f'{err}'))\n", + "        else:\n", + "            self.status_display.update(\n", + "                HTML(\n", + "                    'Job succeeded! All the'\n", + "                    f\" artifacts can be found under {self.output_path}\"\n", + "                )\n", + "            )\n", + "        return False\n", + "\n", + "\n", + "def display_dataframe(df: pd.DataFrame) -> None:\n", + "    \"\"\"Display a pandas dataframe in Colab.\"\"\"\n", + "\n", + "    # Function to wrap text in a scrollable div\n", + "    def wrap_in_scrollable_div(text):\n", + "        return f'
<div style=\"max-height: 200px; overflow: auto;\">{text}</div>
'\n", + "\n", + " # Apply the function to every cell using the format method\n", + " styled_html = df.style.format(wrap_in_scrollable_div).to_html(index=False)\n", + "\n", + " # Display the HTML in the notebook\n", + " display(HTML(styled_html))\n", + "\n", + "\n", + "def split_gcs_path(gcs_path: str) -> tuple[str, str]:\n", + " \"\"\"Splits a full GCS path into bucket name and prefix.\"\"\"\n", + " if gcs_path.startswith(\"gs://\"):\n", + " path_without_scheme = gcs_path[5:] # Remove the 'gs://' part\n", + " parts = path_without_scheme.split(\"/\", 1)\n", + " bucket_name = parts[0]\n", + " prefix = parts[1] if len(parts) > 1 else \"\"\n", + " return bucket_name, prefix\n", + " else:\n", + " raise ValueError(\"Invalid GCS path. Must start with 'gs://'\")\n", + "\n", + "\n", + "def list_gcs_objects(full_gcs_path: str) -> list[str]:\n", + " \"\"\"Lists all the objects in the given GCS path.\"\"\"\n", + " bucket_name, prefix = split_gcs_path(full_gcs_path)\n", + " storage_client = storage.Client()\n", + " bucket = storage_client.bucket(bucket_name)\n", + " blobs = bucket.list_blobs(\n", + " prefix=prefix\n", + " ) # List all objects that start with the prefix\n", + "\n", + " return [blob.name for blob in blobs]\n", + "\n", + "\n", + "def find_directories_with_files(\n", + " full_gcs_path: str, required_files: list[str]\n", + ") -> list[str]:\n", + " \"\"\"Finds directories containing specific files under the given full GCS path.\"\"\"\n", + " bucket_name, prefix = split_gcs_path(full_gcs_path)\n", + " all_paths = list_gcs_objects(f\"gs://{bucket_name}/{prefix}\")\n", + " directories = set()\n", + "\n", + " # Create a dictionary to track files found in each directory\n", + " file_presence = {}\n", + " for path in all_paths:\n", + " directory = \"/\".join(path.split(\"/\")[:-1]) # Get the directory part of the path\n", + " filename = path.split(\"/\")[-1] # Get the filename part of the path\n", + " if directory:\n", + " if directory not in file_presence:\n", + " file_presence[directory] = set()\n", + " file_presence[directory].add(filename)\n", + "\n", + " # Check which directories have all required files\n", + " for directory, files in file_presence.items():\n", + " if all(file in files for file in required_files):\n", + " directories.add(f\"gs://{bucket_name}/{directory}\")\n", + "\n", + " return list(directories)\n", + "\n", + "\n", + "def extract_metric_name(metric_string: str):\n", + " # Use a regular expression to find the metric name\n", + " match = re.search(r\"\\.(\\w+)/\", metric_string)\n", + " # Return the matched group if found\n", + " return match.group(1) if match else metric_string\n", + "\n", + "\n", + "def read_file_from_gcs(filename: str):\n", + " with gfile.GFile(filename, \"r\") as f:\n", + " return f.read()\n", + "\n", + "\n", + "def process_results(df: pd.DataFrame) -> pd.DataFrame:\n", + " \"\"\"Process the results removing columns that could be confusing.\"\"\"\n", + " columns_to_drop = []\n", + " # Dropping columns that could be confusing.\n", + " for col in df.columns:\n", + " if \"confidence\" in col:\n", + " columns_to_drop.append(col)\n", + " if \"raw_eval_resp\" in col:\n", + " columns_to_drop.append(col)\n", + " if col == \"instruction\":\n", + " columns_to_drop.append(col)\n", + " if col == \"context\":\n", + " columns_to_drop.append(col)\n", + " return df.drop(columns=columns_to_drop)\n", + "\n", + "\n", + "class ResultsUI:\n", + " \"\"\"A UI to display the results of a VAPO run.\"\"\"\n", + "\n", + " def __init__(self, path: str):\n", + " required_files = 
[\"eval_results.json\", \"templates.json\"]\n", + " runs = find_directories_with_files(path, required_files)\n", + "\n", + " self.run_label = widgets.Label(\"Select Run:\")\n", + " self.run_dropdrown = widgets.Dropdown(\n", + " options=runs, value=runs[0], layout=widgets.Layout(width=\"200px\")\n", + " )\n", + " self.run_dropdrown.observe(self.display_run_handler, names=\"value\")\n", + "\n", + " # Create a label widget for the description\n", + " self.dropdown_description = widgets.Label(\"Select Template:\")\n", + " self.template_dropdown = widgets.Dropdown(\n", + " options=[],\n", + " value=None,\n", + " layout=widgets.Layout(width=\"400px\"),\n", + " disabled=True,\n", + " )\n", + " self.template_dropdown.observe(self.display_template_handler, names=\"value\")\n", + " self.results_output = widgets.Output(\n", + " layout=widgets.Layout(\n", + " height=\"600px\", overflow=\"auto\", margin=\"20px 0px 0px 0px\"\n", + " )\n", + " )\n", + " self.display_run(runs[0])\n", + "\n", + " def display_template_handler(self, change: dict[str, str]) -> None:\n", + " \"\"\"Display the template and the corresponding evaluation results.\"\"\"\n", + " if change[\"new\"] is None:\n", + " return\n", + " df_index = int(change[\"new\"].split(\" \")[1])\n", + " self.display_eval_results(df_index)\n", + "\n", + " def display_run_handler(self, change) -> None:\n", + " if change[\"new\"] is None:\n", + " return\n", + "\n", + " path = change[\"new\"]\n", + " self.display_run(path)\n", + "\n", + " def display_run(self, path: str) -> None:\n", + " \"\"\"Display the results of a VAPO run.\"\"\"\n", + " self.run_dropdrown.disabled = True\n", + " filename = f\"{path}/eval_results.json\"\n", + " eval_results = json.loads(read_file_from_gcs(filename))\n", + "\n", + " filename = f\"{path}/templates.json\"\n", + " templates = json.loads(read_file_from_gcs(filename))\n", + "\n", + " if len(templates) == len(eval_results):\n", + " offset = 0\n", + " elif len(templates) == len(eval_results) + 1:\n", + " # In some setups it is possible to have 1 more template than results.\n", + " offset = 1\n", + " else:\n", + " raise ValueError(\n", + " \"Number of templates doesn't match number of eval results\"\n", + " f\" {len(templates)} vs {len(eval_results)}\"\n", + " )\n", + " self.templates = [\n", + " pd.json_normalize(template) for template in templates[offset:]\n", + " ]\n", + " metric_columns = [col for col in self.templates[0].columns if \"metric\" in col]\n", + "\n", + " self.eval_results = [\n", + " process_results(pd.read_json(StringIO(result[\"metrics_table\"])))\n", + " for result in eval_results\n", + " ]\n", + " options = []\n", + " for i, template in enumerate(self.templates):\n", + " metrics = []\n", + " for col in metric_columns:\n", + " value = template[col].tolist()[0]\n", + " short_col = extract_metric_name(col)\n", + " metrics.append(f\"{short_col}: {value}\")\n", + " metrics_str = \" \".join(metrics)\n", + " options.append(f\"Template {i} {metrics_str}\")\n", + "\n", + " self.template_dropdown.disabled = False\n", + " self.template_dropdown.options = options\n", + " self.run_dropdrown.disabled = False\n", + "\n", + " def display_eval_results(self, index: int) -> None:\n", + " \"\"\"Display the evaluation results for a specific template.\"\"\"\n", + " with self.results_output:\n", + " self.results_output.clear_output(wait=True) # Clear previous output\n", + " display_dataframe(self.templates[index])\n", + " print()\n", + " display_dataframe(self.eval_results[index])\n", + "\n", + " def get_container(self) -> 
widgets.VBox:\n", + "        \"\"\"Get the container widget for the results UI.\"\"\"\n", + "        return widgets.VBox(\n", + "            [\n", + "                self.run_label,\n", + "                self.run_dropdrown,\n", + "                self.dropdown_description,\n", + "                self.template_dropdown,\n", + "                self.results_output,\n", + "            ]\n", + "        )" + ] }, { "cell_type": "markdown", "metadata": { "id": "-p59jd5rOp4q" }, "source": [ "# Step 1: Configure your prompt template\n", + "Prompts consist of two key parts:\n", + "* System Instruction (SI) Template: A fixed instruction shared across all queries for a given task.\n", + "* Task/Context Template: A dynamic part that changes based on the task.\n", + "\n", + "APD enables the translation and optimization of the System Instruction Template, while the Task/Context Template remains essential for evaluating different SI templates." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "rJG1pVZO317x" }, "outputs": [], "source": [ "SYSTEM_INSTRUCTION = \"Answer the following question. Let's think step by step.\\n\" # @param {type:\"string\"}\n", + "PROMPT_TEMPLATE = (\n", + "    \"Question: {{question}}\\n\\nAnswer:{{target}}\" # @param {type:\"string\"}\n", + ")" ] }, { "cell_type": "markdown", "metadata": { "id": "5y-cmg0TQP6v" }, "source": [ "# Step 2: Input your data\n", + "To optimize the model, provide a CSV or JSONL file containing labeled validation samples:\n", + "* Focus on examples that specifically demonstrate the issues you want to address.\n", + "* Recommendation: Use 50-100 distinct samples for reliable results. However, the tool can still be effective with as few as 5 samples.\n", + "\n", + "For prompt translation:\n", + "* Consider using the source model to label examples that the target model struggles with, helping to identify areas for improvement.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mfgi_oR6tTIB" }, "outputs": [], "source": [ "# @markdown **Project setup**:
\n", + "PROJECT_ID = \"[YOUR_PROJECT]\" # @param {type:\"string\"}\n", + "LOCATION = \"us-central1\" # @param {type:\"string\"}\n", + "OUTPUT_PATH = \"[OUTPUT_PATH]\" # @param {type:\"string\"}\n", + "# @markdown * GCS path of your bucket, e.g., gs://prompt_translation_demo, used to store all artifacts.\n", + "INPUT_DATA_PATH = \"[INPUT_DATA_PATH]\" # @param {type:\"string\"}\n", + "# @markdown * Specify a GCS path for the input data, e.g., gs://prompt_translation_demo/input_data.jsonl." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucebZHkHRxKH" + }, + "source": [ + "# Step 3: Configure optimization settings\n", + "The optimization configs are defaulted to the values that are most commonly used and which we recommend using initially." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "B2R3P8mMvK9q" + }, + "outputs": [], + "source": [ + "TARGET_MODEL = \"gemini-1.5-flash-001\" # @param [\"gemini-1.0-pro-001\", \"gemini-1.0-pro-002\", \"gemini-1.5-flash-001\", \"gemini-1.5-pro-001\", \"gemini-1.0-ultra-001\"]\n", + "SOURCE_MODEL = \"\" # @param [\"\", \"gemini-1.0-pro-001\", \"gemini-1.0-pro-002\", \"gemini-1.5-flash-001\", \"gemini-1.5-pro-001\", \"gemini-1.0-ultra-001\", \"text-bison@001\", \"text-bison@002\", \"text-bison32k@002\", \"text-unicorn@001\"]\n", + "# @markdown * If set, it will be used to generate ground truth responses for the input examples. This is useful to migrate the prompt from a source model.\n", + "OPTIMIZATION_MODE = \"instruction_and_demo\" # @param [\"instruction\", \"demonstration\", \"instruction_and_demo\"]\n", + "OPTIMIZATION_METRIC = \"question_answering_correctness\" # @param [\"bleu\", \"coherence\", \"exact_match\", \"fluency\", \"groundedness\", \"text_quality\", \"verbosity\", \"rouge_1\", \"rouge_2\", \"rouge_l\", \"rouge_l_sum\", \"safety\", \"question_answering_correctness\", \"question_answering_quality\", \"summarization_quality\", \"tool_name_match\", \"tool_parameter_key_match\", \"tool_parameter_kv_match\", \"tool_call_valid\"] {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kO7fO0qTSNLs" + }, + "source": [ + "# Step 4: Configure advanced optimization settings [Optional]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fRHHTpaV4Xyo" + }, + "outputs": [], + "source": [ + "# @markdown **Instruction Optimization Configs**:
\n", + "NUM_INST_OPTIMIZATION_STEPS = 10 # @param {type:\"integer\"}\n", + "NUM_TEMPLATES_PER_STEP = 2 # @param {type:\"integer\"}\n", + "# @markdown * Number of prompt templates generated and evaluated at each optimization step.\n", + "\n", + "# @markdown **Demonstration Optimization Configs**:
\n", + "NUM_DEMO_OPTIMIZATION_STEPS = 10 # @param {type:\"integer\"}\n", + "NUM_DEMO_PER_PROMPT = 3 # @param {type:\"integer\"}\n", + "# @markdown * Number of the demonstrations to include in each prompt.\n", + "\n", + "# @markdown **Model Configs**:
\n", + "TARGET_MODEL_QPS = 3 # @param {type:\"integer\"}\n", + "SOURCE_MODEL_QPS = 3 # @param {type:\"integer\"}\n", + "OPTIMIZER_MODEL = \"gemini-1.5-flash-001\" # @param [\"gemini-1.0-pro-001\", \"gemini-1.0-pro-002\", \"gemini-1.5-flash-001\", \"gemini-1.5-pro-001\", \"gemini-1.0-ultra-001\", \"text-bison@001\", \"text-bison@002\", \"text-bison32k@002\", \"text-unicorn@001\"]\n", + "# @markdown * The model used to generated alternative prompts in the instruction optimization mode.\n", + "OPTIMIZER_MODEL_QPS = 3 # @param {type:\"integer\"}\n", + "EVAL_MODEL_QPS = 3 # @param {type:\"integer\"}\n", + "# @markdown * The QPS for calling the eval model, which is currently gemini-1.5-pro-001.\n", + "\n", + "# @markdown **Multi-metric Configs**:
\n", + "# @markdown Use this section only if you need more than one metric for optimization. This will override the metric you picked above.\n", + "OPTIMIZATION_METRIC_1 = \"NA\" # @param [\"NA\", \"bleu\", \"coherence\", \"exact_match\", \"fluency\", \"groundedness\", \"text_quality\", \"verbosity\", \"rouge_1\", \"rouge_2\", \"rouge_l\", \"rouge_l_sum\", \"safety\", \"question_answering_correctness\", \"question_answering_quality\", \"summarization_quality\", \"tool_name_match\", \"tool_parameter_key_match\", \"tool_parameter_kv_match\", \"tool_call_valid\"] {type:\"string\"}\n", + "OPTIMIZATION_METRIC_1_WEIGHT = 0.0 # @param {type:\"number\"}\n", + "OPTIMIZATION_METRIC_2 = \"NA\" # @param [\"NA\", \"bleu\", \"coherence\", \"exact_match\", \"fluency\", \"groundedness\", \"text_quality\", \"verbosity\", \"rouge_1\", \"rouge_2\", \"rouge_l\", \"rouge_l_sum\", \"safety\", \"question_answering_correctness\", \"question_answering_quality\", \"summarization_quality\", \"tool_name_match\", \"tool_parameter_key_match\", \"tool_parameter_kv_match\", \"tool_call_valid\"] {type:\"string\"}\n", + "OPTIMIZATION_METRIC_2_WEIGHT = 0.0 # @param {type:\"number\"}\n", + "OPTIMIZATION_METRIC_3 = \"NA\" # @param [\"NA\", \"bleu\", \"coherence\", \"exact_match\", \"fluency\", \"groundedness\", \"text_quality\", \"verbosity\", \"rouge_1\", \"rouge_2\", \"rouge_l\", \"rouge_l_sum\", \"safety\", \"question_answering_correctness\", \"question_answering_quality\", \"summarization_quality\", \"tool_name_match\", \"tool_parameter_key_match\", \"tool_parameter_kv_match\", \"tool_call_valid\"] {type:\"string\"}\n", + "OPTIMIZATION_METRIC_3_WEIGHT = 0.0 # @param {type:\"number\"}\n", + "METRIC_AGGREGATION_TYPE = \"weighted_sum\" # @param [\"weighted_sum\", \"weighted_average\"]\n", + "\n", + "# @markdown **Misc Configs**:
\n", + "PLACEHOLDER_TO_VALUE = \"{}\" # @param\n", + "# @markdown * This variable is used for long prompt optimization to not optimize parts of prompt identified by placeholders. It provides a mapping from the placeholder variables to their content. See link for details.\n", + "RESPONSE_MIME_TYPE = \"application/json\" # @param [\"text/plain\", \"application/json\"]\n", + "# @markdown * This variable determines the format of the output for the target model. See link for details.\n", + "TARGET_LANGUAGE = \"English\" # @param [\"English\", \"French\", \"German\", \"Hebrew\", \"Hindi\", \"Japanese\", \"Korean\", \"Portuguese\", \"Simplified Chinese\", \"Spanish\", \"Traditional Chinese\"]\n", + "# @markdown * The language of the system instruction." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X7Mgb0EHSSFk" + }, + "source": [ + "# Step 5: Run Prompt Optimizer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Z8NvNLTfxPTf" + }, + "outputs": [], + "source": [ + "timestamp = datetime.datetime.now().strftime(\"%Y-%m-%dT%H:%M:%S\")\n", + "display_name = f\"pt_{timestamp}\"\n", + "\n", + "in_colab_enterprise = \"GOOGLE_CLOUD_PROJECT\" in os.environ\n", + "if not in_colab_enterprise:\n", + " gc = authenticate()\n", + "\n", + "label_enforced = is_run_target_required(\n", + " [\n", + " OPTIMIZATION_METRIC,\n", + " OPTIMIZATION_METRIC_1,\n", + " OPTIMIZATION_METRIC_2,\n", + " OPTIMIZATION_METRIC_3,\n", + " ],\n", + " SOURCE_MODEL,\n", + ")\n", + "input_data_path = f\"{INPUT_DATA_PATH}\"\n", + "validate_prompt_and_data(\n", + " \"\\n\".join([SYSTEM_INSTRUCTION, PROMPT_TEMPLATE]),\n", + " input_data_path,\n", + " PLACEHOLDER_TO_VALUE,\n", + " label_enforced,\n", + ")\n", + "\n", + "output_path = f\"{OUTPUT_PATH}/{display_name}\"\n", + "\n", + "params = {\n", + " \"project\": PROJECT_ID,\n", + " \"num_steps\": NUM_INST_OPTIMIZATION_STEPS,\n", + " \"prompt_template\": SYSTEM_INSTRUCTION,\n", + " \"demo_and_query_template\": PROMPT_TEMPLATE,\n", + " \"target_model\": TARGET_MODEL,\n", + " \"target_model_qps\": TARGET_MODEL_QPS,\n", + " \"target_model_location\": LOCATION,\n", + " \"source_model\": SOURCE_MODEL,\n", + " \"source_model_qps\": SOURCE_MODEL_QPS,\n", + " \"source_model_location\": LOCATION,\n", + " \"eval_model_qps\": EVAL_MODEL_QPS,\n", + " \"eval_model_location\": LOCATION,\n", + " \"optimization_mode\": OPTIMIZATION_MODE,\n", + " \"num_demo_set_candidates\": NUM_DEMO_OPTIMIZATION_STEPS,\n", + " \"demo_set_size\": NUM_DEMO_PER_PROMPT,\n", + " \"aggregation_type\": METRIC_AGGREGATION_TYPE,\n", + " \"data_limit\": 50,\n", + " \"optimizer_model\": OPTIMIZER_MODEL,\n", + " \"optimizer_model_qps\": OPTIMIZER_MODEL_QPS,\n", + " \"optimizer_model_location\": LOCATION,\n", + " \"num_template_eval_per_step\": NUM_TEMPLATES_PER_STEP,\n", + " \"input_data_path\": input_data_path,\n", + " \"output_path\": output_path,\n", + " \"response_mime_type\": RESPONSE_MIME_TYPE,\n", + " \"language\": TARGET_LANGUAGE,\n", + " \"placeholder_to_content\": json.loads(PLACEHOLDER_TO_VALUE),\n", + "}\n", + "\n", + "if OPTIMIZATION_METRIC_1 == \"NA\":\n", + " params[\"eval_metrics_types\"] = [OPTIMIZATION_METRIC]\n", + " params[\"eval_metrics_weights\"] = [1.0]\n", + "else:\n", + " metrics = []\n", + " weights = []\n", + " for metric in [OPTIMIZATION_METRIC_1, OPTIMIZATION_METRIC_2, OPTIMIZATION_METRIC_3]:\n", + " if metric == \"NA\":\n", + " break\n", + " metrics.append(metric)\n", + " weights.append(OPTIMIZATION_METRIC_1_WEIGHT)\n", + " 
params[\"eval_metrics_types\"] = metrics\n", + " params[\"eval_metrics_weights\"] = weights\n", + "\n", + "job = run_apd(params, OUTPUT_PATH, display_name)\n", + "print(f\"Job ID: {job.name}\")\n", + "\n", + "progress_form = ProgressForm()\n", + "while progress_form.monitor_progress(job, params):\n", + " time.sleep(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lo5mcTzwSgBP" + }, + "source": [ + "# Step 6: Inspect the Results\n", + "You can use the following cell to inspect all the predictions made by all the\n", + "generated templates during one or multiple VAPO runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1x6HSty759jY" + }, + "outputs": [], + "source": [ + "RESULT_PATH = \"[GCS_PATH]\" # @param {type:\"string\"}\n", + "# @markdown * Specify a GCS path that contains artifacts of a single or multiple VAPO runs.\n", + "\n", + "results_ui = ResultsUI(RESULT_PATH)\n", + "\n", + "results_df_html = \"\"\"\n", + "\n", + "\"\"\"\n", + "\n", + "display(HTML(results_df_html))\n", + "display(results_ui.get_container())" + ] + } + ], + "metadata": { + "colab": { + "name": "vertex_ai_prompt_optimizer_ui.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} From f0eef416916afe2f1d478c187bfafff580494727 Mon Sep 17 00:00:00 2001 From: nhootan <103317089+nhootan@users.noreply.github.com> Date: Wed, 18 Sep 2024 14:31:44 -0400 Subject: [PATCH 08/10] fix: Adding the links table at the top of the VAPO notebook. (#1134) # Description Adding the links table to the top of the VAPO notebook. --------- Co-authored-by: hootan Co-authored-by: Owl Bot Co-authored-by: Holt Skinner --- .../vertex_ai_prompt_optimizer_ui.ipynb | 30 +++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/gemini/prompts/prompt_optimizer/vertex_ai_prompt_optimizer_ui.ipynb b/gemini/prompts/prompt_optimizer/vertex_ai_prompt_optimizer_ui.ipynb index aacc919c5bd..ee1005f290c 100644 --- a/gemini/prompts/prompt_optimizer/vertex_ai_prompt_optimizer_ui.ipynb +++ b/gemini/prompts/prompt_optimizer/vertex_ai_prompt_optimizer_ui.ipynb @@ -23,6 +23,36 @@ "# limitations under the License." ] }, + { + "cell_type": "markdown", + "metadata": { + "id": "RN8N3O43QDT5" + }, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Google
Open in Colab\n", + "
\n", + "
\n", + " \n", + " \"Google
Open in Colab Enterprise\n", + "
\n", + "
\n", + " \n", + " \"Vertex
Open in Vertex AI Workbench\n", + "
\n", + "
\n", + " \n", + " \"GitHub
View on GitHub\n", + "
\n", + "
" + ] + }, { "cell_type": "markdown", "metadata": { From 9de33214c15bc7b6a4d084786bd57af90b12f0ad Mon Sep 17 00:00:00 2001 From: Kristopher Overholt Date: Thu, 19 Sep 2024 11:02:48 -0500 Subject: [PATCH 09/10] feat: Improve error catching in the SQL Talk app (Gemini Function Calling) (#1136) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # Description This PR adds improved error catching to the SQL Talk app. Currently, if an error is encountered due to a malformed generated SQL query or an error executing the tools / functions, then a full stack trace will appear in the app. With this PR, errors are caught at the SQL execution level and top-level application and rendered in the app (and persisted in the message history) without a full stack trace. Special thanks to @mona19 for suggesting this change and providing a sample implementation! 🙏 How friendlier errors will appear now instead of a full stack trace: --- ![Screenshot 2024-09-18 at 6 21 37 PM](https://github.com/user-attachments/assets/44c45e96-095e-4373-ab23-a6fde871b6e0) --- ![Screenshot 2024-09-18 at 6 22 11 PM](https://github.com/user-attachments/assets/b47cba62-6ee2-4ddf-8832-9f4e4643d0df) --------- Co-authored-by: Owl Bot --- gemini/function-calling/sql-talk-app/app.py | 260 +++++++++++--------- 1 file changed, 148 insertions(+), 112 deletions(-) diff --git a/gemini/function-calling/sql-talk-app/app.py b/gemini/function-calling/sql-talk-app/app.py index c7ecda501b0..4661a27b0ea 100644 --- a/gemini/function-calling/sql-talk-app/app.py +++ b/gemini/function-calling/sql-talk-app/app.py @@ -140,131 +140,167 @@ from BigQuery, do not make up information. """ - response = chat.send_message(prompt) - response = response.candidates[0].content.parts[0] - - print(response) + try: + response = chat.send_message(prompt) + response = response.candidates[0].content.parts[0] - api_requests_and_responses = [] - backend_details = "" + print(response) - function_calling_in_process = True - while function_calling_in_process: - try: - params = {} - for key, value in response.function_call.args.items(): - params[key] = value + api_requests_and_responses = [] + backend_details = "" - print(response.function_call.name) - print(params) + function_calling_in_process = True + while function_calling_in_process: + try: + params = {} + for key, value in response.function_call.args.items(): + params[key] = value - if response.function_call.name == "list_datasets": - api_response = client.list_datasets() - api_response = BIGQUERY_DATASET_ID - api_requests_and_responses.append( - [response.function_call.name, params, api_response] - ) - - if response.function_call.name == "list_tables": - api_response = client.list_tables(params["dataset_id"]) - api_response = str([table.table_id for table in api_response]) - api_requests_and_responses.append( - [response.function_call.name, params, api_response] - ) + print(response.function_call.name) + print(params) - if response.function_call.name == "get_table": - api_response = client.get_table(params["table_id"]) - api_response = api_response.to_api_repr() - api_requests_and_responses.append( - [ - response.function_call.name, - params, - [ - str(api_response.get("description", "")), - str( - [ - column["name"] - for column in api_response["schema"]["fields"] - ] - ), - ], - ] - ) - api_response = str(api_response) - - if response.function_call.name == "sql_query": - job_config = bigquery.QueryJobConfig( - maximum_bytes_billed=100000000 - ) # Data limit per query job - try: - 
cleaned_query = ( - params["query"] - .replace("\\n", " ") - .replace("\n", "") - .replace("\\", "") - ) - query_job = client.query(cleaned_query, job_config=job_config) - api_response = query_job.result() - api_response = str([dict(row) for row in api_response]) - api_response = api_response.replace("\\", "").replace("\n", "") + if response.function_call.name == "list_datasets": + api_response = client.list_datasets() + api_response = BIGQUERY_DATASET_ID api_requests_and_responses.append( [response.function_call.name, params, api_response] ) - except Exception as e: - api_response = f"{str(e)}" + + if response.function_call.name == "list_tables": + api_response = client.list_tables(params["dataset_id"]) + api_response = str([table.table_id for table in api_response]) api_requests_and_responses.append( [response.function_call.name, params, api_response] ) - print(api_response) - - response = chat.send_message( - Part.from_function_response( - name=response.function_call.name, - response={ - "content": api_response, - }, - ), - ) - response = response.candidates[0].content.parts[0] - - backend_details += "- Function call:\n" - backend_details += ( - " - Function name: ```" - + str(api_requests_and_responses[-1][0]) - + "```" - ) - backend_details += "\n\n" - backend_details += ( - " - Function parameters: ```" - + str(api_requests_and_responses[-1][1]) - + "```" - ) - backend_details += "\n\n" - backend_details += ( - " - API response: ```" - + str(api_requests_and_responses[-1][2]) - + "```" - ) - backend_details += "\n\n" - with message_placeholder.container(): - st.markdown(backend_details) + if response.function_call.name == "get_table": + api_response = client.get_table(params["table_id"]) + api_response = api_response.to_api_repr() + api_requests_and_responses.append( + [ + response.function_call.name, + params, + [ + str(api_response.get("description", "")), + str( + [ + column["name"] + for column in api_response["schema"][ + "fields" + ] + ] + ), + ], + ] + ) + api_response = str(api_response) + + if response.function_call.name == "sql_query": + job_config = bigquery.QueryJobConfig( + maximum_bytes_billed=100000000 + ) # Data limit per query job + try: + cleaned_query = ( + params["query"] + .replace("\\n", " ") + .replace("\n", "") + .replace("\\", "") + ) + query_job = client.query( + cleaned_query, job_config=job_config + ) + api_response = query_job.result() + api_response = str([dict(row) for row in api_response]) + api_response = api_response.replace("\\", "").replace( + "\n", "" + ) + api_requests_and_responses.append( + [response.function_call.name, params, api_response] + ) + except Exception as e: + error_message = f""" + We're having trouble running this SQL query. This + could be due to an invalid query or the structure of + the data. Try rephrasing your question to help the + model generate a valid query. 
Details: + + {str(e)}""" + st.error(error_message) + api_response = error_message + api_requests_and_responses.append( + [response.function_call.name, params, api_response] + ) + st.session_state.messages.append( + { + "role": "assistant", + "content": error_message, + } + ) + + print(api_response) + + response = chat.send_message( + Part.from_function_response( + name=response.function_call.name, + response={ + "content": api_response, + }, + ), + ) + response = response.candidates[0].content.parts[0] + + backend_details += "- Function call:\n" + backend_details += ( + " - Function name: ```" + + str(api_requests_and_responses[-1][0]) + + "```" + ) + backend_details += "\n\n" + backend_details += ( + " - Function parameters: ```" + + str(api_requests_and_responses[-1][1]) + + "```" + ) + backend_details += "\n\n" + backend_details += ( + " - API response: ```" + + str(api_requests_and_responses[-1][2]) + + "```" + ) + backend_details += "\n\n" + with message_placeholder.container(): + st.markdown(backend_details) - except AttributeError: - function_calling_in_process = False + except AttributeError: + function_calling_in_process = False - time.sleep(3) + time.sleep(3) - full_response = response.text - with message_placeholder.container(): - st.markdown(full_response.replace("$", r"\$")) # noqa: W605 - with st.expander("Function calls, parameters, and responses:"): - st.markdown(backend_details) + full_response = response.text + with message_placeholder.container(): + st.markdown(full_response.replace("$", r"\$")) # noqa: W605 + with st.expander("Function calls, parameters, and responses:"): + st.markdown(backend_details) - st.session_state.messages.append( - { - "role": "assistant", - "content": full_response, - "backend_details": backend_details, - } - ) + st.session_state.messages.append( + { + "role": "assistant", + "content": full_response, + "backend_details": backend_details, + } + ) + except Exception as e: + print(e) + error_message = f""" + Something went wrong! We encountered an unexpected error while + trying to process your request. Please try rephrasing your + question. Details: + + {str(e)}""" + st.error(error_message) + st.session_state.messages.append( + { + "role": "assistant", + "content": error_message, + } + ) From f67f1afd296f8116e0b3de459a9e324b3b7c9965 Mon Sep 17 00:00:00 2001 From: Mend Renovate Date: Thu, 19 Sep 2024 20:15:39 +0200 Subject: [PATCH 10/10] chore(deps): update dependency faker to v29 (#1140) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This PR contains the following updates: | Package | Change | Age | Adoption | Passing | Confidence | |---|---|---|---|---|---| | [faker](https://redirect.github.com/joke2k/faker) ([changelog](https://redirect.github.com/joke2k/faker/blob/master/CHANGELOG.md)) | `26.0.0` -> `29.0.0` | [![age](https://developer.mend.io/api/mc/badges/age/pypi/faker/29.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/faker/29.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/faker/26.0.0/29.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/faker/26.0.0/29.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | --- > [!WARNING] > Some dependencies could not be looked up. Check the warning logs for more information. 
---

### Release Notes

joke2k/faker (faker)

### [`v29.0.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2900---2024-09-19)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v28.4.1...v29.0.0)

- Fix `pydecimal` distribution when called with a range across `0`. Thanks [@AlexLitvino](https://redirect.github.com/AlexLitvino).

### [`v28.4.1`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2841---2024-09-04)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v28.4.0...v28.4.1)

- Fix issue where Faker does not properly convert min/max float values to `Decimal`. Thanks [@bdjellabaldebaran](https://redirect.github.com/bdjellabaldebaran).

### [`v28.4.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2840---2024-09-04)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v28.3.0...v28.4.0)

- Add `it_IT` lorem provider. Thanks [@gianni-di-noia](https://redirect.github.com/gianni-di-noia).

### [`v28.3.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2830---2024-09-04)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v28.2.0...v28.3.0)

- Fix male forms of female surnames in `uk_UA`. Thanks [@AlexLitvino](https://redirect.github.com/AlexLitvino).

### [`v28.2.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2820---2024-09-04)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v28.1.0...v28.2.0)

- Add `es_ES` isbn provider. Thanks [@mondeja](https://redirect.github.com/mondeja).

### [`v28.1.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2810---2024-08-30)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v28.0.0...v28.1.0)

- Fix incorrect city spelling in `uk_UA` locale. Thanks [@ch4zzy](https://redirect.github.com/ch4zzy).

### [`v28.0.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2800---2024-08-23)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v27.4.0...v28.0.0)

- Fix `pydecimal` handling of `positive` keyword. Thanks [@tahzeer](https://redirect.github.com/tahzeer).

### [`v27.4.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2740---2024-08-21)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v27.3.0...v27.4.0)

- Add person provider for `pk_PK` locale. Thanks [@c2-tlhah](https://redirect.github.com/c2-tlhah).

### [`v27.3.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2730---2024-08-21)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v27.2.0...v27.3.0)

- Add providers for `vi_VN` locale. Thanks [@ntd1683](https://redirect.github.com/ntd1683).

### [`v27.2.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2720---2024-08-21)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v27.1.0...v27.2.0)

- Split names in `en_IN` person provider. Thanks [@wh0th3h3llam1](https://redirect.github.com/wh0th3h3llam1).

### [`v27.1.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2710---2024-08-21)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v27.0.0...v27.1.0)

- Add address provider for `en_MS` locale. Thanks [@carlosfunk](https://redirect.github.com/carlosfunk).

### [`v27.0.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2700---2024-08-12)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v26.3.0...v27.0.0)

- Re-introduce `part_of_speech` argument to `words()` method.
### [`v26.3.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2630---2024-08-08)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v26.2.0...v26.3.0)

- Extend `ro_RO` company localization with prefixes. Thanks [@DDSNA](https://redirect.github.com/DDSNA).

### [`v26.2.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2620---2024-08-06)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v26.1.0...v26.2.0)

- Add Swahili (`sw`) provider for generating Swahili names. Thanks [@5uru](https://redirect.github.com/5uru).

### [`v26.1.0`](https://redirect.github.com/joke2k/faker/blob/HEAD/CHANGELOG.md#v2610---2024-08-01)

[Compare Source](https://redirect.github.com/joke2k/faker/compare/v26.0.0...v26.1.0)

- Add more entries to `sk_SK` Geo provider. Thanks [@george0st](https://redirect.github.com/george0st).
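The two `pydecimal` fixes above (v29.0.0 and v28.0.0) are easiest to see with concrete calls. A minimal illustration using standard Faker API, assuming Faker >= 29.0.0 is installed:

```python
# Illustration of the pydecimal behavior covered by the changelog entries
# above; assumes faker >= 29.0.0.
from faker import Faker

fake = Faker()

# A min/max range that crosses 0 -- the case fixed in v29.0.0, where values
# were previously not distributed across the full range.
print(fake.pydecimal(min_value=-50, max_value=50))

# The `positive` keyword, whose handling was fixed in v28.0.0.
print(fake.pydecimal(left_digits=2, right_digits=2, positive=True))
```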
---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/GoogleCloudPlatform/generative-ai).
---
 gemini/sample-apps/llamaindex-rag/pyproject.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gemini/sample-apps/llamaindex-rag/pyproject.toml b/gemini/sample-apps/llamaindex-rag/pyproject.toml
index 317ee5d197f..3869011ab7d 100644
--- a/gemini/sample-apps/llamaindex-rag/pyproject.toml
+++ b/gemini/sample-apps/llamaindex-rag/pyproject.toml
@@ -59,7 +59,7 @@ dulwich = "0.21.7"
 email-validator = "2.2.0"
 entrypoints = "0.4"
 exceptiongroup = "1.2.2"
-faker = "26.0.0"
+faker = "29.0.0"
 fastapi = "0.111.1"
 fastapi-cli = "0.0.4"
 fastjsonschema = "2.20.0"