feat: add vertexai embeddings (#2693)
This PR:
- Adds VertexAI embeddings as an embedding provider

Testing
- Tested with the Pinecone destination connector on [this](https://github.com/Unstructured-IO/unstructured/actions/runs/8429035114/job/23082700074?pr=2693) job run.

---------

Co-authored-by: Matt Robinson <[email protected]>
Co-authored-by: Matt Robinson <[email protected]>
1 parent 887e6c9, commit d467922
Showing 20 changed files with 24,484 additions and 4 deletions.
@@ -0,0 +1,30 @@
import os

from unstructured.documents.elements import Text
from unstructured.embed.vertexai import VertexAIEmbeddingConfig, VertexAIEmbeddingEncoder

# To use Vertex AI PaLM you will need to:
# - either, pass the full JSON content of your GCP VertexAI application credentials to the
#   VertexAIEmbeddingConfig as the api_key parameter. (This will create a file in the ``/tmp``
#   directory with the content of the JSON, and set the GOOGLE_APPLICATION_CREDENTIALS environment
#   variable to the **path** of the created file.)
# - or, you'll need to store the path to a manually created service account JSON file as the
#   GOOGLE_APPLICATION_CREDENTIALS environment variable. (For more information:
#   https://python.langchain.com/docs/integrations/text_embedding/google_vertex_ai_palm)
# - or, you'll need to have the credentials configured for your environment (gcloud,
#   workload identity, etc…)

embedding_encoder = VertexAIEmbeddingEncoder(
    config=VertexAIEmbeddingConfig(api_key=os.environ["VERTEXAI_GCP_APP_CREDS_JSON_CONTENT"])
)

elements = embedding_encoder.embed_documents(
    elements=[Text("This is sentence 1"), Text("This is sentence 2")],
)

query = "This is the query"
query_embedding = embedding_encoder.embed_query(query=query)

[print(e.embeddings, e) for e in elements]
print(query_embedding, query)
print(embedding_encoder.is_unit_vector(), embedding_encoder.num_of_dimensions())
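
The comment block in the example describes three ways to supply credentials. The first one (write the JSON content to a temporary file and point `GOOGLE_APPLICATION_CREDENTIALS` at it) can be sketched roughly as follows. This is a minimal illustration, not the code added by this commit: `resolve_gcp_credentials` is a hypothetical helper name, and it uses the system temporary directory via `tempfile` rather than a hard-coded `/tmp` path.

```python
import json
import os
import tempfile


def resolve_gcp_credentials(api_key=None):
    """Hypothetical helper illustrating the credential options from the example.

    Returns the path of the credentials file in use, or None when the
    environment (gcloud, workload identity, ...) is expected to provide them.
    """
    if api_key:
        # Option 1: the caller passed the full JSON content of the credentials.
        json.loads(api_key)  # fail early if the content is not valid JSON
        fd, path = tempfile.mkstemp(suffix=".json")
        with os.fdopen(fd, "w") as handle:
            handle.write(api_key)
        # Point the Google client libraries at the file that was just written.
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
        return path
    # Option 2: a path was exported manually.
    # Option 3: nothing is set, so ambient credentials are assumed (returns None).
    return os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
```

Calling `resolve_gcp_credentials(os.environ["VERTEXAI_GCP_APP_CREDS_JSON_CONTENT"])` before building a client would mirror the first option; calling it with no argument leaves the environment untouched.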
@@ -0,0 +1,4 @@
-c ../constraints.in
-c ../base.txt
openai
tiktoken
@@ -0,0 +1,72 @@
#
# This file is autogenerated by pip-compile with Python 3.9
# by the following command:
#
#    pip-compile --output-file=ingest/embed-octoai.txt ingest/embed-octoai.in
#
anyio==3.7.1
    # via
    #   -c ingest/../constraints.in
    #   httpx
    #   openai
certifi==2024.2.2
    # via
    #   -c ingest/../base.txt
    #   -c ingest/../constraints.in
    #   httpcore
    #   httpx
    #   requests
charset-normalizer==3.3.2
    # via
    #   -c ingest/../base.txt
    #   requests
distro==1.9.0
    # via openai
exceptiongroup==1.2.0
    # via anyio
h11==0.14.0
    # via httpcore
httpcore==1.0.4
    # via httpx
httpx==0.27.0
    # via openai
idna==3.6
    # via
    #   -c ingest/../base.txt
    #   anyio
    #   httpx
    #   requests
openai==1.14.3
    # via -r ingest/embed-octoai.in
pydantic==1.10.14
    # via
    #   -c ingest/../constraints.in
    #   openai
regex==2023.12.25
    # via
    #   -c ingest/../base.txt
    #   tiktoken
requests==2.31.0
    # via
    #   -c ingest/../base.txt
    #   tiktoken
sniffio==1.3.1
    # via
    #   anyio
    #   httpx
    #   openai
tiktoken==0.6.0
    # via -r ingest/embed-octoai.in
tqdm==4.66.2
    # via
    #   -c ingest/../base.txt
    #   openai
typing-extensions==4.10.0
    # via
    #   -c ingest/../base.txt
    #   openai
    #   pydantic
urllib3==2.2.1
    # via
    #   -c ingest/../base.txt
    #   requests
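
The compiled file above follows the usual pip-compile layout: one `name==version` pin per package, followed by indented `# via` comments that record which requirement pulled it in. For reviewing what a change like this actually pins, a small reader along these lines may help. It is only a sketch: `parse_pins` and `SAMPLE` are illustrative names, and the parser deliberately ignores extras, environment markers, and hashes.

```python
def parse_pins(text):
    """Return {package: version} for every `name==version` line in `text`."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments (including "# via ..."), and options such as "-c".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirement = line.split("#", 1)[0].strip()  # drop any trailing comment
        if "==" in requirement:
            name, version = requirement.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


# Two entries copied from the compiled file above.
SAMPLE = """\
openai==1.14.3
    # via -r ingest/embed-octoai.in
tiktoken==0.6.0
    # via -r ingest/embed-octoai.in
"""

if __name__ == "__main__":
    for package, version in sorted(parse_pins(SAMPLE).items()):
        print(package, version)
```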
@@ -0,0 +1,5 @@
-c ../constraints.in
-c ../base.txt
langchain
langchain-community
langchain-google-vertexai
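
Because these three packages are optional extras, a quick check that they are importable before constructing the encoder can give a clearer error than a late `ImportError`. The sketch below assumes the import names are the distribution names with hyphens replaced by underscores (`langchain_community`, `langchain_google_vertexai`); `missing_modules` is a hypothetical helper, not part of the library.

```python
from importlib.util import find_spec

# Assumed import names for the three requirements listed above.
VERTEXAI_MODULES = ("langchain", "langchain_community", "langchain_google_vertexai")


def missing_modules(names=VERTEXAI_MODULES):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing optional dependencies:", ", ".join(missing))
    else:
        print("All Vertex AI embedding dependencies are importable.")
```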