feat: support specifying an encoder via unstructured-ingest #1782
Labels: enhancement (New feature or request)
github-merge-queue bot pushed a commit that referenced this issue on Nov 6, 2023:

…d deterministic ingest test for embeddings (#1918)

Closes #1782

This PR:
- Extends the ingest pipeline so that it is possible to select an embedding provider from a range of providers
- Modifies the ingest embedding test to be a diff test, since the embedding vectors are reproducible after supporting multiple providers

Additional info on the chosen provider for the test:
- Found `langchain.embeddings.HuggingFaceEmbeddings` to be deterministic even when no seed is set
- Took 6.84s to pass a unit test with the provider (without cache, including model download)
- `langchain.embeddings.HuggingFaceEmbeddings` runs locally, making it zero cost

For all these reasons, testing embedding modules with the Huggingface model seems to make sense.

---------

Co-authored-by: cragwolfe <[email protected]>
Co-authored-by: ryannikolaidis <[email protected]>
Co-authored-by: ahmetmeleq <[email protected]>
shreyanid pushed a commit that referenced this issue on Nov 6, 2023 (same commit message as above).
Is your feature request related to a problem? Please describe.
Currently we only support an OpenAI encoder for fetching embeddings for given results (a list of Elements). By extension, when a user sets the `embedding-api-key` flag, we assume they are setting an OpenAI key and using that encoder. After #1738 and #1619 merge, we will have two additional encoding options, but no way to use them through unstructured-ingest.
Describe the solution you'd like
The unstructured-ingest CLI and Runners should support an option that lets the user specify which encoder to use when creating embeddings for their results.
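For illustration only, here is a minimal sketch of how such an option could be wired internally, assuming a registry that maps a provider name (the value a user would pass on the CLI) to an encoder factory. Every name here (`EmbeddingConfig`, `ENCODER_FACTORIES`, the `local`/`remote` provider keys, and the toy encoders) is hypothetical and not part of unstructured's actual API; the toy encoders stand in for real providers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical signature: an encoder maps a batch of texts to one vector per text.
EncodeFn = Callable[[List[str]], List[List[float]]]


def _toy_local_encoder(api_key: Optional[str]) -> EncodeFn:
    # Placeholder for a locally run model: no API key needed, deterministic toy vectors.
    def encode(texts: List[str]) -> List[List[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]

    return encode


def _toy_remote_encoder(api_key: Optional[str]) -> EncodeFn:
    # Placeholder for a hosted provider that requires an API key.
    if not api_key:
        raise ValueError("this embedding provider requires an API key")

    def encode(texts: List[str]) -> List[List[float]]:
        return [[float(len(t)), 1.0] for t in texts]

    return encode


# Hypothetical registry: adding a provider means adding one entry here.
ENCODER_FACTORIES: Dict[str, Callable[[Optional[str]], EncodeFn]] = {
    "local": _toy_local_encoder,
    "remote": _toy_remote_encoder,
}


@dataclass
class EmbeddingConfig:
    # Hypothetical default that mirrors today's behavior: a key-based hosted provider.
    provider: str = "remote"
    api_key: Optional[str] = None

    def get_encoder(self) -> EncodeFn:
        # Look up the requested provider; fail with the list of valid choices otherwise.
        try:
            factory = ENCODER_FACTORIES[self.provider]
        except KeyError:
            choices = ", ".join(sorted(ENCODER_FACTORIES))
            raise ValueError(
                f"unknown embedding provider {self.provider!r}; choose from: {choices}"
            ) from None
        return factory(self.api_key)


if __name__ == "__main__":
    encoder = EmbeddingConfig(provider="local").get_encoder()
    print(encoder(["hello world"]))
```

Keeping the API key separate from the provider choice means setting a key no longer implies a particular encoder, which is the coupling described above.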
Describe alternatives you've considered
NA
Additional context
NA