FEATURE: Enhance embedding functionality with batch and image support. #55
Please mark whether you used Copilot to assist coding in this PR
```python
for query in queries:
    embeddings.append(self.component.embed_query(query))
return {"embeddings": embeddings}
```
[CMT] LangChain doesn't expose a function for embedding multiple queries in a single call, so I had to loop here.
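For context: LangChain's `Embeddings` interface provides a batch method for documents (`embed_documents(texts)`) but only a single-text method for queries (`embed_query(text)`), which is why the loop is needed. The sketch below isolates that loop. `FakeEmbeddings` and `embed_queries` are hypothetical names used only so the example runs without a real model; they are not part of the PR.

```python
from typing import List, Protocol


class QueryEmbedder(Protocol):
    """Structural type for the single method used here (mirrors LangChain's embed_query)."""

    def embed_query(self, text: str) -> List[float]: ...


class FakeEmbeddings:
    """Hypothetical stand-in model: encodes a string as [length, vowel count]."""

    def embed_query(self, text: str) -> List[float]:
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return [float(len(text)), float(vowels)]


def embed_queries(component: QueryEmbedder, queries: List[str]) -> dict:
    # No batch call exists for queries, so embed each one individually;
    # the output list keeps the same order as the input queries.
    embeddings = [component.embed_query(query) for query in queries]
    return {"embeddings": embeddings}


result = embed_queries(FakeEmbeddings(), ["hello", "sky"])
print(result)  # {'embeddings': [[5.0, 2.0], [3.0, 0.0]]}
```

The cost of this approach is one model call per query, so large query batches will be slower than the single `embed_documents` call used for documents.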
LG
```python
embedding_type = data.get("type", "document")

embeddings = None
items = [items] if type(items) != list else items
```
[SUG]: Consider validating the input up front, before it causes an error in a later step. For example, check for `None` and/or check the content of `items` according to the embedding type (document, image, or query).
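One possible way to act on this suggestion is sketched below: normalize `items` to a list and fail fast with a clear message. The function name `validate_items` and the per-type rules are hypothetical. In particular, it assumes documents and queries are non-empty strings and that images arrive as a path/URI string or raw bytes; the PR does not specify the image format.

```python
from typing import Any, List

VALID_TYPES = {"document", "query", "image"}


def validate_items(items: Any, embedding_type: str) -> List[Any]:
    """Normalize `items` to a list and reject bad input early (hypothetical rules)."""
    if embedding_type not in VALID_TYPES:
        raise ValueError(
            f"unsupported type {embedding_type!r}; expected one of {sorted(VALID_TYPES)}"
        )
    if items is None:
        raise ValueError("'items' is required")
    # Same normalization as the diff, but via isinstance (also accepts list subclasses).
    items = items if isinstance(items, list) else [items]
    if not items:
        raise ValueError("'items' must not be empty")
    for i, item in enumerate(items):
        if embedding_type == "image":
            # Assumption: an image is a non-empty path/URI string or raw bytes.
            ok = isinstance(item, (str, bytes)) and len(item) > 0
        else:
            # Documents and queries must be non-blank strings.
            ok = isinstance(item, str) and item.strip() != ""
        if not ok:
            raise ValueError(f"invalid {embedding_type} item at index {i}: {item!r}")
    return items
```

For example, `validate_items("hello", "query")` returns `["hello"]`, while `validate_items(None, "document")` raises `ValueError` before any model call is made.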
What is the purpose of this change?
The purpose of this change is to enhance the LangChain embeddings component by adding support for batch processing of embeddings and extending its functionality to include image embeddings. This allows for more efficient handling of multiple embeddings in a single operation and broadens the use cases to include image data alongside text data.
How is this accomplished?
This is accomplished by modifying the input schema to accept a list of items instead of a single text input and introducing methods to handle different types of embeddings (document, query, and image). The invoke method has been refactored to process all item types consistently, delegating the actual embedding process to specialized methods for each type.
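The dispatch described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the class `EmbeddingsComponentSketch`, the `items` input key, the model's `embed_image` method, and the `FakeModel` stand-in are all hypothetical. Only the `type` key with its `"document"` default and the `{"embeddings": ...}` output shape come from the diff.

```python
from typing import Any, Callable, Dict, List

Vector = List[float]


class FakeModel:
    """Hypothetical stand-in for an embedding model; returns toy vectors."""

    def embed_documents(self, texts: List[str]) -> List[Vector]:
        return [[float(len(t)), 0.0] for t in texts]

    def embed_query(self, text: str) -> Vector:
        return [float(len(text)), 1.0]

    def embed_image(self, uris: List[str]) -> List[Vector]:
        # Assumption: the model embeds a batch of images given by URI.
        return [[float(len(u)), 2.0] for u in uris]


class EmbeddingsComponentSketch:
    def __init__(self, component: Any) -> None:
        self.component = component
        # Map each supported embedding type to its specialized method.
        self._handlers: Dict[str, Callable[[List[Any]], List[Vector]]] = {
            "document": self._embed_documents,
            "query": self._embed_queries,
            "image": self._embed_images,
        }

    def invoke(self, data: Dict[str, Any]) -> Dict[str, List[Vector]]:
        embedding_type = data.get("type", "document")
        items = data.get("items")
        items = items if isinstance(items, list) else [items]  # accept a single item too
        handler = self._handlers.get(embedding_type)
        if handler is None:
            raise ValueError(f"unsupported embedding type: {embedding_type!r}")
        return {"embeddings": handler(items)}

    def _embed_documents(self, items: List[str]) -> List[Vector]:
        return self.component.embed_documents(items)  # one batch call

    def _embed_queries(self, items: List[str]) -> List[Vector]:
        # No batch method for queries, so embed one at a time.
        return [self.component.embed_query(q) for q in items]

    def _embed_images(self, items: List[str]) -> List[Vector]:
        return self.component.embed_image(items)


comp = EmbeddingsComponentSketch(FakeModel())
print(comp.invoke({"items": ["ab", "cde"]}))          # {'embeddings': [[2.0, 0.0], [3.0, 0.0]]}
print(comp.invoke({"type": "query", "items": "hi"}))  # {'embeddings': [[2.0, 1.0]]}
```

Keeping the type-to-method mapping in one dictionary means `invoke` handles every type identically, and supporting a new type only requires adding one handler.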
Anything reviewers should focus on/be aware of?
Reviewers should focus on the changes to the input and output schemas and how the invoke method handles the embedding operations for different types of data. Ensure that the batch processing logic works as intended for all supported embedding types and that the new image embedding functionality is properly integrated.
These changes are backward incompatible, but no one is using this component yet.