Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEATURE: Enhance embedding functionality with batch and image support. #55

Merged
merged 21 commits into from
Nov 4, 2024

Conversation

cyrus2281
Copy link

@cyrus2281 cyrus2281 commented Nov 1, 2024

What is the purpose of this change?

The purpose of this change is to enhance the LangChain embeddings component by adding support for batch processing of embeddings and extending its functionality to include image embeddings. This allows for more efficient handling of multiple embeddings in a single operation and broadens the use cases to include image data alongside text data.

How is this accomplished?

This is accomplished by modifying the input schema to accept a list of items instead of a single text input and introducing methods to handle different types of embeddings (document, query, and image). The invoke method has been refactored to process all item types consistently, delegating the actual embedding process to specialized methods for each type.

Anything reviews should focus on/be aware of?

Reviewers should focus on the changes to the input and output schemas and how the invoke method handles the embedding operations for different types of data. Ensure that the batch processing logic works as intended for all supported embedding types and that the new image embedding functionality is properly integrated.

Changes are backward incompatible, but no one is using this component yet

@cyrus2281 cyrus2281 self-assigned this Nov 1, 2024
@cyrus2281 cyrus2281 requested review from efunneko and a team November 1, 2024 14:50
Copy link

gitstream-cm bot commented Nov 1, 2024

Please mark whether you used Copilot to assist coding in this PR

  • Copilot Assisted

Comment on lines 87 to 89
for query in queries:
embeddings.append(self.component.embed_query(query))
return {"embeddings": embeddings}
Copy link
Author

@cyrus2281 cyrus2281 Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CMT] LangChain doesn't expose the function to do batch in single calls, so had to loop here

@cyrus2281 cyrus2281 force-pushed the cyrus/feature/embedding branch from e57aa96 to a6af36a Compare November 1, 2024 15:53
@cyrus2281 cyrus2281 force-pushed the cyrus/feature/embedding branch from a6af36a to 082b07b Compare November 4, 2024 14:40
@cyrus2281 cyrus2281 force-pushed the cyrus/feature/embedding branch from 73104cc to b46e939 Compare November 4, 2024 15:11
Copy link

Copy link
Collaborator

@gregmeldrum gregmeldrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

@cyrus2281 cyrus2281 merged commit 3b4c99a into main Nov 4, 2024
4 checks passed
@cyrus2281 cyrus2281 deleted the cyrus/feature/embedding branch November 4, 2024 15:48
@alimosaed
Copy link

LG

embedding_type = data.get("type", "document")

embeddings = None
items = [items] if type(items) != list else items

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUG]: You can improve the validation step before getting an error in the next steps. Maybe check None and/or check the content of items based on types (document, image or query).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants