BlipModel: get_multimodal_features method #30438
References: [1] https://github.com/salesforce/LAVIS/blob/main/lavis/models/blip_models/blip_feature_extractor.py
Thanks for adding this @XavierSpycy!
Can you add tests for this method, similar to the other feature methods?
@amyeroberts Sure, I've added the requested tests for my new method and also included tests for the two existing methods.
Thanks for adding this feature and tests!
* add_blip_get_multimodal_feautres
* Fix docstring error
* reimplement get_multimodal_features
* fix error
* recheck code quality
* add new necessary tests
What does this PR do?
This PR adds a new method, `get_multimodal_features`, to `BlipModel` in the `transformers` library. The method extracts pretrained multimodal features by combining text and image inputs into a single representation. This functionality exists in LAVIS, the library released by the BLIP paper's authors, but was missing from `transformers`.
Motivation
When building applications on multimodal data, it is often necessary to obtain joint text and image features without training a model from scratch. The original BLIP model, as described in the BLIP paper and implemented in LAVIS, can produce image, text, and multimodal features. In `transformers`, `BlipModel` already provides `get_image_features` and `get_text_features`, but it has no `get_multimodal_features` method. This PR fills that gap with a method that follows the design conventions of `transformers` while staying faithful to the original LAVIS implementation.
Description
The `get_multimodal_features` method added in this PR reuses the existing components of `BlipModel` to encode the input text and image together and return their combined features. This lets researchers and developers use the pretrained BLIP model's multimodal representations in downstream tasks without having to train the text-image fusion themselves.
Documentation
The documentation has been updated for the new `get_multimodal_features` method. It describes the method's purpose and usage and includes an example snippet showing how to call it.
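To make the Description section more concrete, the following is a toy, pure-Python sketch of how text and image features can be fused. It assumes the method follows BLIP's image-grounded text encoder, as in the LAVIS feature extractor listed in the references: image patch embeddings enter the text encoder through cross-attention, and the first ([CLS]) token of the fused sequence becomes the multimodal feature. It is not the `transformers` implementation. It uses a single attention head, no learned query/key/value weights, and a hand-picked linear map standing in for a projection head. The helper names (`cross_attention`, `text_only_features`, `multimodal_features`) are hypothetical.

```python
import math
from typing import List

# Toy illustration only: hypothetical helpers, not the transformers API.
Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Apply a linear map stored as an (out_dim x in_dim) matrix."""
    return [dot(row, x) for row in weights]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_tokens: List[Vector], image_patches: List[Vector]) -> List[Vector]:
    """Toy single-head cross-attention: each text token is a query over the image
    patches, which serve directly as keys and values (no learned Q/K/V weights).
    The attended image context is added back to the token (residual connection)."""
    d = len(text_tokens[0])
    fused = []
    for query in text_tokens:
        weights = softmax([dot(query, key) / math.sqrt(d) for key in image_patches])
        context = [sum(w * patch[i] for w, patch in zip(weights, image_patches)) for i in range(d)]
        fused.append([q + c for q, c in zip(query, context)])
    return fused


def text_only_features(text_tokens: List[Vector], projection: Matrix) -> Vector:
    """Unimodal analogue: project the first ([CLS]) token without looking at the image."""
    return matvec(projection, text_tokens[0])


def multimodal_features(text_tokens: List[Vector], image_patches: List[Vector], projection: Matrix) -> Vector:
    """Fuse image information into the text tokens, then project the first ([CLS]) token."""
    fused = cross_attention(text_tokens, image_patches)
    return matvec(projection, fused[0])


# Tiny hand-picked inputs: a [CLS] token plus one word token, and two single-patch "images".
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_a = [[0.5, 0.5]]
image_b = [[0.0, 2.0]]
projection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps 2-d hidden states to 3-d features

print(text_only_features(text_tokens, projection))           # [1.0, 0.0, 1.0]
print(multimodal_features(text_tokens, image_a, projection))  # [1.5, 0.5, 2.0]
print(multimodal_features(text_tokens, image_b, projection))  # [1.0, 2.0, 3.0]
```

With a single image patch the attention weight is exactly 1.0, so the fused [CLS] token is just the token plus that patch, which keeps the printed values easy to check by hand. The text-only features are the same whichever image is used, while the multimodal features change with the image; that dependence on the image is what distinguishes multimodal features from the output of `get_text_features`.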