[MIEB] Make multimodal models compatible with `task_name` and `prompt_type` #1583
base: mieb
Conversation
Looks good here; I added a few minor changes. @gowitheflow-1998, do you mind taking a look at the errors?
```diff
@@ -36,7 +36,7 @@

 logger = logging.getLogger(__name__)

-transform = transforms.Compose([transforms.PILToTensor()])
+DEFAULT_TRANSFORM = transforms.Compose([transforms.PILToTensor()])
```
Shouldn't it be up to the model to decide how they want to transform their images?
Yes, now we can pass the `transform` to `evaluation.run()` for each model, or manually set `image_loader.dataset.transform = SOME_OPERATION` in `get_xxx_embeddings()`. And `transform` can now be set to `None`.
Before:

```python
image = self.transform(image)
```

After:

```python
if self.transform is not None:
    image = self.transform(image)
```
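As a minimal sketch of the two usage patterns above (the model name, task name, and resize size are placeholder assumptions, and the `transform=` keyword on `evaluation.run()` follows this PR's description rather than a verified released API):

```python
import mteb
from torchvision import transforms

# Placeholder model and task names, for illustration only.
model = mteb.get_model("openai/clip-vit-base-patch32")
tasks = mteb.get_tasks(tasks=["CIFAR10Clustering"])
evaluation = mteb.MTEB(tasks=tasks)

# Option 1: pass a per-model transform when running the evaluation.
custom_transform = transforms.Compose(
    [transforms.Resize((224, 224)), transforms.PILToTensor()]
)
evaluation.run(model, transform=custom_transform)

# Option 2: inside a model's get_xxx_embeddings(), override or disable the
# dataset-level transform before iterating the dataloader.
# image_loader.dataset.transform = custom_transform  # or None for raw PIL images
```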
I tried to keep the changes minimal; we can fully optimize this in a separate PR later.
> Shouldn't it be up to the model to decide how they want to transform their images?

This was more of a default transform to enable the image `Dataset` to work with the `DataLoader` and encoding. The change makes it more flexible, though! If I remember correctly, if `transform` is `None`, it will run into errors for datasets with images of different resolutions?
> This was more of a default transform to enable the image `Dataset` to work with the `DataLoader` and encoding. The change makes it more flexible, though! If I remember correctly, if `transform` is `None`, it will run into errors for datasets with images of different resolutions?

Yes, that is why I have kept this default transform, and existing models won't be affected by this code change.
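To make the failure mode concrete, here is a simplified, hypothetical sketch of the dataset with the default transform kept as a fallback; per the discussion above, returning raw images of different resolutions breaks the default batch collation:

```python
from torch.utils.data import Dataset
from torchvision import transforms

DEFAULT_TRANSFORM = transforms.Compose([transforms.PILToTensor()])


class ImageDataset(Dataset):  # simplified, hypothetical sketch
    def __init__(self, images, transform=DEFAULT_TRANSFORM):
        self.images = images        # e.g. a list of PIL images
        self.transform = transform  # callers may override this or set it to None

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # With transform=None the raw PIL image is returned; per the comment
        # above, mixed resolutions then fail to collate into a single batch
        # unless the model supplies its own transform (or collate_fn).
        if self.transform is not None:
            image = self.transform(image)
        return image
```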
```diff
@@ -121,11 +125,14 @@ def search(
         if q_modality == "text":
             query_texts = queries["text"]
             query_embeddings = self.model.get_text_embeddings(
```
Suggested change:

```diff
-query_embeddings = self.model.get_text_embeddings(
+query_embeddings = self.model.encode
```
(general)
Hi, I get your point: `get_text_embeddings` and `encode` are functionally the same. But I would prefer using `get_text_embeddings`, since most multimodal embedding models don't implement `encode`.
Feel free to leave it as is, but I believe the hope is to move them to the `encode()` interface (see #1551).
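For context, a hedged sketch of the two call styles being compared; the keyword arguments follow this PR, while the `PromptType` import path is an assumption and the helper function is purely illustrative:

```python
from mteb.encoder_interface import PromptType  # assumed import path


def embed_queries(model, query_texts, task_name):
    # Current MIEB style: a dedicated per-modality method that now forwards
    # task_name and prompt_type, mirroring encode().
    return model.get_text_embeddings(
        query_texts,
        task_name=task_name,
        prompt_type=PromptType.query,
    )

    # Direction proposed in #1551: converge on a single encode() entry point.
    # return model.encode(
    #     query_texts, task_name=task_name, prompt_type=PromptType.query
    # )
```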
Co-authored-by: Kenneth Enevoldsen <[email protected]>
The test error is caused by … where … So I changed it to …
Looks great!
This looks ready to merge on my end (the comments are non-blocking).
Fix #1523 (comment)

1. `get_xxx_embeddings` now follows `encode`, accepting the same arguments (`task_name` and `prompt_type`); a sketch of the resulting signature is shown after this list.
2. `ImageDataset.transform` could be `None`, which avoids unnecessary `Image -> Tensor -> Image` operations.
3. `input_type` is passed to the `voyage_v` API via `prompt_type`.
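A hedged sketch of what following `encode` looks like for the per-modality methods; the wrapper class name, defaults, and type hints are illustrative assumptions rather than the exact code in this PR:

```python
from __future__ import annotations

import numpy as np
from mteb.encoder_interface import PromptType  # assumed import path


class SomeMultimodalWrapper:  # hypothetical wrapper, for illustration only
    def get_text_embeddings(
        self,
        texts: list[str],
        *,
        task_name: str | None = None,
        prompt_type: PromptType | None = None,
        batch_size: int = 32,
        **kwargs,
    ) -> np.ndarray:
        """Accept the same task_name / prompt_type keywords as encode()."""
        raise NotImplementedError

    def get_image_embeddings(
        self,
        images: list,
        *,
        task_name: str | None = None,
        prompt_type: PromptType | None = None,
        batch_size: int = 32,
        **kwargs,
    ) -> np.ndarray:
        raise NotImplementedError
```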
Checklist

- Run tests with `make test`.
- Run the linter with `make lint`.

I failed 8 tests, but they may not be triggered by the above changes.