
[MIEB] Make multimodal models compatible to task_name and prompt_type #1583

Open · wants to merge 4 commits into base: mieb
Conversation

@izhx (Contributor) commented Dec 12, 2024

Fix #1523 (comment)

  1. Make `get_xxx_embeddings` follow `encode`, accepting the following arguments:

         task_name=task_name,
         prompt_type=PromptType.passage,
         **self.encode_kwargs,

  2. `ImageDataset.transform` can now be `None`, which avoids unnecessary Image -> Tensor -> Image round trips.
  3. Pass `input_type` to the voyage_v API based on `prompt_type`.
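The interface change in item 1 and the `prompt_type` to `input_type` mapping in item 3 can be sketched as follows. This is an illustrative sketch only: `MultimodalModelSketch` and `prompt_type_to_input_type` are hypothetical names, not the actual mteb implementation, and the voyage API's accepted `input_type` values are assumed to be `"query"`/`"document"`.

```python
from enum import Enum


# Mirrors mteb's PromptType enum (illustrative copy, not an import).
class PromptType(str, Enum):
    query = "query"
    passage = "passage"


def prompt_type_to_input_type(prompt_type):
    """Map mteb's prompt_type to the voyage API's input_type argument."""
    if prompt_type is None:
        return None
    # Assumed mapping: voyage calls the passage side "document".
    return {"query": "query", "passage": "document"}[prompt_type.value]


class MultimodalModelSketch:
    def get_text_embeddings(self, texts, *, task_name=None, prompt_type=None, **encode_kwargs):
        # Same keyword interface as `encode`, so callers can pass:
        #   task_name=task_name, prompt_type=PromptType.passage, **self.encode_kwargs
        input_type = prompt_type_to_input_type(prompt_type)
        # ... call the embedding backend here, forwarding input_type ...
        raise NotImplementedError
```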

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

8 tests failed locally, but they may not be triggered by the above changes.

FAILED tests/test_benchmark/test_benchmark_integration_with_datasets.py::test_benchmark_sentence_transformer[model0-task0] - TypeError: 'NoneType' object is not callable
FAILED tests/test_reproducible_workflow.py::test_reproducibility_workflow[8b3219a92973c328a8e22fadcfa821b5dc75636a-sentence-transformers/all-MiniLM-L6-v2-BornholmBitextMining] - TypeError: 'NoneType' object is not callable
FAILED tests/test_tasks/test_all_abstasks.py::test_dataset_availability - aiohttp.client_exceptions.ConnectionTimeoutError: Connection timeout to host https://huggingface.co/datasets/nguha/legalbench/tree/12ca3b695563788fead87a982ad1a0682844...
FAILED tests/test_tasks/test_mieb_datasets.py::test_benchmark_sentence_transformer[model0-task1] - ValueError: Input arrays use different devices: cpu, cpu
FAILED tests/test_cli.py::test_run_task[average_word_embeddings_komninos-BornholmBitextMining-21eec43590414cb8e3a6f654857abed0483ae36e] - TypeError: 'NoneType' object is not callable
FAILED tests/test_benchmark/test_benchmark_integration_with_datasets.py::test_benchmark_sentence_transformer[model0-FarsTail] - FileNotFoundError: Unable to find 'https://huggingface.co/datasets/azarijafari/FarsTail/resolve/7335288588f14e5a687d97fc979194c2abe6f4e7/data/Test-word.csv'
FAILED tests/test_benchmark/test_benchmark_integration_with_datasets.py::test_benchmark_sentence_transformer[model0-BrazilianToxicTweetsClassification] - TypeError: 'NoneType' object is not callable
FAILED tests/test_cli.py::test_run_task[intfloat/multilingual-e5-small-BornholmBitextMining-fd1525a9fd15316a2d503bf26ab031a61d056e98] - TypeError: 'NoneType' object is not callable

@izhx izhx changed the title Make multimodal models compatible to task_name and prompt_type [MIEB] Make multimodal models compatible to task_name and prompt_type Dec 12, 2024
@KennethEnevoldsen (Contributor) left a comment

Looks good. I added a few minor changes. @gowitheflow-1998, do you mind taking a look at the errors?

    @@ -36,7 +36,7 @@

     logger = logging.getLogger(__name__)

    -transform = transforms.Compose([transforms.PILToTensor()])
    +DEFAULT_TRANSFORM = transforms.Compose([transforms.PILToTensor()])
Contributor:

Shouldn't it be up to the model to decide how they want to transform their images?

Contributor (author):

Yes, now we can pass the transform to `evaluation.run()` for each model, or manually set `image_loader.dataset.transform = SOME_OPERATION` in `get_xxx_embeddings()`.
The transform can now also be set to `None`.

before:
        image = self.transform(image)

after:
        if self.transform is not None:
            image = self.transform(image)

I tried to make minimal changes; we can fully optimize this in a separate PR later.
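The optional-transform behaviour above can be sketched in plain Python. This is a hypothetical, simplified stand-in, not mteb's actual `ImageDataset` (which wraps a Hugging Face dataset and subclasses `torch.utils.data.Dataset`); it only illustrates how `transform=None` skips the Image -> Tensor -> Image round trip.

```python
class ImageDatasetSketch:
    """Illustrative sketch: when transform is None, raw images pass through
    unchanged so the model's own processor can handle them."""

    def __init__(self, records, image_column="image", transform=None):
        self.records = records            # e.g. a Hugging Face dataset split
        self.image_column = image_column
        self.transform = transform        # None => hand raw images to the model

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        image = self.records[idx][self.image_column]
        if self.transform is not None:
            image = self.transform(image)
        return image
```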

Contributor:

Shouldn't it be up to the model to decide how they want to transform their images?

This was more of a default transform to enable ImageDataset to work with the DataLoader and encoding. The change makes it more flexible, though! If I remember correctly, if transform is None, it runs into errors for datasets with images of different resolutions?

Contributor (author):

Shouldn't it be up to the model to decide how they want to transform their images?

This was more of a default transform to enable ImageDataset to work with the DataLoader and encoding. The change makes it more flexible, though! If I remember correctly, if transform is None, it runs into errors for datasets with images of different resolutions?

Yes, therefore I have kept this default transform, and existing models won't be affected by this code change.
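For datasets with variable-resolution images, one common workaround when `transform=None` is a list collate function: the DataLoader's default collate tries to stack tensors of equal shape, whereas a list collate keeps the raw images un-batched for the model's own processor. This is a hypothetical helper sketched under that assumption, not mteb code.

```python
def pil_list_collate(batch):
    # Keep items as a plain list instead of stacking them into one tensor,
    # so images of different resolutions never hit the default stacking path.
    return list(batch)

# Usage with torch (assumed installed):
# loader = DataLoader(image_dataset, batch_size=32, collate_fn=pil_list_collate)
```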

@@ -121,11 +125,14 @@ def search(
if q_modality == "text":
query_texts = queries["text"]
query_embeddings = self.model.get_text_embeddings(
Contributor:

Suggested change:

    - query_embeddings = self.model.get_text_embeddings(
    + query_embeddings = self.model.encode

Contributor:

(general)

Contributor (author):

Hi, I get your point: `get_text_embeddings` and `encode` are functionally the same.
But I would prefer `get_text_embeddings`, since most multimodal embedding models don't implement `encode`.

Contributor:

Feel free to leave it as is, but I believe the hope is to move them to the encode() interface (see #1551)

mteb/models/align_models.py (outdated comment, resolved)

izhx commented Dec 13, 2024

FAILED tests/test_tasks/test_mieb_datasets.py::test_benchmark_sentence_transformer[model0-task1] - ValueError: Input arrays use different devices: cpu, cpu

The test error is caused by

accuracy = metrics.accuracy_score(self.labels, predictions)

where `self.labels` is a list (`get_candidate_labels` is annotated as `list[str]`) and `predictions` is a tensor.

So I changed it to `predictions.tolist()`.
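The fix can be illustrated with a small stand-in for the failing comparison. This is a hypothetical reimplementation, not mteb's or sklearn's code: sklearn's `accuracy_score` inspects both inputs' array namespaces, so mixing a plain Python list of labels with a torch tensor of predictions can raise the device-mismatch error, while `.tolist()` puts both operands in plain Python.

```python
def accuracy(labels, predictions):
    """Accuracy over labels vs. predictions, normalizing tensor inputs.

    `predictions` may be a torch.Tensor; .tolist() converts it to plain
    Python values, matching the list from get_candidate_labels.
    """
    preds = predictions.tolist() if hasattr(predictions, "tolist") else list(predictions)
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)
```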

@gowitheflow-1998 (Contributor):

looks great!

@KennethEnevoldsen (Contributor):

This looks ready to merge on my end (the comments are non-blocking)
