
[MIEB] Make multimodal models compatible to task_name and prompt_type #1583

Open · wants to merge 4 commits into base: mieb
Conversation

@izhx (Contributor) commented Dec 12, 2024

Fix #1523 (comment)

  1. Make `get_xxx_embeddings` follow `encode`, accepting the following arguments:

         task_name=task_name,
         prompt_type=PromptType.passage,
         **self.encode_kwargs,

  2. `ImageDataset.transform` can now be `None`, which avoids unnecessary Image -> Tensor -> Image round trips.
  3. Pass `input_type` to the voyage_v API based on `prompt_type`.
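The interface change in item 1 and the `prompt_type` to `input_type` mapping in item 3 can be sketched as follows. This is an illustrative sketch only: `MultimodalModelSketch` and `prompt_type_to_input_type` are hypothetical names, not the actual mteb implementation, and the voyage API's accepted `input_type` values are assumed to be `"query"`/`"document"`.

```python
from enum import Enum


# Mirrors mteb's PromptType enum (illustrative copy, not an import).
class PromptType(str, Enum):
    query = "query"
    passage = "passage"


def prompt_type_to_input_type(prompt_type):
    """Map mteb's prompt_type to the voyage API's input_type argument."""
    if prompt_type is None:
        return None
    # Assumed mapping: voyage calls the passage side "document".
    return {"query": "query", "passage": "document"}[prompt_type.value]


class MultimodalModelSketch:
    def get_text_embeddings(self, texts, *, task_name=None, prompt_type=None, **encode_kwargs):
        # Same keyword interface as `encode`, so callers can pass:
        #   task_name=task_name, prompt_type=PromptType.passage, **self.encode_kwargs
        input_type = prompt_type_to_input_type(prompt_type)
        # ... call the embedding backend here, forwarding input_type ...
        raise NotImplementedError
```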

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

8 tests failed locally, but they may not be triggered by the above changes.

FAILED tests/test_benchmark/test_benchmark_integration_with_datasets.py::test_benchmark_sentence_transformer[model0-task0] - TypeError: 'NoneType' object is not callable
FAILED tests/test_reproducible_workflow.py::test_reproducibility_workflow[8b3219a92973c328a8e22fadcfa821b5dc75636a-sentence-transformers/all-MiniLM-L6-v2-BornholmBitextMining] - TypeError: 'NoneType' object is not callable
FAILED tests/test_tasks/test_all_abstasks.py::test_dataset_availability - aiohttp.client_exceptions.ConnectionTimeoutError: Connection timeout to host https://huggingface.co/datasets/nguha/legalbench/tree/12ca3b695563788fead87a982ad1a0682844...
FAILED tests/test_tasks/test_mieb_datasets.py::test_benchmark_sentence_transformer[model0-task1] - ValueError: Input arrays use different devices: cpu, cpu
FAILED tests/test_cli.py::test_run_task[average_word_embeddings_komninos-BornholmBitextMining-21eec43590414cb8e3a6f654857abed0483ae36e] - TypeError: 'NoneType' object is not callable
FAILED tests/test_benchmark/test_benchmark_integration_with_datasets.py::test_benchmark_sentence_transformer[model0-FarsTail] - FileNotFoundError: Unable to find 'https://huggingface.co/datasets/azarijafari/FarsTail/resolve/7335288588f14e5a687d97fc979194c2abe6f4e7/data/Test-word.csv'
FAILED tests/test_benchmark/test_benchmark_integration_with_datasets.py::test_benchmark_sentence_transformer[model0-BrazilianToxicTweetsClassification] - TypeError: 'NoneType' object is not callable
FAILED tests/test_cli.py::test_run_task[intfloat/multilingual-e5-small-BornholmBitextMining-fd1525a9fd15316a2d503bf26ab031a61d056e98] - TypeError: 'NoneType' object is not callable

@izhx izhx changed the title Make multimodal models compatible to task_name and prompt_type [MIEB] Make multimodal models compatible to task_name and prompt_type Dec 12, 2024
@KennethEnevoldsen (Contributor) left a comment

Looks good. I added a few minor changes. @gowitheflow-1998, do you mind taking a look at the errors?

    @@ -36,7 +36,7 @@

     logger = logging.getLogger(__name__)

    -transform = transforms.Compose([transforms.PILToTensor()])
    +DEFAULT_TRANSFORM = transforms.Compose([transforms.PILToTensor()])
Contributor:

Shouldn't it be up to the model to decide how they want to transform their images?

Contributor (author):

Yes, now we can pass the transform to `evaluation.run()` for each model, or manually set `image_loader.dataset.transform = SOME_OPERATION` in `get_xxx_embeddings()`.
The transform can now also be set to `None`.

before:
        image = self.transform(image)

after:
        if self.transform is not None:
            image = self.transform(image)

I tried to make minimal changes; we can fully optimize this in a separate PR later.
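The optional-transform behaviour above can be sketched in plain Python. This is a hypothetical, simplified stand-in, not mteb's actual `ImageDataset` (which wraps a Hugging Face dataset and subclasses `torch.utils.data.Dataset`); it only illustrates how `transform=None` skips the Image -> Tensor -> Image round trip.

```python
class ImageDatasetSketch:
    """Illustrative sketch: when transform is None, raw images pass through
    unchanged so the model's own processor can handle them."""

    def __init__(self, records, image_column="image", transform=None):
        self.records = records            # e.g. a Hugging Face dataset split
        self.image_column = image_column
        self.transform = transform        # None => hand raw images to the model

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        image = self.records[idx][self.image_column]
        if self.transform is not None:
            image = self.transform(image)
        return image
```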

Contributor:

Shouldn't it be up to the model to decide how they want to transform their images?

This was more of a default transform to enable ImageDataset to work with the DataLoader and encoding. The change makes it more flexible, though! If I remember correctly, if transform is None, it runs into errors for datasets with images of different resolutions?

Contributor (author):

Shouldn't it be up to the model to decide how they want to transform their images?

This was more of a default transform to enable ImageDataset to work with the DataLoader and encoding. The change makes it more flexible, though! If I remember correctly, if transform is None, it runs into errors for datasets with images of different resolutions?

Yes, therefore I have kept this default transform, and existing models won't be affected by this code change.
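For datasets with variable-resolution images, one common workaround when `transform=None` is a list collate function: the DataLoader's default collate tries to stack tensors of equal shape, whereas a list collate keeps the raw images un-batched for the model's own processor. This is a hypothetical helper sketched under that assumption, not mteb code.

```python
def pil_list_collate(batch):
    # Keep items as a plain list instead of stacking them into one tensor,
    # so images of different resolutions never hit the default stacking path.
    return list(batch)

# Usage with torch (assumed installed):
# loader = DataLoader(image_dataset, batch_size=32, collate_fn=pil_list_collate)
```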

@@ -121,11 +125,14 @@ def search(
if q_modality == "text":
query_texts = queries["text"]
query_embeddings = self.model.get_text_embeddings(
Contributor:

Suggested change:

    - query_embeddings = self.model.get_text_embeddings(
    + query_embeddings = self.model.encode

Contributor:

(general)

Contributor (author):

Hi, I get your point: `get_text_embeddings` and `encode` are functionally the same.
But I would prefer `get_text_embeddings`, since most multimodal embedding models don't implement `encode`.

Contributor:

Feel free to leave it as is, but I believe the hope is to move them to the encode() interface (see #1551)

mteb/models/align_models.py (outdated comment, resolved)

izhx commented Dec 13, 2024

FAILED tests/test_tasks/test_mieb_datasets.py::test_benchmark_sentence_transformer[model0-task1] - ValueError: Input arrays use different devices: cpu, cpu

The test error is caused by

accuracy = metrics.accuracy_score(self.labels, predictions)

where `self.labels` is a list (`get_candidate_labels` is annotated as `list[str]`) and `predictions` is a tensor.

So I changed it to `predictions.tolist()`.
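The fix can be illustrated with a small stand-in for the failing comparison. This is a hypothetical reimplementation, not mteb's or sklearn's code: sklearn's `accuracy_score` inspects both inputs' array namespaces, so mixing a plain Python list of labels with a torch tensor of predictions can raise the device-mismatch error, while `.tolist()` puts both operands in plain Python.

```python
def accuracy(labels, predictions):
    """Accuracy over labels vs. predictions, normalizing tensor inputs.

    `predictions` may be a torch.Tensor; .tolist() converts it to plain
    Python values, matching the list from get_candidate_labels.
    """
    preds = predictions.tolist() if hasattr(predictions, "tolist") else list(predictions)
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)
```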

@gowitheflow-1998 (Contributor):

looks great!

@KennethEnevoldsen (Contributor):

This looks ready to merge on my end (the comments are non-blocking)
