Training and evaluation for MultiModalRetriever #3410

Closed
ZanSara opened this issue Oct 18, 2022 · 7 comments
Labels
topic:retriever type:feature New feature or request wontfix This will not be worked on

Comments

@ZanSara
Contributor

ZanSara commented Oct 18, 2022

Currently there are no train() or eval() methods for MultiModalRetriever.

We should add them, taking inspiration from EmbeddingRetriever.

@ZanSara ZanSara added type:feature New feature or request Contributions wanted! Looking for external contributions topic:retriever labels Oct 18, 2022
@anakin87
Member

@ZanSara this proposal sounds very interesting and challenging!

If you would like to provide some resources/pointers/examples to get started with the implementation, I think that they would be very helpful for me and the other contributors 😃.

Do you think that this dataset could be useful to test the implementation or do you have better proposals?

@ZanSara ZanSara mentioned this issue Nov 24, 2022
8 tasks
@ZanSara
Contributor Author

ZanSara commented Nov 24, 2022

Hey @anakin87! Nice one you've picked! First, a caveat: I'm not the most knowledgeable person on the team about evaluating and training models, so take my input with a grain of salt 😄

That said:

Have fun with this one 😁 Also, please open separate PRs for the different features in order to keep them small.

@anakin87
Member

anakin87 commented Dec 1, 2022

I was thinking about CLIP training/fine-tuning...

At the moment, sentence-transformers doesn't document how to approach this task, although a tutorial is a frequently requested feature (UKPLab/sentence-transformers#840 https://github.com/UKPLab/sentence-transformers/issues?q=is%3Aissue+is%3Aopen+clip).

I'm sure that even if not available out-of-the-box, the training/fine-tuning can be done using Transformers or other approaches (openai/CLIP#150).
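
For context, here is a minimal sketch of the symmetric contrastive objective that CLIP-style training is usually described with. It is not taken from either linked issue and is not Haystack or Transformers code: the function names are hypothetical, the embeddings are plain Python lists, and the temperature of 0.07 is an assumed illustrative value.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (assumes the vector is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy of one row of logits against the index of the true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(image_embs: List[Vector], text_embs: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over a batch where pair k is (image k, text k)."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image -> text: each row should pick its own column.
    loss_i2t = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    # Text -> image: each column should pick its own row.
    loss_t2i = sum(cross_entropy([logits[r][k] for r in range(n)], k)
                   for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch, correctly paired embeddings give a much lower loss than swapped ones. A real fine-tuning setup would compute the same quantity on model outputs with a tensor library and backpropagate through the encoders.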

...but there is much more in this issue

Looking into MultiModalRetriever a bit, I see that it can in principle accept several input types, each with its own encoder. The constraint is that the encoded vectors must all have the same size.
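
The size constraint can be illustrated with a small sketch. This is not MultiModalRetriever's actual code: `check_embedding_dims` and the two toy encoders are hypothetical stand-ins, and each "encoder" is just a function that returns a list of floats.

```python
from typing import Callable, Dict, List

Vector = List[float]
Encoder = Callable[[object], Vector]


def check_embedding_dims(encoders: Dict[str, Encoder],
                         samples: Dict[str, object]) -> int:
    """Encode one sample per content type and verify all vectors share one size."""
    dims = {name: len(encoder(samples[name])) for name, encoder in encoders.items()}
    unique = set(dims.values())
    if len(unique) != 1:
        raise ValueError(f"Embedding sizes differ across encoders: {dims}")
    return unique.pop()


# Toy stand-ins for real models (hypothetical): both return 3-dimensional vectors.
def toy_text_encoder(text: str) -> Vector:
    return [float(len(text)), 0.0, 1.0]


def toy_image_encoder(pixels: List[int]) -> Vector:
    return [float(sum(pixels)), 1.0, 0.0]


dim = check_embedding_dims(
    {"text": toy_text_encoder, "image": toy_image_encoder},
    {"text": "a cat", "image": [1, 2, 3]},
)
```

Only when this check passes can queries and documents of different types be compared in one shared vector space. A shared training procedure would have to preserve that property across every encoder it updates.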

So I wonder: is there a unified and effective way to perform the training in such a heterogeneous set of situations?

To give a more focused scope to this issue, I'm curious to hear your opinions: @ZanSara @bogdankostic @julian-risch @vblagoje @mayankjobanputra...

@vblagoje
Member

vblagoje commented Dec 2, 2022

@anakin87 my hunch, without proper investigation, is that training/fine-tuning such a model should be outside of Haystack's scope. These models are fine-tuned using accelerate/hf library setup, and I am hard-pressed to see a reason for adding such support in Haystack.

@ZanSara
Contributor Author

ZanSara commented Dec 5, 2022

@vblagoje I originally added this issue for consistency with other Retrievers: if you see this as unfeasible, let's skip it 👍

However, I believe evaluation is still valuable and should be implemented. WDYT?

@vblagoje
Member

vblagoje commented Dec 5, 2022

@ZanSara I am not sure, tbh. I would say it is prudent to keep the eval interface we already have, as it seems to be more consistent than the training APIs. Something we can talk about internally first as well.

@masci masci removed the Contributions wanted! Looking for external contributions label Dec 13, 2023
@anakin87
Member

anakin87 commented Feb 6, 2024

Very broad topic; training is not a focus currently.
Closing as won't fix.

@masci masci added the wontfix This will not be worked on label Mar 12, 2024
@masci masci closed this as completed Mar 12, 2024
4 participants