Problem
So far, Haystack has focused strongly on text-only search. However, the same architecture is likely to be effective on other media, such as images.
This epic tracks the implementation of support for image indexing and retrieval in Haystack.
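Here is a minimal sketch of what embedding-based image indexing and retrieval could look like. It assumes that every document, whether text or image, has already been embedded into one shared vector space, which is what choosing sibling Data2Vec models is meant to enable. `MultiModalDocument`, `retrieve`, and the toy three-dimensional vectors are hypothetical stand-ins for illustration. They are not Haystack's API or real model outputs.

```python
import math
from dataclasses import dataclass, field
from typing import List, Literal, Tuple


# Hypothetical document primitive (NOT Haystack's Document class):
# a content_type field marks which medium the content belongs to.
@dataclass
class MultiModalDocument:
    id: str
    content: str  # raw text, or a path/URI for an image (assumption)
    content_type: Literal["text", "image"]
    embedding: List[float] = field(default_factory=list)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_embedding: List[float], docs: List[MultiModalDocument], top_k: int = 2
) -> List[Tuple[float, MultiModalDocument]]:
    """Score every document against the query and return the top_k best matches.

    Mixing text and image documents in one ranking only makes sense because
    all embeddings are assumed to live in the same vector space.
    """
    scored = [(cosine(query_embedding, d.embedding), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index: the vectors are made up and stand in for model outputs.
docs = [
    MultiModalDocument("t1", "A cat sleeping on a sofa", "text", [0.9, 0.1, 0.0]),
    MultiModalDocument("i1", "images/cat.jpg", "image", [0.8, 0.2, 0.1]),
    MultiModalDocument("i2", "images/car.jpg", "image", [0.0, 0.1, 0.9]),
]

# Made-up query embedding for something like "cat".
results = retrieve([1.0, 0.0, 0.0], docs, top_k=2)
for score, doc in results:
    print(f"{doc.id} ({doc.content_type}): {score:.3f}")
```

In this sketch, the text document and the cat image both rank above the car image for the same query. That kind of cross-type ranking is what comparable embeddings across document types would allow.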
Process
- Research on the topic
  - Candidate models for image retrieval identified: CLIP and Data2Vec.
  - `Data2VecVision` has been chosen because sibling models exist for text and audio. These would later allow comparable embeddings across different document types (see `ImageRetriever` #2445).
- Investigate and, if needed, adapt how Haystack loads models from HF (`language_modeling.py` and `tokenization.py`) #2703
- Implement a `MultiModalRetriever` that works in isolation (NOT in a pipeline) with at least one document store, and add tests.
- `Data2VecVision` models and existing retrievers #2865
- `MultiModalRetriever` #2891
- Make `MultiModalRetriever` work on all document stores, if possible and not too time-consuming.
- Extend primitives (`Document`, `Answer`, etc.) to account for the new data type, and add tests. (WARNING: big task! The changes might not be as big as initially assumed, though.)
- Make `MultiModalRetriever` work in a query pipeline, and add tests.
- Make `MultiModalRetriever` work in an indexing pipeline, and add tests.
- Image-to-text conversion, mainly designed for indexing. Note: this is independent of the rest of the changes and could even be picked up now by some brave external contributor.

Later steps (not in order of importance, not blocking each other):
- `MultiModalRetriever` #3410
- Make `MultiModalRetriever` work on the REST API and/or make a separate demo for it (a mixed-media one would be super-cool, though)
- `ImageToText` & `AnswerToImage` #2444
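The image-to-text conversion item (mainly designed for indexing) sidesteps multimodal embeddings entirely. Each image is turned into text at indexing time, and the result is indexed as an ordinary text document. Below is a minimal sketch, assuming a captioning model wrapped in a plain callable. `index_images`, `keyword_search`, `caption_fn`, and the dict-based store are hypothetical illustrations. They are not Haystack's `ImageToText` API.

```python
from typing import Callable, Dict, List

# Hypothetical captioning interface: image path in, descriptive text out.
CaptionFn = Callable[[str], str]


def index_images(image_paths: List[str], caption_fn: CaptionFn, store: Dict[str, dict]) -> None:
    """Convert each image to text and store it as a plain text document.

    The original image path is kept in the metadata so that a text hit
    can be mapped back to the image it came from.
    """
    for path in image_paths:
        text = caption_fn(path)
        store[path] = {
            "content": text,
            "content_type": "text",
            "meta": {"source_image": path},
        }


def keyword_search(store: Dict[str, dict], query: str) -> List[str]:
    """Naive substring search over the generated captions; returns source image paths."""
    q = query.lower()
    return [doc["meta"]["source_image"] for doc in store.values() if q in doc["content"].lower()]


# Stub captioner standing in for a real image-to-text model.
fake_captions = {
    "images/cat.jpg": "A cat sleeping on a sofa",
    "images/car.jpg": "A red car parked on a street",
}

store: Dict[str, dict] = {}
index_images(list(fake_captions), fake_captions.__getitem__, store)
print(keyword_search(store, "cat"))
```

Because the images become text before indexing, any existing text retriever could search them unchanged. The trade-off is that retrieval quality depends entirely on how good the generated descriptions are.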