CLIP semantic image search #1058

flozi00 · 2021-05-13T13:30:41Z

Is your feature request related to a problem? Please describe.
No, it would just be cool.

Describe the solution you'd like
Indexing and searching for images by text

Describe alternatives you've considered
Jina already does this, but since CLIP is in the latest Hugging Face release, it would be cool to have it here too.

Additional context
I did some runs locally with my own photos and the results were amazing.
Describing images instead of just using keywords improves the performance massively; even special queries work fine.

But the biggest question I have is whether you want to have vision data in this framework or not.
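A minimal sketch of the "index images, search them by text" idea, assuming the image and text embeddings have already been produced by a CLIP model (for example with `get_image_features` and `get_text_features` on `CLIPModel` in Hugging Face transformers). `ImageIndex`, `add`, and `search` are hypothetical names for illustration, not a Haystack API, and the toy 3-dimensional vectors stand in for real CLIP embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class ImageIndex:
    """Hypothetical in-memory index of precomputed image embeddings."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, image_id: str, embedding: Sequence[float]) -> None:
        # Store the L2-normalized image embedding under its identifier.
        self._vectors[image_id] = _normalize(embedding)

    def search(
        self, query_embedding: Sequence[float], top_k: int = 3
    ) -> List[Tuple[str, float]]:
        # Brute-force scan: score every stored image by cosine similarity
        # to the (assumed CLIP) text embedding of the query.
        q = _normalize(query_embedding)
        scored = [
            (image_id, sum(a * b for a, b in zip(q, vec)))
            for image_id, vec in self._vectors.items()
        ]
        # Highest similarity first, truncated to the top_k best matches.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for CLIP image embeddings (assumption).
    index = ImageIndex()
    index.add("beach.jpg", [0.9, 0.1, 0.0])
    index.add("mountain.jpg", [0.1, 0.9, 0.2])
    index.add("city.jpg", [0.0, 0.2, 0.9])
    # Toy vector standing in for the CLIP text embedding of a query.
    # beach.jpg ranks first, mountain.jpg second.
    print(index.search([1.0, 0.0, 0.1], top_k=2))
```

Normalizing both sides turns the dot product into a cosine similarity, which is how CLIP compares image and text embeddings. A linear scan is fine for a personal photo collection; a larger collection would likely call for an approximate nearest-neighbour index instead.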

lalitpagaria · 2021-05-14T11:56:01Z

@flozi00 nice suggestion. I also wanted to suggest the same. It would be nice to support image documents, which would suit VQA, image search, and other use cases.

A few concerns I have:

Breaking changes in the existing framework, where 'text' is hard-coded, i.e. in the Document class and other places
Long-term support: without active volunteers, deepset will find it difficult to maintain these community-sponsored features

Overall, in my view it would be nice to have this in Haystack, but adding it will require a good design discussion and proper long-term planning. Frequent breaking changes would not be good. Also, I see deepset already has its hands full, so they would need active support from the community.

I see a lot of good suggestions from the community, so how about having an experimental feature stream as a playground for these features, and graduating mature features to mainline?

@Timoeller @tholor @PiffPaffM

tholor · 2021-06-03T06:31:25Z

It's pretty clear to me that we will eventually add other data types to Haystack. The vision here is really to build natural language interfaces to all kinds of data. This includes texts, images, tables, databases, logs ...

However, we want to nail the text case first and optimize it end-to-end instead of supporting 5 formats with "50% solutions". TableQA is probably one of the bigger next additions, and we are actively working on it right now. So, long story short, VQA is definitely not something we will work on in the next few weeks, but it is on the long-term roadmap.

@lalitpagaria what do you mean by experimental stream? A separate branch here in the repo?

lalitpagaria · 2021-06-03T07:03:26Z

@tholor I am aligned with the vision. My only concern is prioritization, hence I suggested having a process around it. In my view, these are the two most time-consuming and, of course, critical steps: design discussion and code review. I have not been able to come up with a solution to resolve that.

Regarding the experimental stream, I mean a separate `experimental` module and a separate `experimental` branch, rebased daily onto master. Any new code, like VQA or CLIP, that is not part of the current roadmap or plan would go there, and it would have nightly releases. People could contribute there under a less stringent code review and design process. Once every month or quarter, these features could be brought to mainline based on users' feedback and the roadmap (of course, after going through design discussion and code review). This is just my suggestion; I am open to other ideas as well.

INF800 · 2021-06-29T08:44:47Z

Can you please share a reference link for the one you've tried? I'd like to see the results as well.

Thanks,
Rakesh.

anakin87 · 2023-01-11T20:13:15Z

CLIP support was implemented by @ZanSara in #2418.

I think that this issue can be closed now.

lalitpagaria mentioned this issue Jun 28, 2021 in Multimodal search with text and image #1230 (closed)

lalitpagaria mentioned this issue Jun 29, 2021 in Support for Images in the Document #1238 (closed)

flozi00 closed this as completed Jan 11, 2023