
[Feat]: What is the best workaround for grounding with image input? #1194

Closed

gustininho opened this issue Sep 29, 2024 · 1 comment

Comments

@gustininho

Is your feature request related to a problem? Please describe.

I'm frustrated that Vertex AI grounding is not supported when the input is non-text.

Describe the solution you'd like

Is there currently a convenient workaround for this? I'd like to be able to ask, for example:

input("What is this item made of?" + [image]) -> grounded search for what an item like this is usually made of in the document -> output(text)

Describe alternatives you've considered

There is obviously a way of making two separate calls:

  1. Query the model about what the item in the image is.
  2. Pass that text output to a grounding-enabled model that accepts only text input.

However, that increases costs massively.

Additional context

No response

@holtskinner
Collaborator

The best workaround we have, if you specifically need to use the grounding feature, is to make two separate calls: one to get information about the image/document (without grounding), then another with grounding enabled, using that image description as the text input.
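A minimal sketch of this two-call workaround, assuming the `google-cloud-aiplatform` SDK is installed and `vertexai.init()` has already been called; the model name, image URI, and helper names below are illustrative, not part of the official API:

```python
def build_grounded_query(image_description: str) -> str:
    """Turn the first call's image description into a text-only prompt
    for the grounding-enabled second call. (Hypothetical helper.)"""
    return (
        "Based on web sources, what is an item like this usually made of? "
        f"Item description: {image_description.strip()}"
    )


def ask_with_image_grounding(image_uri: str) -> str:
    # Imported inside the function so the pure helper above works without the SDK.
    from vertexai.generative_models import GenerativeModel, Part, Tool, grounding

    # Call 1: describe the image (grounding is not supported with image input).
    vision_model = GenerativeModel("gemini-1.5-flash")  # example model name
    description = vision_model.generate_content(
        [
            Part.from_uri(image_uri, mime_type="image/jpeg"),
            "Describe this item in one sentence.",
        ]
    ).text

    # Call 2: text-only request with Google Search grounding enabled.
    search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())
    grounded_model = GenerativeModel("gemini-1.5-flash", tools=[search_tool])
    return grounded_model.generate_content(build_grounded_query(description)).text
```

Note this is exactly why costs roughly double: each question pays for one multimodal request plus one grounded text request.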
