
[Feat]: What is the best workaround for grounding with image input? #1194

Closed

gustininho opened this issue Sep 29, 2024 · 1 comment

Comments

@gustininho

Is your feature request related to a problem? Please describe.

I'm frustrated that Vertex AI grounding is not supported when the input is non-text.

Describe the solution you'd like

Is there currently a convenient workaround for this? I'd like to be able to ask, for example:

input("What is this item made of?" + [image]) -> grounded search for what an item like this is usually made of in the document -> output(text)

Describe alternatives you've considered

There is obviously a way of making two separate calls:

  1. Query the model about what the item in the image is.
  2. Pass that text output to a grounding-enabled model that accepts only text input.

However, that increases costs massively.

Additional context

No response

@holtskinner
Collaborator

The best workaround we have, if you specifically need to use the grounding feature, is to make two separate calls: one to get information about the image/document (without grounding), then another with grounding enabled, using that image description as the text input.
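A minimal sketch of this two-call workaround, assuming the `google-cloud-aiplatform` SDK is installed and `vertexai.init()` has already been called; the model name, image URI, and helper names below are illustrative, not part of the official API:

```python
def build_grounded_query(image_description: str) -> str:
    """Turn the first call's image description into a text-only prompt
    for the grounding-enabled second call. (Hypothetical helper.)"""
    return (
        "Based on web sources, what is an item like this usually made of? "
        f"Item description: {image_description.strip()}"
    )


def ask_with_image_grounding(image_uri: str) -> str:
    # Imported inside the function so the pure helper above works without the SDK.
    from vertexai.generative_models import GenerativeModel, Part, Tool, grounding

    # Call 1: describe the image (grounding is not supported with image input).
    vision_model = GenerativeModel("gemini-1.5-flash")  # example model name
    description = vision_model.generate_content(
        [
            Part.from_uri(image_uri, mime_type="image/jpeg"),
            "Describe this item in one sentence.",
        ]
    ).text

    # Call 2: text-only request with Google Search grounding enabled.
    search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())
    grounded_model = GenerativeModel("gemini-1.5-flash", tools=[search_tool])
    return grounded_model.generate_content(build_grounded_query(description)).text
```

Note this is exactly why costs roughly double: each question pays for one multimodal request plus one grounded text request.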
