[Bug]: Getting started with Vertex AI Gemini 1.5 Pro: Audio Understanding observation 2 #1166

Closed

1 task done

ghost opened this issue Sep 24, 2024 · 1 comment

ghost commented Sep 24, 2024

File Name

intro_gemini_1_5_pro.ipynb

What happened?

In the Audio Understanding section of the notebook:

As far as I can tell, the mp3 has 4 distinct voices: the host, the voice that announces the name of the podcast and the host, and the two guests. The transcription, however, lists 5 speakers (A, B, C, D and E).
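A mismatch like this can be checked mechanically by counting the distinct speaker labels in the transcript text. The following is only a minimal sketch: the `Speaker X:` line format, the `find_speakers` helper, and the sample transcript are all assumptions made for illustration, not the notebook's actual output.

```python
import re


def find_speakers(transcript):
    """Return the set of distinct speaker labels in a transcript.

    Assumes (hypothetically) that each turn starts a line with
    "Speaker <letter>:"; adjust the pattern to the real output format.
    """
    pattern = re.compile(r"^\s*Speaker\s+([A-Z])\s*:", re.MULTILINE)
    return set(pattern.findall(transcript))


# Made-up sample transcript, for illustration only.
sample_transcript = """\
Speaker A: Welcome to the podcast.
Speaker B: Thanks for having us.
Speaker C: Glad to be here.
Speaker A: Let's get started.
Speaker D: First question for our guests.
Speaker E: Happy to answer.
"""

expected_voices = 4  # number of voices heard in the audio, per the report
speakers = find_speakers(sample_transcript)

if len(speakers) != expected_voices:
    print(
        f"Transcript lists {len(speakers)} speakers "
        f"({', '.join(sorted(speakers))}), expected {expected_voices}."
    )
```

A check of this kind only flags the discrepancy; deciding which labels refer to the same voice still requires listening to the audio.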

Relevant log output

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Contributor

gericdong commented Oct 18, 2024

Thanks for reporting this issue. While we continue to improve model quality, audio understanding with Gemini still has some limitations (https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/audio-understanding#limitations).

gericdong closed this as completed
