[Bug]: Getting started with Vertex AI Gemini 1.5 Pro: Audio Understanding observation 2 #1166

Closed
ghost opened this issue Sep 24, 2024 · 1 comment
ghost commented Sep 24, 2024

File Name

intro_gemini_1_5_pro.ipynb

What happened?

In the Audio Understanding section of the notebook:

  • As far as I can tell, the MP3 contains 4 distinct voices: the host, the voice that announces the podcast name and the host, and the two guests. However, the transcription lists 5 speakers (A, B, C, D, and E).
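
One way to check the speaker count is to collect the distinct speaker labels from the transcript text. This is a minimal sketch, assuming (hypothetically) that each turn begins with a single-letter label such as `A:` or `Speaker B:`; the notebook's actual output format may differ, and `distinct_speakers` is a hypothetical helper, not part of the notebook or the Vertex AI SDK.

```python
import re

# Hypothetical transcript format: each turn starts at the beginning of a line
# with a label like "A:" or "Speaker B:". Adjust the pattern to the real output.
LABEL_PATTERN = re.compile(r"^(?:Speaker\s+)?([A-Z])\s*:", re.MULTILINE)


def distinct_speakers(transcript: str) -> list[str]:
    """Return the sorted, de-duplicated speaker labels found in a transcript."""
    return sorted(set(LABEL_PATTERN.findall(transcript)))


# Illustrative sample text, not taken from the notebook.
sample = """A: Welcome to the show.
B: Thanks for having me.
A: Let's get started.
C: Happy to be here.
"""

print(distinct_speakers(sample))  # ['A', 'B', 'C']
```

Comparing `len(distinct_speakers(...))` with the number of voices you hear gives a quick check of whether the model over-segmented speakers.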

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

@gericdong (Contributor)

Thanks for reporting this issue. We're working on improving model quality, but audio understanding with Gemini still has some known limitations: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/audio-understanding#limitations
