fix: Gemini 2.0 live api audio not working #1553
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Hi team,
Gemini here! I'm currently reviewing this pull request and will post my full review in a few minutes. In the meantime, here's a quick summary to get everyone up to speed:
This PR addresses an issue where the Gemini 2.0 live API audio wasn't working in the websocket-demo-app. The changes include:
- Updating CODEOWNERS: Added LivioGama as a code owner for the `gemini/multimodal-live-api/websocket-demo-app/` directory.
- Modifying README.md: Clarified the setup instructions for the frontend, specifically emphasizing the need to use `ws://localhost:8000` instead of `wss://` for the `PROXY_URL`, and providing more explicit steps for setting up the Google Cloud project and obtaining an access token. Also corrected a minor inaccuracy in the voice input description.
- Improving gemini-live-api.js: Refactored the code to handle audio and text responses more robustly, improving the logic for determining the message type.
- Refining live-media-manager.js: Added error handling for microphone disconnection and improved the audio processing logic for smoother playback.
- Cleaning up pcm-processor.js: Removed redundant code, streamlining the audio processing workflow.
Overall, this PR makes the websocket-demo-app significantly more user-friendly and reliable, fixing the audio issue and improving the clarity of the instructions. The changes are well-structured and easy to follow.
And now, a little haiku to lighten the mood:
Audio streams flow,
Bugs fixed, code sings a new song,
Smooth sounds fill the air.
Looking forward to discussing this further!
Best,
code-review-assist
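For readers following along, the message-type logic mentioned in the summary above could look roughly like the sketch below. This is not the code from the PR: the helper name `classifyLiveMessage`, the returned type labels, and the assumed message shape (`serverContent.modelTurn.parts[]` carrying either `inlineData` or `text`) are all assumptions made for illustration.

```javascript
// Hypothetical helper, not taken from the PR. The field names
// (serverContent, modelTurn, parts, inlineData, text, turnComplete)
// are assumptions about the shape of a parsed server message.
function classifyLiveMessage(message) {
  const content = message && message.serverContent;
  if (!content) {
    return { type: "UNKNOWN", data: null };
  }

  const parts = (content.modelTurn && content.modelTurn.parts) || [];

  // Look for an audio part first, so that a message carrying audio
  // is never reported as plain text.
  const audioPart = parts.find(
    (p) =>
      p.inlineData &&
      typeof p.inlineData.mimeType === "string" &&
      p.inlineData.mimeType.startsWith("audio/")
  );
  if (audioPart) {
    return { type: "AUDIO", data: audioPart.inlineData.data };
  }

  const textPart = parts.find((p) => typeof p.text === "string");
  if (textPart) {
    return { type: "TEXT", data: textPart.text };
  }

  if (content.turnComplete) {
    return { type: "TURN_COMPLETE", data: null };
  }
  return { type: "UNKNOWN", data: null };
}
```

A caller would typically run this on each parsed WebSocket message and dispatch on the returned `type`, which keeps the audio path and the text path from sharing a single fall-through branch.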
This pull request addresses a critical issue with the Gemini 2.0 live API demo app where audio was not working. The core fix involves reorganizing the message handling logic in `gemini-live-api.js` to correctly identify and process audio messages. The additional changes in the README and other files improve the clarity of the setup instructions and enhance error handling, contributing to a better user experience. Overall, the changes are well-considered and effectively resolve the reported issue. I recommend merging this pull request after addressing the following suggestions.
gemini/multimodal-live-api/websocket-demo-app/frontend/gemini-live-api.js (outdated, resolved)
gemini/multimodal-live-api/websocket-demo-app/frontend/live-media-manager.js (outdated, resolved)
gemini/multimodal-live-api/websocket-demo-app/frontend/live-media-manager.js (resolved)
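As context for the audio-processing side of the review, here is a minimal sketch of turning one received audio chunk into samples that the Web Audio API can play. It assumes the chunk arrives as base64-encoded 16-bit little-endian PCM; the PR conversation does not state the format, and the function name `base64ToFloat32` is hypothetical.

```javascript
// Hypothetical helper, not taken from the PR.
// Assumption: `base64` holds 16-bit little-endian PCM samples.
function base64ToFloat32(base64) {
  // Decode base64 into raw bytes.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }

  // Read each pair of bytes as a signed 16-bit sample and scale it
  // into the [-1, 1) range used by Float32 audio buffers.
  const view = new DataView(bytes.buffer);
  const sampleCount = Math.floor(bytes.length / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

Reading through a `DataView` with an explicit little-endian flag avoids depending on the host byte order, and a trailing odd byte is ignored rather than throwing.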
c80f234 to 9613828
Can someone review this PR? These bugs really block this demo.
Description
Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- `CONTRIBUTING` Guide.
- `CODEOWNERS` for the file(s).
- `nox -s format` from the repository root to format).