fix: Gemini 2.0 live api audio not working #1553

Merged
merged 10 commits into from
Dec 20, 2024

Conversation

LivioGama
Contributor

@LivioGama LivioGama commented Dec 17, 2024

Description

Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
  • Your account is listed in CODEOWNERS for the file(s).
  • Make your Pull Request title follow the https://www.conventionalcommits.org/ specification.
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

@LivioGama LivioGama requested a review from a team as a code owner December 17, 2024 23:40

google-cla bot commented Dec 17, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Contributor

@code-review-assist code-review-assist bot left a comment

Hi team,

Gemini here! I'm currently reviewing this pull request and will post my full review in a few minutes. In the meantime, here's a quick summary to get everyone up to speed:

This PR addresses an issue where the Gemini 2.0 live API audio wasn't working in the websocket-demo-app. The changes include:

  • Updating CODEOWNERS: Added LivioGama as a code owner for the gemini/multimodal-live-api/websocket-demo-app/ directory.
  • Modifying README.md: Clarified the setup instructions for the frontend, specifically emphasizing the need to use ws://localhost:8000 instead of wss:// for the PROXY_URL and providing more explicit steps for setting up the Google Cloud project and obtaining an access token. Also, corrected a minor inaccuracy in the voice input description.
  • Improving gemini-live-api.js: Refactored the code to handle audio and text responses more robustly, improving the logic for determining the message type.
  • Refining live-media-manager.js: Added error handling for microphone disconnection and improved the audio processing logic for smoother playback.
  • Cleaning up pcm-processor.js: Removed redundant code, streamlining the audio processing workflow.

Overall, this PR makes the websocket-demo-app significantly more user-friendly and reliable, fixing the audio issue and improving the clarity of the instructions. The changes are well-structured and easy to follow.
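The README note about using ws://localhost:8000 rather than wss:// for PROXY_URL can be illustrated with a small pre-flight check. This is only a sketch: `checkProxyUrl` is a hypothetical helper that does not appear in the PR, and it assumes the local proxy serves plain (non-TLS) WebSockets, which is the usual reason a wss:// URL fails against localhost.

```javascript
// Hypothetical helper (not part of the PR): validates a PROXY_URL value
// before it is handed to `new WebSocket(...)`.
// Assumption: the local proxy listens without TLS, so wss:// cannot connect.
function checkProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl); // throws a TypeError on malformed input
  const isLocal =
    url.hostname === "localhost" || url.hostname === "127.0.0.1";

  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    return { ok: false, reason: `expected ws:// or wss://, got ${url.protocol}//` };
  }
  if (isLocal && url.protocol === "wss:") {
    return { ok: false, reason: "use ws:// instead of wss:// for a local proxy" };
  }
  return { ok: true, reason: "" };
}

console.log(checkProxyUrl("ws://localhost:8000"));
console.log(checkProxyUrl("wss://localhost:8000"));
```

Failing early with a readable reason is easier to debug than a WebSocket that silently never opens.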


And now, a little haiku to lighten the mood:

Audio streams flow,
Bugs fixed, code sings a new song,
Smooth sounds fill the air.

Looking forward to discussing this further!

Best,
code-review-assist

@LivioGama
Contributor Author

LivioGama commented Dec 17, 2024

This fixes these issues:

[Screenshot: CleanShot 2024-12-18 at 00 14 57@2x]

[Screenshot: CleanShot 2024-12-18 at 00 09 38@2x]

Contributor

@code-review-assist code-review-assist bot left a comment

This pull request addresses a critical issue with the Gemini 2.0 live API demo app where audio was not working. The core fix involves reorganizing the message handling logic in gemini-live-api.js to correctly identify and process audio messages. The additional changes in the README and other files improve the clarity of the setup instructions and enhance error handling, contributing to a better user experience. Overall, the changes are well-considered and effectively resolve the reported issue. I recommend merging this pull request after addressing the following suggestions.
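To make "correctly identify and process audio messages" concrete, here is a minimal sketch of how a parsed server message might be classified. It is not the code from gemini-live-api.js: `classifyResponse` and the returned type labels are hypothetical, and the message shape (`serverContent.modelTurn.parts[]` carrying either `inlineData` with an audio MIME type or a `text` field) is an assumption about the Live API payload.

```javascript
// Hypothetical classifier (not the PR's implementation).
// Assumed message shape:
//   { setupComplete: {...} }
//   { serverContent: { modelTurn: { parts: [...] }, turnComplete: true } }
function classifyResponse(message) {
  if (message?.setupComplete) {
    return { type: "SETUP_COMPLETE", data: null };
  }

  const parts = message?.serverContent?.modelTurn?.parts ?? [];
  for (const part of parts) {
    // Check for audio first: an audio part carries base64 data, not text.
    if (part?.inlineData?.mimeType?.startsWith("audio/")) {
      return { type: "AUDIO", data: part.inlineData.data };
    }
    if (typeof part?.text === "string") {
      return { type: "TEXT", data: part.text };
    }
  }

  if (message?.serverContent?.turnComplete) {
    return { type: "TURN_COMPLETE", data: null };
  }
  return { type: "UNKNOWN", data: null };
}

const audioMessage = {
  serverContent: {
    modelTurn: {
      parts: [{ inlineData: { mimeType: "audio/pcm;rate=24000", data: "AAAA" } }],
    },
  },
};
console.log(classifyResponse(audioMessage).type);
```

Inspecting each part, rather than assuming one response modality per connection, means an audio chunk is never passed to the text handler by mistake.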

@LivioGama LivioGama force-pushed the main branch 2 times, most recently from c80f234 to 9613828 Compare December 17, 2024 23:59
Review thread on .github/CODEOWNERS (outdated, resolved)
Review thread on gemini/multimodal-live-api/websocket-demo-app/README.md (outdated, resolved)
@zorrofox

Can someone review this PR? These bugs really block this demo.

@ZackAkil ZackAkil self-requested a review December 20, 2024 16:34
@holtskinner holtskinner merged commit fa42967 into GoogleCloudPlatform:main Dec 20, 2024
5 checks passed
5 participants