Formatting
holtskinner committed Dec 20, 2024
1 parent df4623d commit e530b5f
Showing 6 changed files with 182 additions and 192 deletions.
4 changes: 4 additions & 0 deletions gemini/multimodal-live-api/websocket-demo-app/.prettierrc
@@ -0,0 +1,4 @@
{
"bracketSameLine": true,
"tabWidth": 4
}
52 changes: 26 additions & 26 deletions gemini/multimodal-live-api/websocket-demo-app/README.md
@@ -15,15 +15,15 @@ While some web development experience, particularly with localhost, port numbers

### File Structure

- backend/main.py: The Python backend code
- backend/requirements.txt: Lists the required Python dependencies
- `backend/main.py`: The Python backend code
- `backend/requirements.txt`: Lists the required Python dependencies

- frontend/index.html: The frontend HTML app
- frontend/script.js: Main frontend JavaScript code
- frontend/gemini-live-api.js: Script for interacting with the Gemini API
- frontend/live-media-manager.js: Script for handling media input and output
- frontend/pcm-processor.js: Script for processing PCM audio
- frontend/cookieJar.js: Script for managing cookies
- `frontend/index.html`: The frontend HTML app
- `frontend/script.js`: Main frontend JavaScript code
- `frontend/gemini-live-api.js`: Script for interacting with the Gemini API
- `frontend/live-media-manager.js`: Script for handling media input and output
- `frontend/pcm-processor.js`: Script for processing PCM audio
- `frontend/cookieJar.js`: Script for managing cookies

![Demo](https://storage.googleapis.com/cloud-samples-data/generative-ai/image/demo-UI.png)

@@ -40,26 +40,26 @@ git clone https://github.com/GoogleCloudPlatform/generative-ai.git
cd generative-ai/gemini/multimodal-live-api/websocket-demo-app
```

2. Create a new virtual environment and activate it:
1. Create a new virtual environment and activate it:

```sh
python3 -m venv env
source env/bin/activate
```

3. Install dependencies:
1. Install dependencies:

```sh
pip3 install -r backend/requirements.txt
```

4. Start the Python WebSocket server:
1. Start the Python WebSocket server:

```sh
python3 backend/main.py
```

5. Start the frontend:
1. Start the frontend:

- In `script.js`, go to line 9, `const PROXY_URL = "wss://[THE_URL_YOU_COPIED_WITHOUT_HTTP]";`, and replace the `PROXY_URL` value with `ws://localhost:8000`. It should look like: `const PROXY_URL = "ws://localhost:8000";`. Note the missing second "s": `ws` rather than `wss` indicates a non-secure WebSocket connection.
- Right below on line 10, update `PROJECT_ID` with your Google Cloud project ID.
@@ -71,9 +71,9 @@ cd frontend
python3 -m http.server
```
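
Before starting the frontend, it can help to sanity-check the `PROXY_URL` value edited in this step, since a stray character inside the string only shows up later as a generic connection error. The following is a minimal sketch under stated assumptions: `checkProxyUrl` is a hypothetical helper that is not part of the demo app, and it relies only on the standard `URL` class.

```javascript
// Hypothetical helper (not part of the demo app): sanity-check a proxy URL
// before passing it to `new WebSocket(...)`.
function checkProxyUrl(proxyUrl) {
    let parsed;
    try {
        // The standard URL parser rejects malformed input, such as a port
        // that contains non-digit characters.
        parsed = new URL(proxyUrl);
    } catch (err) {
        return { ok: false, reason: "not a valid URL" };
    }
    if (parsed.protocol !== "ws:" && parsed.protocol !== "wss:") {
        return { ok: false, reason: "scheme must be ws:// or wss://" };
    }
    // "ws:" is unencrypted: acceptable for localhost, not for a public host.
    return { ok: true, secure: parsed.protocol === "wss:" };
}

console.log(checkProxyUrl("ws://localhost:8000"));
console.log(checkProxyUrl("ws://localhost:8000;"));
console.log(checkProxyUrl("http://localhost:8000"));
```

A check like this could run once at page load, before the connect button is enabled.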

6. Point your browser to the demo app UI based on the output of the terminal. (E.g., it may be http://localhost:8000, or it may use a different port.)
1. Point your browser to the demo app UI based on the output of the terminal. (E.g., it may be http://localhost:8000, or it may use a different port.)

7. Get your Google Cloud access token:
1. Get your Google Cloud access token:
Run the following command in a terminal with gcloud installed to set your project, and to retrieve your access token.

```sh
@@ -83,16 +83,16 @@ gcloud config set project YOUR-PROJECT-ID
gcloud auth print-access-token
```

8. Copy the access token from the previous step into the UI that you have open in your browser.
1. Copy the access token from the previous step into the UI that you have open in your browser.

9. Enter the model ID in the UI:
1. Enter the model ID in the UI:
Replace `YOUR-PROJECT-ID` in the input with your Google Cloud Project ID.

10. Connect and interact with the demo:
2. Connect and interact with the demo:

- After entering your Access Token and Model ID, press the connect button to connect your web app. Now you should be able to interact with Gemini 2.0 with the Multimodal Live API.

11. To interact with the app, you can do the following:
1. To interact with the app, you can do the following:

- Text input: You can write a text prompt to send to the model by entering your message in the box and pressing the send arrow. The model will then respond via audio (turn up your volume!).
- Voice input: Press the microphone button to start speaking. The model will respond via audio. To mute your microphone, press the button with a slash through the microphone.
@@ -102,29 +102,29 @@ gcloud auth print-access-token

1. Open [Cloud Shell](https://cloud.google.com/shell/docs/editor-overview)

2. Upload the frontend and backend folders to your Cloud Shell Editor project. Alternatively, you can clone the repository and cd into the correct directory:
1. Upload the frontend and backend folders to your Cloud Shell Editor project. Alternatively, you can clone the repository and cd into the correct directory:

```sh
git clone https://github.com/GoogleCloudPlatform/generative-ai.git
cd generative-ai/gemini/multimodal-live-api/websocket-demo-app
```

3. Open two new terminal windows.
4. Navigate to whichever folder in Cloud Shell you uploaded the code files to (i.e., using `cd your_folder_name`)
1. Open two new terminal windows.
1. Navigate to whichever folder in Cloud Shell you uploaded the code files to (i.e., using `cd your_folder_name`)

5. Install dependencies: In one of the terminal windows run:
1. Install dependencies: In one of the terminal windows run:

```sh
pip3 install -r backend/requirements.txt
```

6. Start the Python WebSocket server in one terminal.
1. Start the Python WebSocket server in one terminal.

```sh
python3 backend/main.py
```

7. In order for index.html to work properly, you will need to update the app URL inside script.js to point to the correct proxy server URL you just set up in the previous step. To do so:
1. In order for index.html to work properly, you will need to update the app URL inside script.js to point to the correct proxy server URL you just set up in the previous step. To do so:

- Click on Web Preview (to the right of the Open Terminal button near the top)
- Click "Preview on port 8080" (the port where you set up the proxy server in the previous step)
@@ -134,7 +134,7 @@ python3 backend/main.py
- Replace `wss://your websocket server` with `wss://[THE_URL_YOU_COPIED_WITHOUT_HTTP]`. For example, it should look like: `const PROXY_URL = "wss://8080-cs-123456789-default.cs-us-central1-abcd.cloudshell.dev";`
- Save the changes you've made to script.js
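
The URL rewrite described in the bullets above can be sketched as a small function. This is an illustration only: `toProxyUrl` is a hypothetical helper that is not part of the demo app, the host name is the placeholder from this README, and the `/?authuser=0` suffix is an assumed example of what a copied Web Preview URL might carry.

```javascript
// Hypothetical helper: turn a URL copied from Web Preview into the value
// expected by PROXY_URL in script.js.
function toProxyUrl(previewUrl) {
    const parsed = new URL(previewUrl);
    // Keep only the host (and port, if any); drop the scheme, path, and query.
    return `wss://${parsed.host}`;
}

// The sample URL below is an assumed shape, not a real endpoint.
console.log(
    toProxyUrl(
        "https://8080-cs-123456789-default.cs-us-central1-abcd.cloudshell.dev/?authuser=0",
    ),
);
```

Deriving the value from the parsed host, rather than editing the string by hand, avoids leaving a trailing slash or query string in `PROXY_URL`.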

8. Start the frontend:
1. Start the frontend:
In the second terminal window, run the command below. Keep the backend server running in the first terminal.
(Make sure you have navigated to the folder containing the code files, i.e. using `cd your_folder_name`)

@@ -143,7 +143,7 @@ cd frontend
python3 -m http.server
```

9. Test the demo app:
1. Test the demo app:

- Navigate to the Web Preview button again
- Click on "Change port"
gemini/multimodal-live-api/websocket-demo-app/frontend/gemini-live-api.js
@@ -1,75 +1,68 @@

class GeminiLiveResponseMessage {
constructor(data) {

this.data = "";
this.type = "";
this.endOfTurn = data?.serverContent?.turnComplete;

const parts = data?.serverContent?.modelTurn?.parts
const parts = data?.serverContent?.modelTurn?.parts;

if (data?.setupComplete) {
this.type = "SETUP COMPLETE"
}
else if (parts?.length && parts[0].text) {
this.type = "SETUP COMPLETE";
} else if (parts?.length && parts[0].text) {
this.data = parts[0].text;
this.type = "TEXT"
}
else if (parts?.length && parts[0].inlineData) {
this.type = "TEXT";
} else if (parts?.length && parts[0].inlineData) {
this.data = parts[0].inlineData.data;
this.type = "AUDIO"
this.type = "AUDIO";
}
}
}


class GeminiLiveAPI {

constructor(proxyUrl, projectId, model, apiHost) {

this.proxyUrl = proxyUrl;

this.projectId = projectId
this.model = model
this.modelUri = `projects/${this.projectId}/locations/us-central1/publishers/google/models/${this.model}`
this.projectId = projectId;
this.model = model;
this.modelUri = `projects/${this.projectId}/locations/us-central1/publishers/google/models/${this.model}`;

this.responseModalities = ["AUDIO"]
this.systemInstructions = ""
this.responseModalities = ["AUDIO"];
this.systemInstructions = "";

this.apiHost = apiHost
this.serviceUrl = `wss://${this.apiHost}/ws/google.cloud.aiplatform.v1beta1.LlmBidiService/BidiGenerateContent`
this.apiHost = apiHost;
this.serviceUrl = `wss://${this.apiHost}/ws/google.cloud.aiplatform.v1beta1.LlmBidiService/BidiGenerateContent`;

this.onReceiveResponse = (message) => {
console.log("Default message received callback", message)
}
console.log("Default message received callback", message);
};

this.onConnectionStarted = () => {
console.log("Default onConnectionStarted")
}
console.log("Default onConnectionStarted");
};

this.onErrorMessage = (message) => {
alert(message);
}
};

this.accessToken = ''
this.websocket = null
this.accessToken = "";
this.websocket = null;

console.log("Created Gemini Live API object: ", this)
console.log("Created Gemini Live API object: ", this);
}

setProjectId(projectId) {
this.projectId = projectId
this.modelUri = `projects/${this.projectId}/locations/us-central1/publishers/google/models/${this.model}`
this.projectId = projectId;
this.modelUri = `projects/${this.projectId}/locations/us-central1/publishers/google/models/${this.model}`;
}

setAccessToken(newAccessToken) {
console.log("setting access token: ", newAccessToken)
this.accessToken = newAccessToken
console.log("setting access token: ", newAccessToken);
this.accessToken = newAccessToken;
}

connect(accessToken) {
this.setAccessToken(accessToken)
this.setupWebSocketToService()
this.setAccessToken(accessToken);
this.setupWebSocketToService();
}

disconnect() {
@@ -81,10 +74,10 @@ class GeminiLiveAPI {
}

onReceiveMessage(messageEvent) {
console.log("Message received: ", messageEvent)
console.log("Message received: ", messageEvent);
const messageData = JSON.parse(messageEvent.data);
const message = new GeminiLiveResponseMessage(messageData);
console.log("onReceiveMessageCallBack this ", this)
console.log("onReceiveMessageCallBack this ", this);
this.onReceiveResponse(message);
}

@@ -101,7 +94,6 @@ class GeminiLiveAPI {
this.webSocket.onerror = (event) => {
console.log("websocket error: ", event);
this.onErrorMessage("Connection error");

};

this.webSocket.onopen = (event) => {
@@ -113,28 +105,28 @@ class GeminiLiveAPI {
this.webSocket.onmessage = this.onReceiveMessage.bind(this);
}


sendInitialSetupMessages() {

const serviceSetupMessage = {
bearer_token: this.accessToken,
service_url: this.serviceUrl
service_url: this.serviceUrl,
};
this.sendMessage(serviceSetupMessage)
this.sendMessage(serviceSetupMessage);

const sessionSetupMessage = {
setup: {
model: this.modelUri,
generation_config: { response_modalities: this.responseModalities },
system_instruction: { parts: [{ text: this.systemInstructions }] }
}
}
this.sendMessage(sessionSetupMessage)

generation_config: {
response_modalities: this.responseModalities,
},
system_instruction: {
parts: [{ text: this.systemInstructions }],
},
},
};
this.sendMessage(sessionSetupMessage);
}

sendTextMessage(text) {

const textMessage = {
client_content: {
turns: [
@@ -146,7 +138,7 @@ class GeminiLiveAPI {
turn_complete: true,
},
};
this.sendMessage(textMessage)
this.sendMessage(textMessage);
}

sendRealtimeInputMessage(data, mime_type) {
@@ -157,19 +149,19 @@ class GeminiLiveAPI {
mime_type: mime_type,
data: data,
},
]
],
},
};
this.sendMessage(message)
this.sendMessage(message);
}

sendAudioMessage(base64PCM) {
this.sendRealtimeInputMessage(base64PCM, "audio/pcm")
this.sendRealtimeInputMessage(base64PCM, "audio/pcm");
}

sendImageMessage(base64Image, mime_type = "image/jpeg") {
this.sendRealtimeInputMessage(base64Image, mime_type)
this.sendRealtimeInputMessage(base64Image, mime_type);
}
}

console.log("loaded gemini-live-api.js")
console.log("loaded gemini-live-api.js");
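
To see how the pieces in this file fit together without a browser or a live connection, here is a minimal standalone sketch. It mirrors the branching in the `GeminiLiveResponseMessage` constructor and the two messages built in `sendInitialSetupMessages`. The helper names `classifyServerMessage` and `buildSetupMessages`, the sample payloads, and the placeholder token, host, project, and model values are all assumptions for illustration and are not part of the commit.

```javascript
// Standalone sketch; helper names and sample values are hypothetical.

// Mirrors the constructor's branching: setup acknowledgement first, then
// text, then inline (audio) data taken from the first part of the model turn.
function classifyServerMessage(data) {
    const parts = data?.serverContent?.modelTurn?.parts;
    const result = {
        type: "",
        data: "",
        endOfTurn: data?.serverContent?.turnComplete,
    };
    if (data?.setupComplete) {
        result.type = "SETUP COMPLETE";
    } else if (parts?.length && parts[0].text) {
        result.type = "TEXT";
        result.data = parts[0].text;
    } else if (parts?.length && parts[0].inlineData) {
        result.type = "AUDIO";
        result.data = parts[0].inlineData.data;
    }
    return result;
}

// Builds the two objects sent when the socket opens. The first carries the
// token and target service URL (presumably consumed by the backend proxy);
// the second configures the model session. The modalities and the empty
// system instruction copy the defaults set in the GeminiLiveAPI constructor.
function buildSetupMessages({ accessToken, apiHost, projectId, model }) {
    const serviceSetupMessage = {
        bearer_token: accessToken,
        service_url: `wss://${apiHost}/ws/google.cloud.aiplatform.v1beta1.LlmBidiService/BidiGenerateContent`,
    };
    const sessionSetupMessage = {
        setup: {
            model: `projects/${projectId}/locations/us-central1/publishers/google/models/${model}`,
            generation_config: { response_modalities: ["AUDIO"] },
            system_instruction: { parts: [{ text: "" }] },
        },
    };
    return [serviceSetupMessage, sessionSetupMessage];
}

// Made-up payloads covering each branch, plus a bare end-of-turn marker.
const samples = [
    { setupComplete: {} },
    { serverContent: { modelTurn: { parts: [{ text: "hello" }] } } },
    { serverContent: { modelTurn: { parts: [{ inlineData: { data: "AAAA" } }] } } },
    { serverContent: { turnComplete: true } },
];
for (const sample of samples) {
    console.log(classifyServerMessage(sample));
}

console.log(
    buildSetupMessages({
        accessToken: "PLACEHOLDER_TOKEN",
        apiHost: "example-host.invalid",
        projectId: "my-project",
        model: "placeholder-model",
    }),
);
```

Note that only `parts[0]` is inspected, so a model turn carrying several parts would be classified by its first part alone; the sketch keeps that behavior rather than changing it.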