feat: Pure Python audio chat app with Multimodal Live API #1551

freddyaboulton · 2024-12-17T20:47:35Z

Description

The multimodal-live-app requires knowledge of HTML and JavaScript.

Adding a pure-Python web app built with Gradio that supports streaming audio input and output will help developers get started with the Multimodal Live API without leaving their preferred language.

- Follow the CONTRIBUTING Guide.
- You are listed as the author in your notebook or README file.
  - Your account is listed in CODEOWNERS for the file(s).
- Make your Pull Request title follow the https://www.conventionalcommits.org/ specification.
- Ensure the tests and linter pass (run nox -s format from the repository root to format).
- Appropriate docs were updated (if necessary).

code-review-assist

Hi @freddyaboulton and reviewers,

Gemini is currently reviewing this pull request; a full review should follow in a few minutes. In the meantime, here's a quick summary to help everyone get up to speed:

This PR adds a pure-Python web application built with Gradio that enables audio input/output streaming for the multimodal API. The goal is to provide a more accessible entry point for developers who prefer Python, eliminating the need for HTML and JavaScript knowledge to get started with the multimodal API. The main changes involve creating a new app.py file (gemini/gradio-voice/app.py) which contains a complete Gradio application for voice interaction with the Gemini API. The requirements.txt file (gemini/gradio-voice/requirements.txt) has also been updated to include the necessary libraries: gradio_webrtc, librosa, and python-dotenv.

The app.py file implements a WebSocket connection to the Gemini API, handles audio encoding and decoding, and manages the real-time audio streaming between the user and the Gemini model. The Gradio interface provides a simple user experience for interacting with the voice chat functionality. The API key is securely handled via a password-protected textbox.

If there's anything unclear or missing from this summary, please let me know. I'll provide a more detailed review shortly.

And now, a little haiku to lighten the mood:

Python's gentle grace,
Streams of audio, flowing free,
Gemini speaks back.

code-review-assist

This pull request introduces a Python-based Gradio web application that facilitates real-time voice chat with Gemini. This is a great addition, as it allows developers to interact with Gemini using a familiar language and a user-friendly interface. The code is generally well-structured, but there are some areas where improvements can be made to enhance robustness, error handling, and user experience.

code-review-assist · 2024-12-17T20:53:42Z

gemini/gradio-voice/app.py

+        assert self.config, "Config not set"
+        try:
+            self.ws = websockets.sync.client.connect(self.config.ws_url, timeout=30)
+            initial_request = {
+                "setup": {
+                    "model": self.config.model,
+                }
+            }
+            self.ws.send(json.dumps(initial_request))
+            setup_response = json.loads(self.ws.recv())
+            print(f"Setup response: {setup_response}")
+        except websockets.exceptions.WebSocketException as e:
+            print(f"WebSocket connection failed: {str(e)}")
+            self.ws = None
+        except Exception as e:
+            print(f"Setup failed: {str(e)}")
+            self.ws = None


The error handling in _initialize_websocket could be improved. Currently, both WebSocketException and generic exceptions are caught and printed, but the function doesn't return any indication of failure. This can make it difficult for calling functions to handle connection errors appropriately. Consider raising the caught exceptions after printing the error message, or returning an error status. Additionally, it's a good practice to log the exception details for debugging purposes. How would you modify the code to propagate or handle these errors more effectively?

Suggested change

-        assert self.config, "Config not set"
-        try:
-            self.ws = websockets.sync.client.connect(self.config.ws_url, timeout=30)
-            initial_request = {
-                "setup": {
-                    "model": self.config.model,
-                }
-            }
-            self.ws.send(json.dumps(initial_request))
-            setup_response = json.loads(self.ws.recv())
-            print(f"Setup response: {setup_response}")
-        except websockets.exceptions.WebSocketException as e:
-            print(f"WebSocket connection failed: {str(e)}")
-            self.ws = None
-        except Exception as e:
-            print(f"Setup failed: {str(e)}")
-            self.ws = None
+    def _initialize_websocket(self):
+        assert self.config, "Config not set"
+        try:
+            self.ws = websockets.sync.client.connect(self.config.ws_url, timeout=30)
+            initial_request = {
+                "setup": {
+                    "model": self.config.model,
+                }
+            }
+            self.ws.send(json.dumps(initial_request))
+            setup_response = json.loads(self.ws.recv())
+            print(f"Setup response: {setup_response}")
+            return setup_response
+        except websockets.exceptions.WebSocketException as e:
+            print(f"WebSocket connection failed: {str(e)}")
+            raise
+        except Exception as e:
+            print(f"Setup failed: {str(e)}")
+            raise
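
A minimal sketch of one way to act on this comment, under stated assumptions: log the full traceback with the logging module instead of print, and re-raise a single error type so callers have one thing to catch. The names initialize_session, SetupError, and FakeSocket are hypothetical and not part of the PR; the connect parameter stands in for websockets.sync.client.connect so the sketch runs without a network, and the fake response payload is a placeholder.

```python
import json
import logging

logger = logging.getLogger(__name__)


class SetupError(RuntimeError):
    """Hypothetical error type raised when connecting or the setup handshake fails."""


def initialize_session(connect, ws_url, model):
    """Connect, send the setup request, and return (ws, setup_response).

    `connect` is any callable that takes a URL and returns an object with
    send() and recv(); in the app it would wrap websockets.sync.client.connect.
    """
    try:
        ws = connect(ws_url)
        ws.send(json.dumps({"setup": {"model": model}}))
        setup_response = json.loads(ws.recv())
    except Exception as exc:
        # logger.exception records the traceback, unlike print(str(e)).
        logger.exception("Setup failed for %s", ws_url)
        raise SetupError(f"Could not set up session: {exc}") from exc
    return ws, setup_response


class FakeSocket:
    """Stand-in socket used only to exercise the sketch."""

    def __init__(self):
        self.sent = []

    def send(self, data):
        self.sent.append(data)

    def recv(self):
        return '{"setupComplete": {}}'  # placeholder payload


ws, response = initialize_session(
    lambda url: FakeSocket(), "wss://example.invalid", "demo-model"
)
print(response)
```

Chaining with `from exc` keeps the original exception available as `__cause__`, so nothing is lost by wrapping it.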

code-review-assist · 2024-12-17T20:53:42Z

gemini/gradio-voice/app.py

+            if not self.ws:
+                self._initialize_websocket()


The _initialize_websocket function now returns the setup response, or raises an exception if the connection or setup fails. It's important to handle potential errors when calling this function. How would you handle a ConnectionError or ValueError raised by _initialize_websocket?
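
One possible answer, sketched rather than prescribed: retry the initialization a bounded number of times and surface a single error if every attempt fails. The helper connect_with_retry and its parameters are hypothetical; initialize stands in for a bound method such as self._initialize_websocket.

```python
import time


def connect_with_retry(initialize, attempts=3, delay=0.5, sleep=time.sleep):
    """Call `initialize` until it succeeds or `attempts` is exhausted."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return initialize()
        except (ConnectionError, ValueError) as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff between attempts
    raise RuntimeError(f"Setup failed after {attempts} attempts") from last_error


# Usage with a stand-in initializer that fails twice before succeeding.
calls = []


def flaky_initialize():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("server not ready")
    return {"ok": True}


result = connect_with_retry(flaky_initialize, sleep=lambda seconds: None)
print(result, len(calls))
```

Passing sleep as a parameter is only a convenience for testing; the default is time.sleep.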

code-review-assist · 2024-12-17T20:53:42Z

gemini/gradio-voice/app.py

+                continue
+
+            try:
+                message = self.ws.recv(timeout=5)


Without a timeout, ws.recv() can block indefinitely if the server never sends a message. The timeout=5 argument here prevents that, but the call then raises an error when the timeout expires. How would you handle a timeout error?
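
A small sketch of one way to handle the timeout, assuming recv raises the built-in TimeoutError when the timeout expires (to my knowledge this is how the websockets sync client behaves, but check the version in use). The helper receive_or_none and the stand-in socket classes are hypothetical.

```python
def receive_or_none(ws, timeout=5.0):
    """Return the next message, or None if nothing arrives within `timeout` seconds."""
    try:
        return ws.recv(timeout=timeout)
    except TimeoutError:
        # No message yet; the caller can check a stop flag and poll again.
        return None


class QuietSocket:
    """Stand-in socket that never receives anything."""

    def recv(self, timeout=None):
        raise TimeoutError


class ChattySocket:
    """Stand-in socket that always has a message ready."""

    def recv(self, timeout=None):
        return "hello"


print(receive_or_none(QuietSocket()))
print(receive_or_none(ChattySocket()))
```

Returning None keeps the receive loop alive, so a quiet server does not end the session.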

code-review-assist · 2024-12-17T20:53:42Z

gemini/gradio-voice/app.py

+
+            try:
+                message = self.ws.recv(timeout=5)
+                msg = json.loads(message)


The json.loads() function can raise a JSONDecodeError if the received message is not valid JSON. It's good practice to handle this exception to prevent unexpected crashes. How would you handle a JSONDecodeError?
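
As a hedged illustration, the parsing step could be isolated in a helper that returns None for anything that is not a JSON object, so the receive loop can skip the message instead of crashing. The helper name parse_message is hypothetical.

```python
import json


def parse_message(raw):
    """Return the decoded JSON object, or None if `raw` is not a valid JSON object."""
    try:
        msg = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Malformed JSON text, or bytes that are not valid UTF-8.
        return None
    # Later code indexes the message by key, so reject lists, numbers, etc.
    return msg if isinstance(msg, dict) else None


print(parse_message('{"serverContent": {}}'))
print(parse_message("not json"))
```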

code-review-assist · 2024-12-17T20:53:42Z

gemini/gradio-voice/requirements.txt

@@ -0,0 +1,3 @@
+gradio_webrtc==0.0.23


Pinning gradio_webrtc to an exact version is good practice for reproducibility. However, a bounded version range lets the app pick up bug fixes and minor updates. Consider a compatible range such as gradio_webrtc>=0.0.23,<0.1.0, which allows updates while avoiding potentially breaking changes.

Suggested change

-gradio_webrtc==0.0.23
+gradio_webrtc>=0.0.23,<0.1.0
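
To make the effect of the suggested range concrete, here is a simplified sketch of which versions it admits. It compares plain dotted numeric versions as tuples and is not a full implementation of pip's version rules (no pre-releases or other suffixes); the helper names are hypothetical.

```python
def parse_version(version):
    """Split a plain dotted version such as '0.0.23' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, lower="0.0.23", upper="0.1.0"):
    """True if lower <= version < upper, mirroring '>=0.0.23,<0.1.0'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


for candidate in ["0.0.22", "0.0.23", "0.0.30", "0.1.0"]:
    print(candidate, in_range(candidate))
```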

holtskinner

Please also resolve the spelling errors and respond to the Gemini review comments. Deferring to Zack/Katie for further review.

holtskinner · 2024-12-18T15:51:51Z

gemini/gradio-voice/app.py

+    def _initialize_websocket(self) -> None:
+        assert self.config, "Config not set"
+        try:
+            self.ws = websockets.sync.client.connect(self.config.ws_url, timeout=30)
+            initial_request = {
+                "setup": {
+                    "model": self.config.model,
+                }
+            }
+            self.ws.send(json.dumps(initial_request))
+            setup_response = json.loads(self.ws.recv())
+            print(f"Setup response: {setup_response}")
+        except websockets.exceptions.WebSocketException as e:
+            print(f"WebSocket connection failed: {str(e)}")
+            self.ws = None


Can you try using the genai.live.start_stream method instead of raw websockets calls?

https://googleapis.github.io/python-genai/genai.html#module-genai.live

freddyaboulton · 2024-12-19T16:31:42Z

Thank you @holtskinner! I will address the Gemini comments today or tomorrow. I think there is something wrong with the spellcheck, as WebRTC is a valid term. How can I configure it to ignore that term?

code-review-assist · 2024-12-19T16:32:13Z

Hi @freddyaboulton, Regarding your question about the spellcheck flagging WebRTC as an error, that's a valid point. Many spellcheckers don't recognize technical terms. The best way to handle this depends on your spellchecking tool. If it's a built-in IDE feature, look for options to add custom dictionaries or ignore specific words/phrases. If it's a separate tool like aspell or hunspell, you can usually add WebRTC to a custom dictionary file. For example, with aspell, you might create a personal dictionary file (e.g., .aspell.en.pws) and add WebRTC to it. The exact method will vary based on your setup. Let me know if you need help with the specific tool you are using.

freddyaboulton · 2024-12-19T21:46:46Z

Hi @holtskinner @katiemn @ZackAkil - I switched the demo to use start_stream! However, I ran into a couple of issues I had to solve manually in my locally installed version of the genai package. See here: googleapis/python-genai#35

katiemn · 2024-12-20T02:20:05Z

gemini/gradio-voice/app.py

+        client = genai.Client(api_key=api_key)
+        config = {"response_modalities": ["AUDIO"]}
+        async with client.aio.live.connect(
+            model="models/gemini-2.0-flash-exp", config=config


It looks like this demo connects to Gemini via Google AI Studio. This repository is for demos and code samples for Vertex AI. Please check out this repo to contribute Google AI Studio samples, https://github.com/google-gemini/cookbook/

code-review-assist

Thanks for pointing that out, Katie! This PR is indeed using the Gemini API, and it's a valid point that this repo is focused on Vertex AI. I'll work with freddyaboulton to determine the best next steps for this demo. It might be best suited for the Google Gemini cookbook as you suggested.

holtskinner

Alternatively, if you can switch this demo to use Vertex AI instead, we're happy to host it in this repo.

freddyaboulton

Hi @holtskinner @katiemn - I just switched over to use Vertex AI!

[Video attachment: 2024-12-20.13-39-58.mp4]
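
For readers following the AI Studio versus Vertex AI distinction in this thread, here is a hedged sketch of how the two backends could be selected. It only builds keyword arguments; it assumes the google-genai client accepts vertexai, project, and location for Vertex AI and api_key for the Gemini API. The helper client_kwargs and the project and location values are hypothetical placeholders, not taken from the PR.

```python
def client_kwargs(api_key=None, project=None, location=None):
    """Build keyword arguments for genai.Client for either backend."""
    if project and location:
        # Vertex AI: authenticates with Google Cloud credentials, not an API key.
        return {"vertexai": True, "project": project, "location": location}
    if api_key:
        # Gemini API (Google AI Studio): authenticates with an API key.
        return {"api_key": api_key}
    raise ValueError("Provide either project and location, or an API key")


kwargs = client_kwargs(project="my-project", location="us-central1")
print(kwargs)
# With google-genai installed: client = genai.Client(**kwargs)
```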

voice chat

8c7cd11

freddyaboulton requested a review from a team as a code owner December 17, 2024 20:47

code-review-assist bot reviewed Dec 17, 2024

freddyaboulton changed the title ~~Add a~~ Add a pure-python audio chat app to help python developers get started Dec 17, 2024

code-review-assist bot reviewed Dec 17, 2024

format

746f876

holtskinner requested changes Dec 18, 2024

holtskinner assigned ZackAkil and katiemn Dec 18, 2024

holtskinner requested review from ZackAkil and katiemn December 18, 2024 15:52

holtskinner changed the title ~~Add a pure-python audio chat app to help python developers get started~~ feat: Pure Python audio chat app with Multimodal Live API Dec 18, 2024

holtskinner and others added 2 commits December 18, 2024 09:53

Merge branch 'main' into main

7c9f4aa

Formatting

dcc503f

Add code

d5138ec

katiemn requested changes Dec 20, 2024

holtskinner and others added 5 commits December 20, 2024 10:04

Merge branch 'main' into main

a342051

add code

9fa5e8c

Fix spelling

9b16d37

requirements

5b43fce

add code

7c6d20f
