The Live API enables low-latency, bidirectional voice and video interactions with Gemini. Using the Live API, you can provide end users with natural, human-like voice conversations, including the ability to interrupt the model's responses with voice commands. The Live API can process text, audio, and video input, and it can provide text and audio output.
Specifications
The Live API features the following technical specifications:
- Inputs: Text, audio, and video
- Outputs: Text and audio (synthesized speech)
- Default session length: 10 minutes
- Session length can be extended in 10-minute increments as needed
- Context window: 32K tokens
- Choice of 8 voices for responses
- Support for responses in 31 languages
Use the Live API
The following sections provide examples on how to use the Live API's features.
For more information, see the Live API reference guide.
Send and receive text
Gen AI SDK for Python
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
from google import genai
from google.genai.types import (
    Content,
    LiveConnectConfig,
    Modality,
    Part,
)

client = genai.Client(
    vertexai=True, project=GOOGLE_CLOUD_PROJECT, location=GOOGLE_CLOUD_LOCATION
)
MODEL_ID = "gemini-2.0-flash-live-preview-04-09"

async with client.aio.live.connect(
    model=MODEL_ID,
    config=LiveConnectConfig(response_modalities=[Modality.TEXT]),
) as session:
    text_input = "Hello? Gemini, are you there?"
    print("> ", text_input, "\n")
    await session.send_client_content(
        turns=Content(role="user", parts=[Part(text=text_input)])
    )

    response = []
    async for message in session.receive():
        if message.text:
            response.append(message.text)

    print("".join(response))
# Example output:
# > Hello? Gemini, are you there?
# Yes, I'm here. What would you like to talk about?
Send text and receive audio
Gen AI SDK for Python
import asyncio
import numpy as np
from IPython.display import Audio, Markdown, display
from google import genai
from google.genai.types import (
    Content,
    LiveConnectConfig,
    HttpOptions,
    Modality,
    Part,
    SpeechConfig,
    VoiceConfig,
    PrebuiltVoiceConfig,
)

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
MODEL_ID = "gemini-2.0-flash-live-preview-04-09"

voice_name = "Aoede"  # @param ["Aoede", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Zephyr"]

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=SpeechConfig(
        voice_config=VoiceConfig(
            prebuilt_voice_config=PrebuiltVoiceConfig(
                voice_name=voice_name,
            )
        ),
    ),
)

async with client.aio.live.connect(
    model=MODEL_ID,
    config=config,
) as session:
    text_input = "Hello? Gemini are you there?"
    display(Markdown(f"**Input:** {text_input}"))

    await session.send_client_content(
        turns=Content(role="user", parts=[Part(text=text_input)])
    )

    audio_data = []
    async for message in session.receive():
        if (
            message.server_content.model_turn
            and message.server_content.model_turn.parts
        ):
            for part in message.server_content.model_turn.parts:
                if part.inline_data:
                    audio_data.append(
                        np.frombuffer(part.inline_data.data, dtype=np.int16)
                    )

    if audio_data:
        display(Audio(np.concatenate(audio_data), rate=24000, autoplay=True))
For more examples of sending text, see our Getting Started guide.
Send audio
You can send audio and receive text. Convert your audio to 16-bit PCM, 16 kHz, mono format before sending it. This example reads a WAV file and sends it in the correct format:
Gen AI SDK for Python
# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile
import asyncio
import io
from pathlib import Path
from google import genai
from google.genai import types
import soundfile as sf
import librosa

client = genai.Client(
    vertexai=True, project=GOOGLE_CLOUD_PROJECT, location=GOOGLE_CLOUD_LOCATION
)
model = "gemini-2.0-flash-live-preview-04-09"
config = {"response_modalities": ["TEXT"]}


async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        buffer = io.BytesIO()
        y, sr = librosa.load("sample.wav", sr=16000)
        sf.write(buffer, y, sr, format="RAW", subtype="PCM_16")
        buffer.seek(0)
        audio_bytes = buffer.read()

        # If already in correct format, you can use this:
        # audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        async for response in session.receive():
            if response.text is not None:
                print(response.text)


if __name__ == "__main__":
    asyncio.run(main())
Supported audio formats
The Live API supports the following audio formats:
- Input audio format: Raw 16-bit PCM audio at 16 kHz, little-endian
- Output audio format: Raw 16-bit PCM audio at 24 kHz, little-endian
Audio transcription
The Live API can transcribe both input and output audio:
Gen AI SDK for Python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-2.0-flash-live-preview-04-09"

config = {
    "response_modalities": ["AUDIO"],
    "input_audio_transcription": {},
    "output_audio_transcription": {},
}


async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello? Gemini are you there?"

        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            if response.server_content.model_turn:
                print("Model turn:", response.server_content.model_turn)
            if response.server_content.input_transcription:
                print("Input transcript:", response.server_content.input_transcription.text)
            if response.server_content.output_transcription:
                print("Output transcript:", response.server_content.output_transcription.text)


if __name__ == "__main__":
    asyncio.run(main())
Change voice and language settings
The Live API uses Chirp 3 to support synthesized speech responses in a variety of HD voices and languages. For a full list and demos of what each voice sounds like, see Chirp 3: HD voices.
To set the response voice and language:
Gen AI SDK for Python
config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=SpeechConfig(
        voice_config=VoiceConfig(
            prebuilt_voice_config=PrebuiltVoiceConfig(
                voice_name=voice_name,
            )
        ),
        language_code="en-US",
    ),
)
Console
- Open Vertex AI Studio > Stream realtime.
- In the Outputs expander, select a voice from the Voice drop-down.
- In the same expander, select a language from the Language drop-down.
- Click Start session to start the session.
For the best results when you require the model to respond in a non-English language, include the following as part of your system instructions:
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
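For example, here's a minimal sketch of passing such an instruction through the system_instruction field of LiveConnectConfig, using the same SDK types as the earlier examples; the French wording is only a placeholder for your target language:

# Hypothetical sketch: steer responses toward French via a system instruction.
config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    system_instruction=Content(
        parts=[Part(text="RESPOND IN FRENCH. YOU MUST RESPOND UNMISTAKABLY IN FRENCH.")]
    ),
)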
Have a streamed conversation
To see an example of how to use the Live API in a streaming audio format, run this example on a local computer with microphone and speaker access (rather than using a Colab notebook).
Gen AI SDK for Python
Set up a conversation with the API that lets you send text prompts and receive audio responses:
""" # Installation # on linux sudo apt-get install portaudio19-dev # on mac brew install portaudio python3 -m venv env source env/bin/activate pip install google-genai """ import asyncio import pyaudio from google import genai from google.genai import types CHUNK=4200 FORMAT=pyaudio.paInt16 CHANNELS=1 RECORD_SECONDS=5 MODEL = 'gemini-2.0-flash-live-preview-04-09' INPUT_RATE=16000 OUTPUT_RATE=24000 client = genai.Client( vertexai=True, project=GOOGLE_CLOUD_PROJECT, location=GOOGLE_CLOUD_LOCATION, ) config = { "response_modalities": ["AUDIO"], "input_audio_transcription": {}, # Configure input transcription "output_audio_transcription": {}, # Configure output transcription } async def main(): print(MODEL) p = pyaudio.PyAudio() async with client.aio.live.connect(model=MODEL, config=config) as session: #exit() async def send(): stream = p.open( format=FORMAT, channels=CHANNELS, rate=INPUT_RATE, input=True, frames_per_buffer=CHUNK) while True: frame = stream.read(CHUNK) await session.send(input={"data": frame, "mime_type": "audio/pcm"}) await asyncio.sleep(10**-12) async def receive(): output_stream = p.open( format=FORMAT, channels=CHANNELS, rate=OUTPUT_RATE, output=True, frames_per_buffer=CHUNK) async for message in session.receive(): if message.server_content.input_transcription: print(message.server_content.model_dump(mode="json", exclude_none=True)) if message.server_content.output_transcription: print(message.server_content.model_dump(mode="json", exclude_none=True)) if message.server_content.model_turn: for part in message.server_content.model_turn.parts: if part.inline_data.data: audio_data=part.inline_data.data output_stream.write(audio_data) await asyncio.sleep(10**-12) send_task = asyncio.create_task(send()) receive_task = asyncio.create_task(receive()) await asyncio.gather(send_task, receive_task) asyncio.run(main())
Console
- Open Vertex AI Studio > Stream realtime.
- Click Start session to start the conversation session.
To end the session, click Stop session.
Session length
The default maximum length of a conversation session is 10 minutes. A go_away notification (BidiGenerateContentServerMessage.go_away) is sent to the client 60 seconds before the session ends.
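As a rough sketch, assuming the Gen AI SDK for Python surfaces this notification as a go_away field on the server message, you could watch for it in your receive loop and start wrapping up the conversation:

async for message in session.receive():
    # The server warns the client about 60 seconds before terminating the session.
    if message.go_away is not None:
        print("Session ending soon; time left:", message.go_away.time_left)
    # ...handle other message types as usual...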
When using the API, you can extend the length of your session in 10-minute increments. There is no limit on how many times you can extend a session. For an example of how to extend your session length, see Enable and disable session resumption. This feature is only available in the API, not in Vertex AI Studio.
Context window
By default, the maximum context length for a Live API session is 32,768 tokens. These tokens are allocated to store realtime data streamed in at a rate of 25 tokens per second (TPS) for audio and 258 TPS for video, as well as other content such as text inputs and model outputs.
If the context exceeds the maximum length, the oldest turns are truncated from the context window so that the overall context window size stays under the limit.
The default context length of the session and the target context length after truncation can be configured using the context_window_compression.trigger_tokens and context_window_compression.sliding_window.target_tokens fields of the setup message, respectively.
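For example, here's a minimal sketch of setting these fields through LiveConnectConfig, assuming the ContextWindowCompressionConfig and SlidingWindow types exposed by the Gen AI SDK for Python; the token values are illustrative only:

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    context_window_compression=types.ContextWindowCompressionConfig(
        # Truncation is triggered once the context grows past this many tokens.
        trigger_tokens=32000,
        # After truncation, roughly this many tokens of recent context are kept.
        sliding_window=types.SlidingWindow(target_tokens=16000),
    ),
)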
Concurrent sessions
By default, you can have up to 10 concurrent sessions per project.
Update the system instructions mid-session
The Live API lets you update the system instructions in the middle of an active session. You can use this to adapt the model's responses mid-session, such as changing the language the model responds in or modifying the tone of its responses.
Change voice activity detection settings
By default, the model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup message.
When the audio stream is paused for more than a second (for example, because the user switched off the microphone), an audioStreamEnd event should be sent to flush any cached audio. The client can resume sending audio data at any time.
Alternatively, the automatic VAD can be disabled by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration, the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. An audioStreamEnd isn't sent in this configuration. Instead, any interruption of the stream is marked by an activityEnd message.
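As a minimal sketch, assuming the RealtimeInputConfig, AutomaticActivityDetection, ActivityStart, and ActivityEnd types in the Gen AI SDK for Python map to these fields, disabling automatic VAD and sending manual activity signals might look like this (client, MODEL_ID, and audio_bytes come from the earlier examples):

from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    realtime_input_config=types.RealtimeInputConfig(
        # Disable automatic VAD; the client marks speech boundaries itself.
        automatic_activity_detection=types.AutomaticActivityDetection(disabled=True),
    ),
)

async with client.aio.live.connect(model=MODEL_ID, config=config) as session:
    # Mark the start of user speech before streaming audio chunks.
    await session.send_realtime_input(activity_start=types.ActivityStart())
    await session.send_realtime_input(
        audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
    )
    # Mark the end of user speech once the utterance is complete.
    await session.send_realtime_input(activity_end=types.ActivityEnd())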
Enable and disable session resumption
This feature is disabled by default. It must be enabled by the user every time they call the API by specifying the field in the API request, and project-level privacy is enforced for cached data. Enabling Session Resumption allows the user to reconnect to a previous session within 24 hours by storing cached data, including text, video, and audio prompt data and model outputs, for up to 24 hours. To achieve zero data retention, don't enable this feature.
To enable the session resumption feature, set the session_resumption field of the LiveConnectConfig message. If enabled, the server will periodically take a snapshot of the current cached session contexts and store it in internal storage. When a snapshot is successfully taken, a resumption_update will be returned with the handle ID that you can record and use later to resume the session from the snapshot.
Here's an example of enabling the session resumption feature and collecting the handle ID:
Gen AI SDK for Python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True, project=GOOGLE_CLOUD_PROJECT, location=GOOGLE_CLOUD_LOCATION
)
model = "gemini-2.0-flash-live-preview-04-09"


async def main():
    print(f"Connecting to the service with handle {previous_session_handle}...")
    async with client.aio.live.connect(
        model=model,
        config=types.LiveConnectConfig(
            response_modalities=["AUDIO"],
            session_resumption=types.SessionResumptionConfig(
                # The handle of the session to resume is passed here,
                # or else None to start a new session.
                handle=previous_session_handle
            ),
        ),
    ) as session:
        while True:
            await session.send_client_content(
                turns=types.Content(
                    role="user", parts=[types.Part(text="Hello world!")]
                )
            )
            async for message in session.receive():
                # Periodically, the server will send update messages that may
                # contain a handle for the current state of the session.
                if message.session_resumption_update:
                    update = message.session_resumption_update
                    if update.resumable and update.new_handle:
                        # The handle should be retained and linked to the session.
                        return update.new_handle

                # For the purposes of this example, placeholder input is continually fed
                # to the model. In non-sample code, the model inputs would come from
                # the user.
                if message.server_content and message.server_content.turn_complete:
                    break


if __name__ == "__main__":
    asyncio.run(main())
If you want to achieve seamless session resumption, you can enable transparent mode:
Gen AI SDK for Python
types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    session_resumption=types.SessionResumptionConfig(
        transparent=True,
    ),
)
After transparent mode is enabled, the index of the client message that corresponds to the context snapshot is explicitly returned. This helps you identify which client messages need to be sent again when you resume the session from the resumption handle.
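As a sketch, assuming the session resumption update exposes this index as last_consumed_client_message_index in the Gen AI SDK for Python, you could record it alongside the handle:

async for message in session.receive():
    if message.session_resumption_update:
        update = message.session_resumption_update
        if update.resumable and update.new_handle:
            # Messages at or before this index are covered by the snapshot
            # and don't need to be resent after resuming.
            print("Handle:", update.new_handle)
            print("Last consumed message index:", update.last_consumed_client_message_index)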
Use function calling
You can use function calling to create a description of a function, then pass that description to the model in a request. The response from the model includes the name of a function that matches the description and the arguments to call it with.
All functions must be declared at the start of the session by sending tool definitions as part of the LiveConnectConfig message.
Gen AI SDK for Python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-2.0-flash-live-preview-04-09"

# Simple function definitions
turn_on_the_lights = {"name": "turn_on_the_lights"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

tools = [{"function_declarations": [turn_on_the_lights, turn_off_the_lights]}]
config = {"response_modalities": ["TEXT"], "tools": tools}


async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Turn on the lights please"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            elif chunk.tool_call:
                function_responses = []
                for fc in chunk.tool_call.function_calls:
                    function_response = types.FunctionResponse(
                        name=fc.name,
                        response={"result": "ok"},  # simple, hard-coded function response
                    )
                    function_responses.append(function_response)

                await session.send_tool_response(function_responses=function_responses)


if __name__ == "__main__":
    asyncio.run(main())
Use code execution
You can use code execution with the Live API to generate and execute Python code directly.
Gen AI SDK for Python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-2.0-flash-live-preview-04-09"

tools = [{'code_execution': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}


async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Compute the largest prime palindrome under 100000."
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)

                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)
                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)


if __name__ == "__main__":
    asyncio.run(main())
Use Grounding with Google Search
You can use Grounding with Google Search with the Live API using google_search:
Gen AI SDK for Python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-2.0-flash-live-preview-04-09"

tools = [{'google_search': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}


async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "When did the last Brazil vs. Argentina soccer match happen?"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)

                # The model might generate and execute Python code to use Search
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)
                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)


if __name__ == "__main__":
    asyncio.run(main())
Native audio
Gemini 2.5 Flash with Live API features native audio capabilities. In addition to the standard Live API features, native audio includes:
- Enhanced voice quality and adaptability: Live API native audio provides richer, more natural voice interactions with 30 HD voices in 24 languages.
- Introducing Proactive Audio: When Proactive Audio is enabled, the model only responds when it's relevant. The model generates text transcripts and audio responses proactively only for queries directed to the device, and does not respond to non-device directed queries.
- Introducing Affective Dialog: Models using Live API native audio can understand and respond appropriately to users' emotional expressions for more nuanced conversations.
Use Proactive Audio
To use Proactive Audio, configure the proactivity field in the setup message and set proactive_audio to true:
Gen AI SDK for Python
config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity=ProactivityConfig(proactive_audio=True),
)
Use Affective Dialog
To use Affective Dialog, set enable_affective_dialog to true in the setup message:
Gen AI SDK for Python
config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,
)
Limitations
See the Live API limitations section of our reference documentation for the full list of current limitations.
The private preview version of Gemini 2.5 Flash with Live API native audio has a limit of 3 concurrent sessions.
Pricing
See our Pricing page for details.
More information
For more information about the Live API, such as the WebSocket API reference, see the Gemini API documentation.