Live API

The Live API enables low-latency bidirectional voice and video interactions with Gemini. Using the Live API, you can provide end users with the experience of natural, human-like voice conversations, including the ability to interrupt the model's responses with voice commands. The Live API can process text, audio, and video input, and it can provide text and audio output.

Specifications

The Live API features the following technical specifications:

  • Inputs: Text, audio, and video
  • Outputs: Text and audio (synthesized speech)
  • Default session length: 10 minutes
    • Session length can be extended in 10-minute increments as needed
  • Context window: 32K tokens
  • Choice of 8 voices for responses
  • Support for responses in 31 languages

Use the Live API

The following sections provide examples of how to use the Live API's features.

For more information, see the Live API reference guide.

Send and receive text

Gen AI SDK for Python

    # Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
    # with appropriate values for your project.
    
    from google import genai
    from google.genai.types import (
        Content,
        LiveConnectConfig,
        Modality,
        Part,
    )
    
    client = genai.Client(
        vertexai=True,
        project=GOOGLE_CLOUD_PROJECT,
        location=GOOGLE_CLOUD_LOCATION
    )
    MODEL_ID = "gemini-2.0-flash-live-preview-04-09"
    
    async with client.aio.live.connect(
        model=MODEL_ID,
        config=LiveConnectConfig(response_modalities=[Modality.TEXT]),
    ) as session:
        text_input = "Hello? Gemini, are you there?"
        print("> ", text_input, "\n")
        await session.send_client_content(
            turns=Content(role="user", parts=[Part(text=text_input)])
        )
    
        response = []
    
        async for message in session.receive():
            if message.text:
                response.append(message.text)
    
        print("".join(response))
    # Example output:
    # >  Hello? Gemini, are you there?
    # Yes, I'm here. What would you like to talk about?
  

Send text and receive audio

Gen AI SDK for Python

  import asyncio
  import numpy as np
  from IPython.display import Audio, Markdown, display
  from google import genai
  from google.genai.types import (
      Content,
      LiveConnectConfig,
      HttpOptions,
      Modality,
      Part,
      SpeechConfig,
      VoiceConfig,
      PrebuiltVoiceConfig,
  )

  client = genai.Client(
      vertexai=True,
      project=GOOGLE_CLOUD_PROJECT,
      location=GOOGLE_CLOUD_LOCATION,
  )
  MODEL_ID = "gemini-2.0-flash-live-preview-04-09"

  voice_name = "Aoede"  # @param ["Aoede", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Zephyr"]

  config = LiveConnectConfig(
      response_modalities=["AUDIO"],
      speech_config=SpeechConfig(
          voice_config=VoiceConfig(
              prebuilt_voice_config=PrebuiltVoiceConfig(
                  voice_name=voice_name,
              )
          ),
      ),
  )

  async with client.aio.live.connect(
      model=MODEL_ID,
      config=config,
  ) as session:
      text_input = "Hello? Gemini are you there?"
      display(Markdown(f"**Input:** {text_input}"))

      await session.send_client_content(
          turns=Content(role="user", parts=[Part(text=text_input)]))

      audio_data = []
      async for message in session.receive():
          if (
              message.server_content.model_turn
              and message.server_content.model_turn.parts
          ):
              for part in message.server_content.model_turn.parts:
                  if part.inline_data:
                      audio_data.append(
                          np.frombuffer(part.inline_data.data, dtype=np.int16)
                      )

      if audio_data:
          display(Audio(np.concatenate(audio_data), rate=24000, autoplay=True))
      

For more examples of sending text, see our Getting Started guide.

Send audio

You can send audio and receive text. Convert the audio to 16-bit PCM, 16 kHz, mono format before sending it. This example reads a WAV file and sends it in the correct format:

Gen AI SDK for Python

# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile

import asyncio
import io
from pathlib import Path
from google import genai
from google.genai import types
import soundfile as sf
import librosa

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION
    )
model = "gemini-2.0-flash-live-preview-04-09"
config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:

        buffer = io.BytesIO()
        y, sr = librosa.load("sample.wav", sr=16000)
        sf.write(buffer, y, sr, format="RAW", subtype="PCM_16")
        buffer.seek(0)
        audio_bytes = buffer.read()

        # If already in correct format, you can use this:
        # audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        async for response in session.receive():
            if response.text is not None:
                print(response.text)

if __name__ == "__main__":
    asyncio.run(main())
      

Supported audio formats

The Live API supports the following audio formats:

  • Input audio format: Raw 16-bit PCM audio at 16 kHz, little-endian
  • Output audio format: Raw 16-bit PCM audio at 24 kHz, little-endian

Audio transcription

The Live API can transcribe both input and output audio:

Gen AI SDK for Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-2.0-flash-live-preview-04-09"

config = {
    "response_modalities": ["AUDIO"],
    "input_audio_transcription": {},
    "output_audio_transcription": {}
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello? Gemini are you there?"

        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            if response.server_content.model_turn:
                print("Model turn:", response.server_content.model_turn)
            if response.server_content.input_transcription:
                print("Input transcript:", response.server_content.input_transcription.text)
            if response.server_content.output_transcription:
                print("Output transcript:", response.server_content.output_transcription.text)


if __name__ == "__main__":
    asyncio.run(main())

      

Change voice and language settings

The Live API uses Chirp 3 to support synthesized speech responses in a variety of HD voices and languages. For a full list and demos of what each voice sounds like, see Chirp 3: HD voices.

To set the response voice and language:

Gen AI SDK for Python

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=SpeechConfig(
        voice_config=VoiceConfig(
            prebuilt_voice_config=PrebuiltVoiceConfig(
                voice_name=voice_name,
            )
        ),
        language_code="en-US",
    ),
)
      

Console

  1. Open Vertex AI Studio > Stream realtime.
  2. In the Outputs expander, select a voice from the Voice drop-down.
  3. In the same expander, select a language from the Language drop-down.
  4. Click Start session to start the session.

For the best results when prompting the model to respond in a non-English language, include the following as part of your system instructions:

RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.

Have a streamed conversation

To see an example of how to use the Live API in a streaming audio format, run this example on a local computer with microphone and speaker access (rather than using a Colab notebook).

Gen AI SDK for Python

Set up a conversation with the API that lets you send audio prompts and receive audio responses:

"""
# Installation
# on linux
sudo apt-get install portaudio19-dev

# on mac
brew install portaudio

python3 -m venv env
source env/bin/activate
pip install google-genai pyaudio
"""

import asyncio
import pyaudio
from google import genai
from google.genai import types

CHUNK=4200
FORMAT=pyaudio.paInt16
CHANNELS=1
RECORD_SECONDS=5
MODEL = 'gemini-2.0-flash-live-preview-04-09'
INPUT_RATE=16000
OUTPUT_RATE=24000

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
config = {
    "response_modalities": ["AUDIO"],
    "input_audio_transcription": {},  # Configure input transcription
    "output_audio_transcription": {},  # Configure output transcription
}

async def main():
    print(MODEL)
    p = pyaudio.PyAudio()
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        async def send():
            stream = p.open(
                format=FORMAT, channels=CHANNELS, rate=INPUT_RATE, input=True, frames_per_buffer=CHUNK)
            while True:
                frame = stream.read(CHUNK)
                await session.send(input={"data": frame, "mime_type": "audio/pcm"})
                await asyncio.sleep(10**-12)

        async def receive():
            output_stream = p.open(
                format=FORMAT, channels=CHANNELS, rate=OUTPUT_RATE, output=True, frames_per_buffer=CHUNK)
            async for message in session.receive():
                if message.server_content.input_transcription:
                    print(message.server_content.model_dump(mode="json", exclude_none=True))
                if message.server_content.output_transcription:
                    print(message.server_content.model_dump(mode="json", exclude_none=True))
                if message.server_content.model_turn:
                    for part in message.server_content.model_turn.parts:
                        if part.inline_data and part.inline_data.data:
                            audio_data = part.inline_data.data
                            output_stream.write(audio_data)
                            await asyncio.sleep(10**-12)

        send_task = asyncio.create_task(send())
        receive_task = asyncio.create_task(receive())
        await asyncio.gather(send_task, receive_task)

asyncio.run(main())

      

Console

  1. Open Vertex AI Studio > Stream realtime.
  2. Click Start session to start the conversation session.

To end the session, click Stop session.

Session length

The default maximum length of a conversation session is 10 minutes. The server sends a go_away notification (BidiGenerateContentServerMessage.go_away) to the client 60 seconds before the session ends.
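
In the Gen AI SDK for Python, this notification surfaces on the messages returned by session.receive(). The following minimal sketch shows one way to watch for it; the go_away and time_left field names are assumptions about the SDK surface, not taken from this guide:

    async for message in session.receive():
        # The server signals roughly 60 seconds before it closes the session.
        # `go_away` and `time_left` are assumed field names, check the SDK reference.
        if message.go_away is not None:
            print(f"Session ending soon, time left: {message.go_away.time_left}")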

When using the API, you can extend the length of your session in 10-minute increments. There is no limit on how many times you can extend a session. For an example of how to extend your session length, see Enable and disable session resumption. This feature is only available in the API, not in Vertex AI Studio.

Context window

The maximum context length for a session in the Live API is 32,768 tokens by default. These tokens are allocated to store realtime data streamed in at a rate of 25 tokens per second (TPS) for audio and 258 TPS for video, as well as other content such as text inputs and model outputs.

If the context exceeds the maximum context length, the oldest turns in the context window are truncated so that the overall context window size stays below the limit.

The context length that triggers truncation, and the target context length after truncation, can be configured using the context_window_compression.trigger_tokens and context_window_compression.sliding_window.target_tokens fields of the setup message, respectively.
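
As a rough illustration, the following sketch configures context window compression through LiveConnectConfig in the Gen AI SDK for Python. The ContextWindowCompressionConfig and SlidingWindow types are assumed from the SDK, and the token values are placeholder assumptions rather than recommendations:

    from google.genai import types

    config = types.LiveConnectConfig(
        response_modalities=["AUDIO"],
        context_window_compression=types.ContextWindowCompressionConfig(
            # Start truncating once the context grows past this size (example value).
            trigger_tokens=25600,
            # Keep roughly this many tokens after dropping the oldest turns (example value).
            sliding_window=types.SlidingWindow(target_tokens=12800),
        ),
    )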

Concurrent sessions

By default, you can have up to 10 concurrent sessions per project.

Update the system instructions mid-session

The Live API lets you update the system instructions in the middle of an active session. You can use this to adapt the model's responses mid-session, such as changing the response language or modifying the tone of the responses.

Change voice activity detection settings

By default, the model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup message.
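
As an illustration, automatic detection can be tuned through LiveConnectConfig in the Gen AI SDK for Python. This is a minimal sketch; the sensitivity enums and the padding and silence values are assumptions chosen for the example:

    from google.genai import types

    config = types.LiveConnectConfig(
        response_modalities=["AUDIO"],
        realtime_input_config=types.RealtimeInputConfig(
            automatic_activity_detection=types.AutomaticActivityDetection(
                disabled=False,  # automatic VAD stays on (the default)
                # How eagerly the start and end of speech are detected (example values).
                start_of_speech_sensitivity=types.StartSensitivity.START_SENSITIVITY_LOW,
                end_of_speech_sensitivity=types.EndSensitivity.END_SENSITIVITY_LOW,
                prefix_padding_ms=20,
                silence_duration_ms=100,
            )
        ),
    )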

When the audio stream is paused for more than a second (for example, because the user switched off the microphone), an audioStreamEnd event should be sent to flush any cached audio. The client can resume sending audio data at any time.
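
With the Gen AI SDK for Python, that could look like the following sketch; the audio_stream_end parameter of send_realtime_input is an assumption about the SDK surface, and audio_bytes is a placeholder:

    # The microphone was muted or switched off, flush any cached audio.
    await session.send_realtime_input(audio_stream_end=True)

    # Resume sending audio data whenever the stream restarts.
    await session.send_realtime_input(
        audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
    )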

Alternatively, the automatic VAD can be disabled by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. An audioStreamEnd isn't sent in this configuration. Instead, any interruption of the stream is marked by an activityEnd message.
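
A sketch of this manual mode with the Gen AI SDK for Python might look like the following, reusing the client and MODEL_ID from the earlier examples; the ActivityStart and ActivityEnd types are assumed from the SDK, and audio_bytes is a placeholder for your own audio chunk:

    from google.genai import types

    config = types.LiveConnectConfig(
        response_modalities=["TEXT"],
        realtime_input_config=types.RealtimeInputConfig(
            automatic_activity_detection=types.AutomaticActivityDetection(disabled=True)
        ),
    )

    async with client.aio.live.connect(model=MODEL_ID, config=config) as session:
        # The client decides when the user starts and stops speaking.
        await session.send_realtime_input(activity_start=types.ActivityStart())
        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )
        await session.send_realtime_input(activity_end=types.ActivityEnd())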

Enable and disable session resumption

This feature is disabled by default. It must be enabled by the user every time they call the API by specifying the field in the API request, and project-level privacy is enforced for cached data. Enabling session resumption allows the user to reconnect to a previous session within 24 hours; to make this possible, cached data, including text, video, and audio prompt data and model outputs, is stored for up to 24 hours. To achieve zero data retention, don't enable this feature.

To enable the session resumption feature, set the session_resumption field of the LiveConnectConfig message. If enabled, the server periodically takes a snapshot of the current cached session contexts and stores it in internal storage. When a snapshot is successfully taken, a resumption_update is returned with a handle ID that you can record and use later to resume the session from that snapshot.

Here's an example of enabling the session resumption feature and collecting the handle ID:

Gen AI SDK for Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION
)
model = "gemini-2.0-flash-live-preview-04-09"

async def main():
    print(f"Connecting to the service with handle {previous_session_handle}...")
    async with client.aio.live.connect(
        model=model,
        config=types.LiveConnectConfig(
            response_modalities=["AUDIO"],
            session_resumption=types.SessionResumptionConfig(
                # The handle of the session to resume is passed here,
                # or else None to start a new session.
                handle=previous_session_handle
            ),
        ),
    ) as session:
        while True:
            await session.send_client_content(
                turns=types.Content(
                    role="user", parts=[types.Part(text="Hello world!")]
                )
            )
            async for message in session.receive():
                # Periodically, the server will send update messages that may
                # contain a handle for the current state of the session.
                if message.session_resumption_update:
                    update = message.session_resumption_update
                    if update.resumable and update.new_handle:
                        # The handle should be retained and linked to the session.
                        return update.new_handle

                # For the purposes of this example, placeholder input is continually fed
                # to the model. In non-sample code, the model inputs would come from
                # the user.
                if message.server_content and message.server_content.turn_complete:
                    break

if __name__ == "__main__":
    asyncio.run(main())
      

If you want to achieve seamless session resumption, you can enable transparent mode:

Gen AI SDK for Python

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    session_resumption=types.SessionResumptionConfig(
        transparent=True,
    ),
)
      

After transparent mode is enabled, the index of the client message that corresponds to the context snapshot is explicitly returned. This helps you identify which client messages you need to send again when you resume the session from the resumption handle.
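
In the Gen AI SDK for Python, that index would be read from the resumption update, roughly as in the sketch below; last_consumed_client_message_index is an assumed field name based on the API's session resumption update message:

    async for message in session.receive():
        if message.session_resumption_update:
            update = message.session_resumption_update
            # With transparent mode, the update also carries the index of the last
            # client message covered by this snapshot (assumed field name).
            if update.last_consumed_client_message_index is not None:
                print("Covered up to client message:", update.last_consumed_client_message_index)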

Use function calling

You can use function calling to create a description of a function, then pass that description to the model in a request. The response from the model includes the name of a function that matches the description and the arguments to call it with.

All functions must be declared at the start of the session by sending tool definitions as part of the LiveConnectConfig message.

Gen AI SDK for Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-2.0-flash-live-preview-04-09"

# Simple function definitions
turn_on_the_lights = {"name": "turn_on_the_lights"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

tools = [{"function_declarations": [turn_on_the_lights, turn_off_the_lights]}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Turn on the lights please"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            elif chunk.tool_call:
                function_responses = []
                for fc in chunk.tool_call.function_calls:
                    function_response = types.FunctionResponse(
                        name=fc.name,
                        response={ "result": "ok" } # simple, hard-coded function response
                    )
                    function_responses.append(function_response)

                await session.send_tool_response(function_responses=function_responses)


if __name__ == "__main__":
    asyncio.run(main())
  

Use code execution

You can use code execution with the Live API to generate and execute Python code directly.

Gen AI SDK for Python

import asyncio
from google import genai
from google.genai import types


client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-2.0-flash-live-preview-04-09"

tools = [{'code_execution': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Compute the largest prime palindrome under 100000."
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)

                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)

if __name__ == "__main__":
    asyncio.run(main())
  

Use Grounding with Google Search

You can use Grounding with Google Search with the Live API using google_search:

Gen AI SDK for Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-2.0-flash-live-preview-04-09"


tools = [{'google_search': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "When did the last Brazil vs. Argentina soccer match happen?"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)

                # The model might generate and execute Python code to use Search
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)

                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)

if __name__ == "__main__":
    asyncio.run(main())
  

Native audio

Gemini 2.5 Flash with Live API features native audio capabilities. In addition to the standard Live API features, native audio includes:

  • Enhanced voice quality and adaptability: Live API native audio provides richer, more natural voice interactions with 30 HD voices in 24 languages.
  • Introducing Proactive Audio: When Proactive Audio is enabled, the model only responds when it's relevant. The model generates text transcripts and audio responses proactively only for queries directed to the device, and does not respond to non-device directed queries.
  • Introducing Affective Dialog: Models using Live API native audio can understand and respond appropriately to users' emotional expressions for more nuanced conversations.

Use Proactive Audio

To use Proactive Audio, configure the proactivity field in the setup message and set proactive_audio to true:

Gen AI SDK for Python

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity=ProactivityConfig(proactive_audio=True),
)
  

Use Affective Dialog

To use Affective Dialog, set enable_affective_dialog to true in the setup message:

Gen AI SDK for Python

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,
)
  

Limitations

See the Live API limitations section of our reference documentation for the full list of current limitations for the Live API.

The private preview version of Gemini 2.5 Flash with Live API native audio has a limit of 3 concurrent sessions.

Pricing

See our Pricing page for details.

More information

For more information on Live API like the WebSocket API reference, see the Gemini API documentation.