Google AI 推出 Gemini 3.5 Live Translate,一款面向实时语音到语音翻译的音频模型。该模型支持 70 多种语言,可在用户说话的同时开始翻译并流式输出译文,避免尴尬停顿或断续。模型通过毫秒级决策平衡速度与翻译质量,使对话流畅自然。它可边接收输入边输出翻译语音,延迟仅比说话者慢几秒,并能在长对话中维持语速、音高和语调。目前已在 iOS 和 Android 版 Google Translate 应用中上线。
Today, we released Gemini 3.5 Live Translate, our latest audio model for live speech-to-speech translation.
It supports over 70 languages and starts translating as soon as you start talking, streaming translations while listening to what you say next. No awkward pauses or choppy audio, just real connection without language barriers.
So, how does it work? 🤔
The model is able to make split-second decisions to juggle speed and translation quality so conversations actually feel fluid, human, and natural. In order to do this, the model must receive and contextualize the input while simultaneously outputting the translated speech.
Through this process, Gemini 3.5 Live Translate manages to stay mere seconds behind each speaker and can even maintain pacing, pitch, and intonation across extended sessions.
See it in action below, or try it yourself in the Google Translate app on iOS & Android.