Introducing the Gemini Omni Multimodal AI Model
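Gemini Omni moves video generation from stacking up visuals toward storytelling grounded in the physical world. Multi-turn natural-language editing and the integration of world knowledge are a genuine generational upgrade, and anyone who makes video content should rethink what a tool is.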
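Google has launched Gemini Omni, a natively multimodal AI model that can combine video, image, audio and text inputs to generate high-quality video. Its core capability is editing video through natural-language conversation while keeping characters consistent, physics plausible and scenes coherent. The first model, Gemini Omni Flash, is now available, and image and audio output will be supported in the future. Gemini Omni combines an intuitive understanding of the physical world with a rich knowledge base, supports creative generation ranging from the photorealistic to the narrative, and lets you keep editing a video over multiple conversational turns without losing the original scene context.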
Introducing Gemini Omni
Gemini Omni Flash is a model that can create anything from any input – starting with video.
Koray Kavukcuoglu, CTO, Google DeepMind and Chief AI Architect, Google

Last year, Nano Banana brought Gemini's intelligence to image generation and editing. Since then, it’s helped millions of people restore old photos, design from sketches and visualize ideas in ways that weren’t possible before. From the start, we built Gemini to be natively multimodal, and now we’re taking the next step.
We’re introducing Gemini Omni, where Gemini’s ability to reason meets the ability to create. Omni is our new model that can create anything from any input — starting with video. With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge. You can also easily edit your videos through conversation.
Today, we’re rolling out the first model in the Omni family, Gemini Omni Flash, to the Gemini app, Google Flow and YouTube Shorts. In time, we will support additional output modalities such as image and audio. Here’s some of what makes Omni special:
Edit your videos through conversation
Gemini Omni gives you an easier way to edit video — with natural language. Every instruction builds on the last. Your characters stay consistent, the physics hold up and the scene remembers what came before.
Transform the world around you. Change specific things, or change everything. Your video becomes the starting point for something you never could have filmed yourself.
Prompt: Make the sculpture out of bubbles.
Reimagine the action. Take a video you shot and just ask Omni to change what’s happening. Edit the action, add in new characters or objects, or transform a moment into something unexpected.