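Google recently launched Gemini Omni, an all-purpose video AI model that accepts video, images, audio, text, and sketches as input. Users can give natural-language instructions to add characters, replace objects, adjust actions, change the style, sync sound effects, and move the camera in existing video, and the scene stays consistent across multiple edits. The model has stronger world understanding and simulates physical interactions such as gravity and fluids more realistically, bringing video editing closer to directing. Outputs carry SynthID watermarks and C2PA Content Credentials to clearly mark them as AI-generated.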
Google's new Gemini Omni can generate "anything from any input"
It is a video AI model that can create and edit clips from video, images, audio, text, and sketches.
A user can record an ordinary video, then ask Omni in plain language to add a character, replace an object, change the action, alter the style, sync sound, or move the camera.
Omni keeps the same scene stable after each edit.
Video models often fail when they must preserve identity, motion, lighting, object position, and cause-and-effect across multiple changes.
Gemini Omni Flash is meant to handle those edits inside the Gemini app, Google Flow, and YouTube Shorts.
Omni has stronger world understanding, meaning it tries to model gravity, fluid motion, kinetic energy, and physical interaction more realistically.
Overall, Omni makes AI video feel less like prompt-based generation and more like directing a scene through repeated instructions.
Google is also attaching SynthID watermarking and C2PA Content Credentials to Omni outputs, so edited or generated media can be identified as AI-made.