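Google has launched Gemini Omni, the first consumer-facing world model. Through natural-language interaction, it combines Gemini's intelligence with generative media systems to achieve a deep understanding of the world: physics, history, biology, and more. Users can edit video with a single-sentence instruction, much like editing text in ChatGPT, with support for character consistency, style transfer, camera-angle adjustment, and other features. Rather than simply generating pixels, it simulates a coherent physical and semantic world, marking a leap in AI video generation from a stitching tool to an intelligent creative system.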
Damn! Google has really gone wild this time. Gemini Omni is about to blow the roof off video generation 🤯

Making videos used to be like building with Lego blocks: piece by piece, slowly. Now you get a magic Lego factory that can actually think. You chat in natural language, and it understands real-world physics, history, biology, and culture, then directly generates or edits any video.

The five most mind-blowing abilities you can use right now:

1. Understands real physics: glass marbles collide, turn, and bounce in ways that match reality.
2. Faces never get distorted: define a character once, then put them in any scene, doing any action.
3. Edit videos like you edit ChatGPT text: change backgrounds, swap people, or add effects with a single sentence.
4. Upload an image and apply any style: make claymation, visualize protein folding, whatever you imagine.
5. Video isn't a dead file anymore: change angles, lighting, objects, even storylines just by chatting.

This isn't a competitor to Sora. This is the first time a world model has truly entered a consumer-facing product. It's not just generating pixels; it's simulating a coherent physical and semantic world.

Open the Gemini app right now and try Omni Flash. You'll thank me later.