AIHOT


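karminski-牙医 @karminski3 · May 22 · 66

This wave probably leaves XX Dictionary stone cold... I couldn't sit still when I saw this chart: a small 30B-A3B model crushing DeepSeek-V4-Pro on benchmarks? Who gave you the courage? Then I looked closer and realized it is a translation-specific model. Tencent has just released three translation-specific models: Hy-MT2-1.8B, Hy-MT2-7B, and Hy-MT2-30B-A3B. Of these, Hy-MT2-30B-A3B beats DeepSeek-V4-Pro across the board on DomainMTBench (a benchmark that specifically tests domain-specific translation, covering finance, law, medicine, technology, and more). Here is a hands-on test for everyone: #hymt2 #翻译大模型

Summary: Tencent has released three translation-specific models: Hy-MT2-1.8B, Hy-MT2-7B, and Hy-MT2-30B-A3B. The 30B-A3B version outperforms the general-purpose model DeepSeek-V4-Pro across the board on DomainMTBench, a translation benchmark focused on specific domains such as finance, law, medicine, and technology. This points to a significant performance advantage for specialized models in vertical domains.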

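Berryxia.AI @berryxia · May 22 · 60

Holy cow, can you believe it? You can now run a music model locally on a Mac! Apple's unified memory architecture shows its advantage once again; buy early, enjoy early 😎 The official Stable Audio 3 release just dropped with some serious goods: 59x realtime on an M5 Pro, so a MacBook Pro simply takes off. The wildest parts:
- LoRA fine-tuning takes under 1 hour
- Sm mode is faster, Medium mode is higher quality
- One-line install (MLX-optimized build):
curl -LsSf https://raw.githubusercontent.com/Stability-AI/stable-audio-3/main/optimized/mlx/bootstrap.sh | bash
This is no longer "testing the waters in the cloud"; it is a tool for seriously intensive music generation right on your own machine. Want to turn out a demo quickly? Train your own style? Keep composing on a plane? All of that is basically possible now. And they say "break it plz" outright, clearly wanting the community to hammer on it. Go for it!

Summary: Stable Audio 3 has been officially released with a build optimized for Apple's MLX framework, allowing the music generation model to run efficiently on a local Mac. The headline figure is 59x realtime generation on an M5 Pro chip. The tool also supports LoRA fine-tuning in under 1 hour and offers a fast (Sm) mode and a higher-quality (Medium) mode. The developers are encouraging the community to push on it, and the release is a step up for local music-creation tools.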

Runway @runwayml · May 22 · 84

Aleph 2.0 is here. Now you can edit a single frame in your video, preview the change and then Aleph 2.0 carries that edit across the rest of your video. Try it now in the new Edit Studio on web at the link below.


Alibaba Cloud @alibaba_cloud · May 22 · 75

Qwen3.7-Max is live on @OpenRouter https://x.com/OpenRouter/status/2057500097206976983?s=20


Rohan Paul @rohanpaul_ai · May 22 · 84

Alibaba just released Qwen3.7-Max. Their best flagship model built for real-world tasks and production environments. - Agent reliability is the center of the story, where the model must plan steps, call tools, inspect results, fix mistakes, and continue without collapsing after the first wrong turn. - 56.6 on the Artificial Analysis Intelligence Index, up 4.8 points from Qwen3.6-Max. Qwen 3.7 Max sits at 5th, pretty much on par with GPT 5.4 (xhigh). - The Intelligence Index gains over Qwen3.6 Max Preview are concentrated in scientific reasoning, agentic capability and coding. - One important layer of the serving stack, the inference kernel, was optimized heavily, from near-baseline speed to a 10.0x geometric mean speedup after many rounds of low-level GPU optimization.
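
To make the "10.0x geometric mean speedup" figure concrete, here is a minimal sketch of how a geometric-mean speedup is computed from per-workload timings. The timings are hypothetical values chosen only for illustration; they are not measurements from the post.

```python
from statistics import geometric_mean

# Hypothetical (baseline_ms, optimized_ms) timings for four kernel workloads.
# These numbers are illustrative assumptions, not figures from the post.
timings_ms = [(40.0, 5.0), (80.0, 8.0), (120.0, 10.0), (200.0, 16.0)]


def geomean_speedup(pairs):
    """Geometric mean of per-workload speedups (baseline / optimized)."""
    return geometric_mean([baseline / optimized for baseline, optimized in pairs])


speedup = geomean_speedup(timings_ms)
print(f"geometric mean speedup: {speedup:.2f}x")
```

The geometric mean is the usual choice for averaging ratios: a 2x gain on one workload and a 0.5x loss on another average out to 1x, which an arithmetic mean would overstate.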

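Summary: Alibaba has launched its latest flagship model, Qwen3.7-Max, positioned as a production-grade foundation model for the agent era. It scores 56.6 on the Artificial Analysis Intelligence Index, up 4.8 points from its predecessor and roughly on par with GPT 5.4 (xhigh). Its central selling point is agent reliability: planning, calling tools, correcting mistakes, and continuing through complex tasks. After many rounds of low-level optimization, the inference kernel reached a 10x speedup, and the model supports hours-long autonomous runs and multi-tool collaboration. It is available on Alibaba Cloud Model Studio and works with scaffolds such as Claude Code and OpenClaw.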

OpenRouter @OpenRouter · May 22 · 78

The new Qwen3.7-Max from @Alibaba_Qwen is live on OpenRouter. The flagship of the Qwen3.7 series, built for agent-centric work: coding, office and productivity tasks, and long-horizon autonomous execution. Big jumps in coding and agent benchmarks over Qwen3.6, with explicit prompt caching for repeated context.


Alibaba Cloud @alibaba_cloud · May 21 · 76

Qwen3.7-Max just landed at 56.6 on the Artificial Analysis Intelligence Index — a solid 4.8pt jump over Qwen3.6-Max-Preview. @ArtificialAnlys Sharper sci reasoning, stronger agentic chops, better coding, and it hallucinates less.

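Summary: Alibaba has released its latest closed-weights flagship model, Qwen3.7 Max, which scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points above the previous Preview version and the closest Alibaba has come to the frontier. The gain comes mainly from stronger scientific reasoning, agentic capability, and coding, with the sharp drop in hallucination rate (from 44.2% to 22.9%) among the larger contributors. The context window has grown to 1M tokens, input and output remain text-only, and pricing has not yet been announced.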

Chubby♨️ @kimmonismus · May 21 · 66

Alibaba released Qwen 3.7 max. Benchmarks incredible. Their new model ran autonomously for 35 hours, made 1,158 tool calls, and achieved a 10x speedup - on a single attention kernel. This isn't "AI improving itself across the board." It's a model grinding through compile-profile-rewrite loops on one well-defined optimization target. Impressive? Absolutely. The kind of self-improvement people will imagine when they see the headline? Not yet. The actually interesting claim is buried deeper: Qwen says agentic capabilities generalize from diverse training environments the same way language capabilities generalize from diverse text. If that holds, it's a bigger deal than any benchmark number.

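Summary: Alibaba Cloud has released its new flagship model Qwen3.7 Max, positioned as a foundation model for the "agent era" with an emphasis on execution in real tasks such as end-to-end coding and office automation. On a kernel optimization task the model ran autonomously for 35 hours without human intervention and made more than 1,000 tool calls. This is not across-the-board self-improvement, however, but iterative refinement against one well-defined optimization target. More notable is Qwen's claim that agentic capabilities generalize from diverse training environments the same way language capabilities generalize from diverse text. If that holds, it matters more than any benchmark number.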

Alibaba Cloud @alibaba_cloud · May 21 · 85

(1/6) 📣Meet Qwen3.7-Max — our latest flagship, made for the Agent Era. A versatile foundation for agents that actually get things done: 🧑‍💻 Coding agent, end-to-end. Frontend prototypes, multi-file refactors, real debugging — nails it. 🗂️ A reliable office and productivity assistant. Get your work done through MCP integrations and multi-agent orchestration. ⏱️ Long-horizon autonomy. 35 hours straight on a kernel optimization task — 1,000+ tool calls, zero hand-holding. 🔌 Scaffold-agnostic. Claude Code, OpenClaw, Qwen Code, or your own stack. Consistent reliability everywhere. API's up on Model Studio: https://int.alibabacloud.com/m/1000413187/ Go build something wild!

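Summary: Alibaba Cloud has released Qwen3.7-Max, the new flagship of the Qwen series, positioned as a versatile foundation model for the agent era. It is meant to underpin agents that "actually get things done": end-to-end coding, an office assistant built on MCP integrations and multi-agent orchestration, and long-horizon autonomy, demonstrated by 35 hours straight on a kernel optimization task. The model is scaffold-agnostic, working with Claude Code, OpenClaw, Qwen Code, and other stacks. Its API is available on Model Studio.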

Qwen @Alibaba_Qwen · May 21 · 82

📣Meet Qwen3.7-Max — our latest flagship, made for the Agent Era. A versatile foundation for agents that actually get things done: 🧑‍💻 Coding agent, end to end. Frontend prototypes, multi-file refactors, real debugging — nails it. 🗂️ A reliable office and productivity assistant. Get your work done through MCP integrations and multi-agent orchestration. ⏱️ Long-horizon autonomy. 35 hours straight on a kernel optimization task — 1,000+ tool calls, zero hand-holding. 🔌 Scaffold-agnostic. Claude Code, OpenClaw, Qwen Code, or your own stack. Consistent reliability everywhere. API's up on Alibaba Model Studio. You can also take it for a spin on Qwen Studio. Go build something wild!🏃🏃‍♂️ 📖 Blog: https://qwen.ai/blog?id=qwen3.7 ✅ Qwen Studio: https://chat.qwen.ai/?models=qwen3.7-max ⚡️ API：https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&url=2840914_2&modelId=qwen3.7-max&serviceSite=international

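Summary: Qwen3.7-Max is the Qwen series' latest flagship for the agent era, intended as a foundation for agents that complete real tasks. It can act as an end-to-end coding agent handling frontend prototypes and multi-file refactors, serve as an office assistant through MCP integrations and multi-agent orchestration, and sustain long autonomous runs (35 hours on a kernel optimization task). It works with scaffolds such as Claude Code and OpenClaw and is available on Alibaba Model Studio and Qwen Studio.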

Qwen @Alibaba_Qwen · May 21 · 76

🚀Qwen3.7-Max just landed at 56.6 on the Artificial Analysis Intelligence Index — a solid 4.8pt jump over Qwen3.6-Max-Preview. @ArtificialAnlys ⚡️Sharper sci reasoning, stronger agentic chops, better coding, and it hallucinates less.

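Summary: Alibaba has released Qwen3.7 Max, a new closed-weights flagship model. It scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points above Qwen3.6 Max Preview and the closest an Alibaba model has come to the frontier. The improvements are in scientific reasoning, agentic capability, and coding, along with a markedly lower hallucination rate. Part of the score gain comes from the model declining to answer more often rather than from higher factual accuracy. The context window has grown to 1M tokens and the weights remain closed. Alibaba still trails OpenAI, Anthropic, and Google overall.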

🚨 AI News | TestingCatalog @testingcatalog · May 21 · 72

Alibaba released Qwen 3.7 Max, its latest proprietary model for agentic coding. Qwen 3.7 Max scores 56.6 on the Artificial Analysis Intelligence Index, outperforming recently released Gemini 3.5 Flash and Kimi K2.6.


Tencent Hy @TencentHunyuan · May 21 · 74

🚀 Introducing Hy-MT2: New Open-Source Multilingual Translation Model We proudly launch our new Hy-MT2 translation model and the Tencent Hy Translation mini-program! Hy-MT2 is a powerful multilingual model supporting seamless translation across 33 languages, and it's fully open-source! Its 7B and 30B-A3B models achieve state-of-the-art performance among all open-source models on various translation tasks, surpassing models with dozens of times more parameters. The lightweight 1.8B model even outperforms mainstream commercial APIs such as Microsoft's. Powered by Tencent AngelSlim 1.25-bit extreme quantization, it needs just 440MB storage and enables effortless local inference on mainstream mobile chips, with 1.5x faster speed vs. Hy-MT1.5. Open-source AI translation just got way smarter, faster, and more accessible! 🌏 Project Page: https://aistudio.tencent.com/llm/zh?tabIndex=0 Hugging Face: https://huggingface.co/collections/tencent/hy-mt2 Modelscope: https://modelscope.cn/collections/Tencent-Hunyuan/Hy-MT2 Github: https://github.com/Tencent-Hunyuan/Hy-MT2
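
As a back-of-the-envelope check on the 440MB figure, the sketch below computes the raw weight storage for a 1.8B-parameter model at a few bit widths. It assumes exactly 1.8e9 parameters, all stored at the stated width, and 1 MB = 10^6 bytes; the post does not describe the actual file layout, so these are illustrative assumptions.

```python
def weight_storage_mb(num_params: float, bits_per_weight: float) -> float:
    """Raw storage for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


PARAMS = 1.8e9  # assumed exact parameter count for the "1.8B" model

for bits in (16, 4, 1.25):
    size_mb = weight_storage_mb(PARAMS, bits)
    print(f"{bits:>5} bits/weight -> {size_mb:,.0f} MB")
```

Under these assumptions the 1.25-bit weights alone come to roughly 281 MB, so the quoted 440MB presumably also covers parts kept at higher precision or other file overhead; the post does not say which.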

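Summary: Tencent has open-sourced the Hy-MT2 multilingual translation model, which supports translation across 33 languages. The 7B and 30B-A3B versions achieve state-of-the-art translation performance among open-source models, surpassing models with dozens of times more parameters. The lightweight 1.8B version outperforms mainstream commercial APIs such as Microsoft's and, with Tencent AngelSlim 1.25-bit extreme quantization, needs only 440MB of storage and runs locally on mainstream mobile chips, 1.5x faster than Hy-MT1.5.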

Artificial Analysis @ArtificialAnlys · May 21 · 70

Alibaba’s new Qwen3.7 Max model scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points higher than Qwen3.6 Max Preview (51.8). While Alibaba still trails models from OpenAI, Anthropic and Google, Qwen3.7 Max is the closest they have been to the frontier.

Qwen3.7 Max is @Alibaba_Qwen's latest proprietary flagship, scoring 56.6 on the Intelligence Index, a 4.8 point gain over Qwen3.6 Max Preview (51.8) released in April. Qwen3.7 Max continues Alibaba's pattern, in place since Qwen2.5 Max (January 2025), of releasing Max and Plus models as closed weights while the rest of the Qwen line remains open weights. The leading open weights Qwen on the Intelligence Index is Qwen3.6 27B (Reasoning, 45.8) released in April 2026, and the leading open weights MoE Qwen is Qwen3.5 397B A17B (Reasoning, 45.0) released in February 2026.

Key takeaways for the reasoning variant:
➤ The Intelligence Index gains over Qwen3.6 Max Preview are concentrated in scientific reasoning, agentic capability and coding. CritPt +9.7 p.p (3.7% to 13.4%), HLE +9.2 p.p (28.9% to 38.1%), TerminalBench Hard +6.9 p.p (43.9% to 50.8%) and GDPval-AA +42 Elo (1504 to 1546). Scores on other benchmarks in the Intelligence Index are flat compared to Qwen3.6 Max Preview.
➤ A significant share of the Intelligence Index gain is driven by higher abstention on AA-Omniscience, not higher accuracy. Qwen3.7 Max's accuracy on AA-Omniscience dropped 7.6 p.p (37.7% to 30.1%), while its hallucination rate dropped 21.3 p.p (44.2% to 22.9%). The model is choosing not to answer more questions rather than recalling more facts. Because hallucination rate and accuracy both feed into the Intelligence Index, the hallucination reduction is one of the larger single contributors to the +4.8 point gain on the Intelligence Index.
➤ Qwen3.7 Max used 96.7M output tokens to run the Intelligence Index, ~31% more than Qwen3.6 Max Preview (73.9M). It sits mid-pack on frontier token usage: above GPT-5.5 (high, 44.5M) and Gemini 3.1 Pro Preview (57.3M), below Claude Opus 4.7 (Adaptive Reasoning, Max Effort, 112M), Kimi K2.6 (166M) and DeepSeek V4 Pro (Reasoning, Max Effort, 187M).

Key model details:
➤ Context window: 1M tokens (up from 256K on Qwen3.6 Max Preview)
➤ Multimodality: Text input and output only
➤ Pricing: Yet to be announced (Qwen3.6 Max Preview is priced at $1.30/$7.80 per 1M input/output tokens on the @alibaba_cloud first-party API)
➤ Licensing: Proprietary, closed weights
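
The percentage-point and token figures in this post can be re-derived from the before/after values it quotes. The snippet below is a minimal sketch for checking that arithmetic; it uses only numbers from the post, and the dictionary labels are just descriptive names.

```python
# (before, after) values quoted in the post: Qwen3.6 Max Preview -> Qwen3.7 Max.
quoted = {
    "Intelligence Index": (51.8, 56.6),
    "CritPt %": (3.7, 13.4),
    "HLE %": (28.9, 38.1),
    "TerminalBench Hard %": (43.9, 50.8),
    "GDPval-AA Elo": (1504, 1546),
    "AA-Omniscience accuracy %": (37.7, 30.1),
    "AA-Omniscience hallucination rate %": (44.2, 22.9),
}

# Round to one decimal to hide floating-point noise in the subtraction.
deltas = {name: round(after - before, 1) for name, (before, after) in quoted.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:+}")

# Output tokens used to run the index, in millions (73.9M -> 96.7M).
token_increase = 96.7 / 73.9 - 1
print(f"output tokens: {token_increase:+.0%}")
```

Accuracy and hallucination rate both fall, which is the pattern the post attributes to the model abstaining more often.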

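Summary: Alibaba has released the closed-weights flagship Qwen3.7 Max, which scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points above Qwen3.6 Max Preview, narrowing the gap to frontier models. The gains are concentrated in scientific reasoning, agentic capability, and coding. A significant share of the gain comes from the model abstaining more often on AA-Omniscience, which cut its hallucination rate from 44.2% to 22.9%. The context window has grown to 1M tokens, and the model continues the closed-weights policy of the Max line.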

Chubby♨️ @kimmonismus · May 21 · 64

OpenAI is aiming for a release of their upcoming general-purpose LLM. „We have not pushed this model to the limit on open problems. Our focus is to get it out quickly so that everyone can use it for themselves.“ What makes this so impressive is that a general-purpose LLM, not specifically trained for math or this problem, appears to get dramatically better simply by using more test-time compute! OpenAI has a run.

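Summary: OpenAI is preparing to release a general-purpose LLM that was not specifically trained for math or for this problem. The model appears to improve dramatically simply by using more test-time compute. OpenAI says its focus is on releasing the model quickly so that everyone can use it, and that it has not pushed the model to the limit on open problems. This points to scaling test-time compute as a path forward for general-purpose models.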

Google DeepMind @GoogleDeepMind · May 21 · 84

Gemini 3.5 Flash has landed.


Google Gemini @GeminiApp · May 21 · 74

Gemini 3.5 Flash quickly delivers organized results, no matter how messy the input is. Watch Gemini take chats and texts with clients and turn them into usable documents for your small business.


Rohan Paul @rohanpaul_ai · May 21 · 63

Chinese AI lab SenseTime just open-sourced SenseNova U1, a unified multimodal model that can understand, reason, and generate images + text inside 1 model. The interesting part is the architecture: it removes the usual visual encoder and variational auto-encoder setup, then handles image and language inside a shared representation space, instead of being passed between separate modules. That means less handoff between modules, less information loss, and better consistency when creating dense visual content like infographics, guides, posters, comics, and image-text workflows. That’s how the model can generate coherent text and images together in one flow, which is why it is strong for infographics, guides, comics, posters, and step-by-step visual content. For infographic generation specifically, it is also around 2x faster than Qwen-Image-2.0 / Seedream-4.5 while staying in the same rough quality band, based on the client benchmark chart. 1/n

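Summary: SenseTime has open-sourced SenseNova U1, whose main innovation is architectural. The model drops the usual separate visual encoder and variational auto-encoder and handles images and text natively in a single shared representation space, reducing the information loss from handoffs between modules. This lets it generate text and images together coherently, which suits dense visual content that needs high consistency, such as infographics, posters, and comics. For infographic generation it is around 2x faster than Qwen-Image-2.0 / Seedream-4.5 at roughly the same quality.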

Artificial Analysis @ArtificialAnlys · May 21 · 69

Cohere launches open weights model Command A+ that achieves 37 on the Artificial Analysis Intelligence Index The release of Command A+ places @Cohere in line with Claude 4.5 Haiku on the Intelligence Index, and just above NVIDIA Nemotron 3 Super and Gemini 3.1 Flash-Lite. Key Takeaways: ➤ Command A+ ranks first on AA-Omniscience Non-Hallucination at 86%, ~3 percentage points ahead of the next-best model. Its AA-Omniscience Accuracy is 9%, so the headline AA-Omniscience score lands at -4, demonstrating a similar archetype to Claude 4.5 Haiku, where the model knows its limits ➤ On Cohere’s API, Command A+ (~281 output tokens per second) is faster than several comparable open-weights and small to mid-sized proprietary models (e.g., GPT-5.4 nano, Claude 4.5 Haiku, and Grok 4.3), but still slower than Gemini 3.1 Flash-Lite Preview, which outputs 304 tokens per second ➤ Command A+ trails its peer set on scientific reasoning (HLE ~11%, GPQA Diamond ~76%) and on coding (Terminal-Bench Hard ~25%, SciCode ~38%), consistent with gaps on the hardest science and agentic coding benchmarks ➤ It supports visual reasoning and scores 63% on MMMU-Pro (between Claude 4.5 Haiku at 59% and GPT-5.4 nano (xhigh) at 65%)
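
The relationship between the 86% non-hallucination rate, the 9% accuracy, and the headline score of -4 can be sketched as follows. The scoring rule here is an assumption used for illustration: the headline score is taken to be the percentage of correct answers minus the percentage of incorrect answers (abstentions cost nothing), and the hallucination rate is taken to be the share of non-correct responses that were answered incorrectly rather than declined. The function name is hypothetical.

```python
def omniscience_index(accuracy_pct: float, hallucination_rate_pct: float) -> float:
    """Assumed scoring: % correct minus % incorrect, no penalty for abstaining.

    hallucination_rate_pct is assumed to be the share of non-correct
    responses that were wrong answers rather than abstentions.
    """
    not_correct_pct = 100.0 - accuracy_pct
    incorrect_pct = not_correct_pct * hallucination_rate_pct / 100.0
    return accuracy_pct - incorrect_pct


# Command A+: 9% accuracy, 86% non-hallucination -> 14% hallucination rate.
command_a_plus = omniscience_index(9.0, 100.0 - 86.0)
print(f"Command A+ index: {command_a_plus:.2f}")
```

Under these assumed definitions the result rounds to the -4 quoted in the post, which illustrates how a model with low accuracy can still land near zero when it mostly abstains instead of guessing.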

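Summary: Cohere has released the open-weights model Command A+, which scores 37 on the Artificial Analysis Intelligence Index, in line with Claude 4.5 Haiku. Its main strength is a very low hallucination rate: it ranks first on AA-Omniscience Non-Hallucination at 86%, reflecting a model that knows its limits. On Cohere's API its output speed (~281 tokens per second) beats several models, including GPT-5.4 nano, but remains slower than Gemini 3.1 Flash-Lite Preview (304 tokens per second). It trails its peers on scientific reasoning and coding, supports visual reasoning, and scores 63% on MMMU-Pro, between Claude 4.5 Haiku (59%) and GPT-5.4 nano (xhigh) (65%).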

SenseTime @SenseTime_AI · May 20 · 68

Turn your ideas into visuals that spark stories 🧨

[Quoting @Adamaestr0_]: Most AI tools can either write or generate images. This one does both at once. Meet SenseNova U1, an AI that thinks in text and images at the same time. This changes everything 🧵

Kling AI @Kling_ai · May 20 · 72

http://x.com/i/article/2055141424790970368

# Kling AI Introduces the World’s First Native 4K Video Model

On April 23, Kling AI officially launched the world’s first native 4K video generation feature for the Kling 3.0 video model series. Designed for professional-grade content creation, the new 4K feature enables users to generate true 4K videos in a single click, delivering sharper visuals, richer detail, and cinematic image quality while significantly improving production efficiency. Since its launch, Kling 4K has already been adopted across a wide range of creative industries, from Hollywood production teams to independent creators, from animation studios to advertising agencies. Here’s a look at how industry pioneers are using Kling 4K to reshape creative workflows.

## Film & Television

As AI-generated video continues to evolve from experimental creation into industrial-scale production, 4K quality has become one of the key requirements for professional film and television workflows. From cinematic detail to character consistency, production teams are increasingly looking for AI tools that can integrate seamlessly into existing pipelines without sacrificing visual fidelity.

One of the earliest and most vocal adopters of AI in Hollywood production is Jon Erwin, the creator of House of David and founder & CEO of Innovative Dreams. His team has openly discussed how AI tools are already being incorporated into large-scale productions, with Kling 4K becoming part of that evolution.

> Since House of David season 1, Kling has been an essential part of our workflow. Now it has become the first foundation model that we’ve used that is native 4K. The details are superb and nuanced. It’s beautiful. It’s another leap forward in GAI tools.

For production teams like Innovative Dreams, native 4K generation is not simply about resolution; it directly impacts how AI-generated footage can be used alongside traditional cinematic assets.

Another studio helping define the next generation of AI-native filmmaking is Wonder Studios, a studio producing music videos alongside Google DeepMind, YouTube and Universal Music Group, as well as original series and commercial work for some of the world's leading brands and artists. Built on a belief that filmmakers should own what they build, Wonder offers creators a genuine stake in their projects, and that same uncompromising approach to craft is why native 4K generation has become essential to the studio's work.

> Kling's native 4K is a must-have for any serious creator. The problem with upscalers is that they tend to modify your characters in the process, but because Kling generates at true 4K from the ground up, that issue just doesn't exist. Your characters stay consistent, your quality stays intact, and for anyone working in AI video, it's become an essential part of the workflow.

For AI-native productions, consistency is often one of the biggest technical challenges. Traditional upscaling workflows can unintentionally alter facial features, costumes, or visual identity between shots. Native 4K generation helps preserve those details directly from the source, allowing creators to maintain continuity across sequences while reducing additional post-production correction work.

## Animation Production

Cao Han, the animation director, incorporated Kling AI into multiple stages of production while creating the AIGC feature project Born of the Tide. Cao Han explained that his team tested many different AI models during production.

> While some models could accept highly detailed input images, the generated results often degraded into a more generic 3D animation look, with noticeable loss in facial features and fabric textures. In comparison, Kling AI was able to preserve artistic color tones with much higher fidelity while maintaining realistic texture and motion in complex physical effects such as water and fire.

At the same time, Born of the Tide is a story heavily focused on ensemble scenes, featuring large-scale sequences such as dragon boat races, ceremonial performances, bombings, and conflicts involving government forces fighting over land. These scenes often require dozens of characters to appear within the same frame. In traditional production workflows, when characters occupy only a small portion of the screen, facial details and visual clarity can easily break down. Kling 4K, however, made these complex large-scale scenes far more viable for stable, production-ready execution.

## Advertising & Commercial Production

In the advertising industry, agencies and creative studios are now exploring how native 4K AI workflows can support premium commercial production across beauty, fashion, automotive, and branded storytelling.

Wes Walker, founder and managing partner of Obsidian, has worked on high-end commercial campaigns for major global brands, including luxury labels such as Longchamp. In discussing the studio’s adoption of Kling 4K, Walker highlighted how production-grade image quality is becoming essential for AI-generated assets to coexist with traditional cinematography.

> At Obsidian, we work to a premium standard, whether the work is live action, hybrid, or fully synthetic. Our productions in luxury, beauty, automotive, product, and narrative world-building demand high resolution, high fidelity imagery that can stand seamlessly beside live action. That’s why 4K matters. It marks a real step toward AI becoming fully production-ready. We’ve been consistently impressed by Kling’s performance, the quality of its team, and its ability to render imagery that feels cinematic, grounded, and emotionally alive. When paired with our own pipeline and tools like EchoChrome, which upscales bit depth, these assets are now holding up on major commercials and high-end productions in ways that were not possible even a short time ago.

For studios operating at the premium end of commercial production, the shift toward native 4K is not only about sharper imagery, but also about enabling AI assets to withstand professional color grading, compositing, and large-format delivery requirements.

This transition is also being recognized by long-established production companies like Tool, an award-winning cross-media creative production company with decades of experience in live-action filmmaking and creative technology. The company has received major industry recognition including Emmy Awards and Cannes Palme d’Or honors.

> We’ve been testing Kling’s 4K output across a range of projects and the results speak for themselves: exceptional image fidelity, sharp textures, no degradation in logos or fine detail, and the stability enables precise creative control that supports a truly intentional cinematic experience rather than purely generative output. This changes what’s possible for production-ready AI work.

Dustin Callif, President of Tool, described Kling 4K as a meaningful advancement toward production-ready AI filmmaking and advertising workflows.

## AI Productivity Tools

For creative platforms and production tools, the value of Kling 4K is even more direct: it significantly reduces intermediate production steps while enabling large-scale, high-quality content creation.

Launched by Wondershare, ReelMate is a one-stop AI premium drama production platform that supports a fully integrated workflow spanning scriptwriting, asset generation, storyboard creation, video generation, and post-production editing. Designed for premium AI live-action productions as well as 2D and 3D animated content creation, ReelMate deeply integrates leading AI video models including Kling, while leveraging director-level AI agent capabilities to ensure character, scene, and cinematic consistency across multi-shot productions.

Wondershare ReelMate has already achieved a 10× increase in storyboard creation efficiency, while overall AI-driven production efficiency has improved by more than 5× compared to traditional workflows. By deeply integrating Kling’s native 4K capabilities into its AI premium drama and film production pipeline, ReelMate is further opening up a new pathway toward industrial-grade AI film and television production, enabling both production efficiency and visual quality to reach professional studio standards.

According to evaluations conducted by Wondershare, Kling AI is capable of directly generating native 3840×2160 resolution content during the generation stage itself. Even under complex lighting conditions, character skin textures remain delicate and natural, achieving cinema-level visual quality. In character rendering, facial textures, eye details, and subtle micro-expressions are reproduced with remarkable precision, providing stronger visual support for high-end AI live-action and premium drama production.

Tech companies like Dashverse have begun building full-stack narrative platforms centered on AI. Its product Frameo, for instance, is designed for the large-scale production of next-generation AI movies, TV dramas and micro-dramas. Through its partnership with Kling AI, Dashverse is integrating more advanced AI production workflows into Frameo, enabling seamless synergy across audio production, character animation, multilingual storytelling and cross-genre style adaptation.

> Combining Kling's continuously evolving video generation capabilities with Dashverse's production infrastructure, the overall production cycle has been cut by more than 50%, while retaining studio-level content quality and supporting global distribution.

Powered by Frameo and Kling, Dashverse is building global infrastructure for AI-native storytelling, helping bridge the "imagination gap". It empowers creators, filmmakers, VFX artists and animation studios to operate as efficiently as full-fledged production companies.

More than a feature upgrade, Kling 4K signals a broader shift in the industry: AI-generated content is moving from experimental workflows into a scalable tool for professional production. As 4K adoption grows across studios, filmmakers, and commercial teams, production-ready quality is becoming the new standard.

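Summary: On April 23, Kling AI launched what it calls the world's first native 4K video generation feature, for the Kling 3.0 video model series, aimed at professional content creation. It generates true 4K video in a single click, with sharper detail and improved production efficiency. It has been adopted by Hollywood teams, animation studios, and others. Jon Erwin says it is the first native 4K foundation model his team has used; Wonder Studios says generating at true 4K from the ground up avoids the character changes that upscalers introduce, keeping characters consistent; and animation director Cao Han says it preserves artistic color tones and the texture of complex effects better than the other models his team tested.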

Rohan Paul @rohanpaul_ai · May 20 · 73

Chinese AI labs are increasingly releasing very serious open source work. SenseNova U1 just dropped on HuggingFace: native multimodal modeling, MoT architecture (38B-Active 3B MoE) It attacks the hardest part of image generation: readable, structured, consistent image-text output. The most interesting part of SenseNova U1 is it treats multimodal generation as one native modeling problem, not a chain of separate vision, language, and image modules. That means less handoff between modules, less information loss, and better consistency when creating dense visual content like infographics, guides, posters, comics, and image-text workflows. ComfyUI support, fast A3B inference, and absolutely brilliant for dense visuals like infographics, posters, comics, and guides.

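Summary: SenseTime's SenseNova U1 has been released as open source. Its main innovation is native unified multimodal modeling, treating vision, language, and image generation as one problem rather than a chain of separate modules, which reduces information loss. It uses a MoT architecture (38B-Active 3B MoE) and stays consistent when generating dense, structured image-text content such as infographics, posters, and comics. A technical report describes how the model was built, including a near-lossless visual interface and a joint training strategy.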

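Berryxia.AI @berryxia · May 20 · 73

Folks, Gemini 3.5 Flash, just released by Google DeepMind, pushes the Intelligence vs Speed Pareto frontier to a new high. Artificial Analysis had pre-release access, and its conclusion after testing is clear: the model scores 55 on the Intelligence Index, 9 points above Gemini 3 Flash, putting it ahead of Grok 4.3 and Claude Sonnet 4.6. On agentic tasks (GDPval-AA) its Elo jumps to 1656, far above its predecessor. The hallucination rate falls sharply from 92% to 61%. It also keeps its lead in multimodal understanding, with 84% on MMMU-Pro. Output speed exceeds 280 tokens/s, 70% faster than the previous generation. It looks almost perfect. The catch: running the Artificial Analysis Intelligence Index once costs 5.5x as much as with Gemini 3 Flash and 75% more than with Gemini 3.1 Pro. Pricing has tripled ($1.5/$9 per 1M input/output tokens), and token usage on agentic tasks has increased significantly. Speed and intelligence finally come together, but the price wipes out the "cheap" connotation of the word "Flash". Full benchmarks here: https://artificialanalysis.ai/models/gemini-3-5-flash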
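
One way to read the cost figures in this post: if the cost of a benchmark run is roughly price per token times tokens used, then a 5.5x cost at 3x the price implies a token-usage multiplier. The sketch below makes that explicit. The proportional cost model and an unchanged input/output mix are simplifying assumptions, not something the post states, and the function name is hypothetical.

```python
def implied_token_multiplier(cost_ratio: float, price_ratio: float) -> float:
    """Token-usage ratio implied by cost = price * tokens (same input/output mix assumed)."""
    return cost_ratio / price_ratio


price_ratio = 3.0  # pricing is 3x the previous Flash model
cost_ratio = 5.5   # cost to run the Intelligence Index vs Gemini 3 Flash
tokens = implied_token_multiplier(cost_ratio, price_ratio)
print(f"implied token usage: {tokens:.2f}x")

# Previous-generation output speed implied by "70% faster",
# treating "over 280 tokens/s" as exactly 280 for the estimate.
previous_speed = 280 / 1.70
print(f"implied previous speed: about {previous_speed:.0f} tokens/s")
```

Under these assumptions the model used a bit under twice as many tokens on the benchmark, which is consistent with the post's remark that token usage rose on agentic tasks.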

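Summary: Google DeepMind's newly released Gemini 3.5 Flash advances the balance of intelligence and speed. It scores 55 on the Intelligence Index, well above the previous generation and ahead of Grok 4.3 and Claude Sonnet 4.6. It improves markedly on agentic tasks and hallucination rate, and its output speed exceeds 280 tokens/s. However, its API price is about 3x that of the previous generation, and running the benchmark costs 5.5x as much. Gemini 3.5 Flash is faster and smarter, but it also shifts the Flash line away from its earlier low-cost positioning.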

Rohan Paul @rohanpaul_ai · May 20 · 74

Gemini 3.5 Flash now outruns Gemini 3.1 Pro on several real-work automation tests. - With 4x faster output tokens per second - A really powerful agent model fast enough and cheap enough for everyday work - Flash beats Gemini 3.1 Pro on several hard agent and coding benchmarks, including 76.2% Terminal-Bench 2.1, 83.6% MCP Atlas, and 1,656 Elo GDPval-AA. - Available in the Gemini app, AI Mode in Search, Gemini API, Antigravity, Android Studio, and Google’s enterprise agent products. - When coupled with the updated Antigravity harness, 3.5 Flash becomes a powerful engine for deploying collaborative subagents to tackle problems at scale. so one subagent might inspect a folder, another might rewrite code, another might test the result, and another might summarize what changed.

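Summary: Google has launched Gemini 3.5 Flash, with 4x faster output, which beats Gemini 3.1 Pro on several hard benchmarks including Terminal-Bench 2.1. Fast and cheap enough for everyday work, it is positioned as a capable agent model. It is available in the Gemini app, AI Mode in Search, and Google's enterprise agent products, among others. Paired with the updated Antigravity harness, Gemini 3.5 Flash can drive collaborative subagents that split up work such as inspecting files, rewriting code, and testing results.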

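meng shao @shao__meng · May 20 · 64

Gemini Omni is here! Google's edge really does seem to be in multimodal models, doesn't it?! When Gemini 3.0 launched, the most striking thing was multimodal understanding that neither Claude nor GPT had at the time; Nano Banana and Veo were likewise far ahead in multimodal generation (at launch; they were later overtaken). Gemini Omni, now announced at Google I/O, is another natively multimodal "understanding + generation" model. It currently focuses on video and can produce or edit video from any combination of inputs (image, text, video, audio).

Here is the official comparison of Omni and Veo:
1. How it works. Veo: multimodal input is often compressed into text before generation. Omni: natively multimodal from the ground up.
2. Prompts. Veo: needs very specific, frame-by-frame descriptions. Omni: can take just the intent and fill in details through reasoning.
3. Editing. Veo: mostly single-shot generation. Omni: multi-turn conversational editing, each step building on the last.
4. Knowledge. Veo: leans on visual pattern matching. Omni: draws on Gemini's world knowledge and physical intuition.

Note: "Veo" here stands for nearly all earlier video generation models, including Veo, Sora, and Seedance, so the comparison feels close to a rout.

Omni's three main capabilities

1. Conversational video editing (the core differentiator)
· Edit existing video in natural language, with each instruction building on the previous result.
· Emphasis on consistency: characters, physics, and scene memory stay coherent across multiple rounds of edits.
· Typical operations: swap the background, change the camera position, replace objects or characters, change actions, add effects, without restating the whole prompt each time.

2. World knowledge + physical intuition
· Physics: gravity, kinetic energy, fluids, and so on, for more believable motion (e.g. a marble chain-reaction track).
· Knowledge: historical, scientific, and cultural context, for educational and narrative content (e.g. a claymation stop-motion "protein folding").
· Text: not just "can render text", but text synchronized with on-screen action and pacing (e.g. the 26 letters of the alphabet with matching lower thirds).

3. Reference anything
· Images, text, video, and audio can be mixed as "ingredients" and composed into one narrative.
· Capabilities include motion/style transfer, swapping characters from a reference image (preserving motion and lip sync), turning a sketch used only as motion guidance into live action, and generating beat by beat from storyboard frames.
· Audio: at launch mainly voice references are supported; other audio input types will open up over time.

Summary: Google has released the natively multimodal model Gemini Omni. Unlike earlier models that need frame-by-frame descriptions, it is multimodal from the ground up, generates video from stated intent, and supports multi-turn conversational editing in which each step builds on the previous result to keep things consistent. It draws on Gemini's world knowledge and physical intuition and can combine any mix of image, text, audio, and video references into one narrative. The stated goal is to "create anything from anything", starting with video.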

Demis Hassabis @demishassabis · May 20 · 81

Gemini 3.5 Flash is amazing! - Performs better than 3.1 Pro on coding & agentic tasks - 4x faster than other frontier models - 12x faster in @antigravity - 800 tokens/sec! - Often at less than half the cost And Pro to come… Try it in @antigravity, @GeminiApp & more - enjoy!


Rohan Paul @rohanpaul_ai · May 20 · 69

Google Gemini 3.5 Flash is a super strong model for its class. Beats Gemini 3.1 Pro on so many benchmarks. An agent model with 4x faster tokens per second. And @aimlapi just added Gemini 3.5 Flash to their API and is keeping it FREE for 24hrs. Setup instructions in comment.


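Berryxia.AI @berryxia · May 20 · 71

This is one of the major announcements at Google I/O! Today Google DeepMind took the first step toward "generating anything from anything" by releasing Gemini Omni. It is not just another video generation tool; the aim is to fully merge Gemini with generative media systems. It genuinely understands physics, history, culture, and story logic. You can define a character and drop it into any scene, and it keeps appearance, motion, and lighting consistent. You can change the style or add effects in natural language, or reimagine video you shot yourself: change the environment, add objects, swap actions, all conversationally. Video generation used to be "shoot a clip and you're done"; now it is a living, editable, continuously evolving world. Video is finally no longer dead content but "world material" that can be rewritten in real time. Gemini Omni Flash is already live in the Gemini App, Flow by Google, and YouTube Shorts, and the API will open in a few weeks. PS: Some people say the results are not as good as SD2, especially for Chinese. But clip editing works quite well.

Summary: At I/O, Google DeepMind released Gemini Omni, intended as a first step toward "generating anything from anything". The model merges Gemini's intelligence with generative media systems, improving world understanding, multimodality, and editing. Generated video keeps characters, lighting, and other elements consistent, and can be edited and restyled in natural language, turning video into continuously evolving "world material". The model is live in some apps and an API is coming, though its actual quality, especially for Chinese-language generation, is still debated.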

karminski-牙医@karminski3 · 5月20日61

Gemini-3.5-flash发布! 价格直接翻三倍? Google I/O 开始啦! 今天上来就是模型发布, Gemini-3.5-flash 直接全量上线了. 价格直接从 gemini-3-flash 的每百万 Token 输入/输出 $0.50 / $3 涨到了 $1.5 / $9, 那么性能有提示吗? 我简单测了几个例子, 目前来看体感处于 Genini-3.0-pro 和 Gemini-3.1-Pro 之间 (测试用的是 Thinking level: high) 不过稳定性就差很多了, 3D 渲染的 shader 它怎么都写不对, 我只能手动修了一下才能运行视频中这个火山喷发演示. 从目前 Gemini 系列模型迭代来看, Google 可能更想学 Anthropic, 搞三个档次. flash-lite 会取代之前 flash 的位置. 而 flash 更可能是主打一百万上下文内不设置阶梯定价, 承接 pro 这部分溢出的用户. 而本身编程性能上距离 pro 还有点差距. Pro 自然就是旗舰级别模型了. 不过现在这个定价来看, 可能这次 flash 更多是为了跟这次一起发布的 Antigravity CLI 一起搭配用的. 做 claude code 中 sonnet 模型的生态位置. #geminiflash35 #geminiflash #googleio

译在Google I/O大会上，Gemini-3.5-flash模型正式发布，其定价从上一代的$0.5/$3大幅上涨至$1.5/$9。实测显示，其性能介于Gemini-3.0-Pro与Gemini-3.1-Pro之间，但稳定性有所不足。此举被视为Google借鉴Anthropic的产品分层策略，计划用flash-lite、新flash和Pro形成梯队。其中新flash主打在百万级上下文内不设阶梯定价，以承接Pro模型溢出的用户。此次调价也可能旨在配合新发布的Antigravity CLI工具，定位类似Claude Code中的Sonnet模型，从而构建其开发生态。

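View original post ↗

Orange AI@oran_ge · May 20 · 77

Gemini flash 3.5 was released last night and is now available. - Far better than 3.1 Pro, with metrics close to gpt 5.5; it beats gpt5.5 on agentic and multimodal tasks. - Costs only a third as much as gpt5.5, and the cached price only a sixth. - API pricing: $1.50 / $9.00 per 1M tokens (input/output), cached input $0.15. Context window: 1M tokens. - Extremely fast, 4x the speed of other flagship models, which makes it a good fit for agents. Official announcement: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/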

View original post ↗

Rohan Paul@rohanpaul_ai · May 20 · 67

Google's new Gemini Omni can generate "anything from any input". A video AI model that can create and edit clips from video, images, audio, text, and sketches. A user can record a normal video, then ask Omni to add a character, replace an object, change the action, alter the style, sync sound, or move the camera through plain language. Keeps the same scene stable after each edit. Video models often fail when they must preserve identity, motion, lighting, object position, and cause-and-effect across multiple changes. Gemini Omni Flash is meant to handle those edits inside the Gemini app, Google Flow, and YouTube Shorts. Omni has stronger world understanding, meaning it tries to model gravity, fluid motion, kinetic energy, and physical interaction more realistically. Overall, Omni makes AI video feel less like prompt-based generation and more like directing a scene through repeated instructions. Google is also attaching SynthID watermarking and C2PA Content Credentials to Omni outputs, so edited or generated media can be identified as AI-made.

Summary: Google has launched Gemini Omni, an all-purpose video AI model that accepts video, image, audio, text, and sketch inputs. Through natural-language instructions, users can add characters to existing video, replace objects, adjust actions, change the style, sync sound, and move the camera, with the scene staying consistent across repeated edits. The model has stronger world understanding and simulates gravity, fluids, and other physical interactions more realistically, bringing video editing closer to directing. Outputs carry SynthID watermarks and C2PA Content Credentials to mark them as AI-generated.

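View original post ↗

Jeff Dean@JeffDean · May 20 · 81

Highly capable models that are fast are super important. Our new Gemini 3.5 Flash model is a great mix of fast and capable.

Summary: Google released the new Gemini 3.5 Flash model, which emphasizes a strong combination of speed and capability. Compared with Gemini 3.1 Pro, 3.5 Flash does better on nearly every benchmark, with especially large gains in coding. Its main advantage is very fast inference, 4x faster than other frontier models. On a chart of intelligence versus output speed, the model sits alone in the top-right region, setting a new mark for speed and capability.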

View original post ↗

Demis Hassabis@demishassabis · May 20 · 79

Gemini Omni is a major leap in world understanding & multimodal editing! It can take photos, video & audio and build entirely new scenes. Over time it’ll be able to handle any input & any output - starting w/ video. You can even give it your own videos & iterate on your ideas:

View original post ↗

Google AI@GoogleAI · May 20 · 74

By now, you've probably heard about Gemini Omni, our new model designed to create anything from any input, starting with video. But... what's the big deal? Let’s break it down 🧵👇

View original post ↗

Sundar Pichai@sundarpichai · May 20 · 79

Gemini Omni doesn't just build scenes that look real, it reasons about what should happen next. It combines an intuitive understanding of physics with Gemini's knowledge of history, science, and cultural context. Rolling out today starting with video outputs to Google AI Plus, Pro and Ultra subscribers globally through the @Geminiapp + Google Flow, and @YouTube Shorts this week.

View original post ↗

Google Gemini@GeminiApp · May 20 · 81

Meet Gemini Omni, our new model that can create anything from any input, starting with video. With Gemini Omni, you can combine images, videos and text as inputs and generate high-quality videos grounded in Gemini's real-world knowledge. #GoogleIO

View original post ↗

OpenRouter@OpenRouter · May 20 · 82

Gemini 3.5 Flash from @GoogleDeepMind is live on OpenRouter! Beats Gemini 3.1 Pro on coding, agentic work, and tool use at Flash-tier price and speed. 1M context, 65K max output, multimodal. $1.50/M input, $9/M output.

View original post ↗

AYi@AYi_AInotes · May 20 · 80

Damn! Google has really gone absolutely wild this time. Gemini Omni is about to blow the roof off video generation 🤯 Making videos used to be like building with Lego blocks, piece by piece, slowly. Now it’s giving you a magic Lego factory that can actually think. You chat in natural language, and it understands real-world physics, history, biology, and culture, then directly generates or edits any video. Five most mind-blowing abilities that you can use right now: 1. Understands real physics: glass marbles colliding, turning, and bouncing in ways that match reality. 2. Faces never get distorted: define a character once, put them in any scene, any action. 3. Edit videos like you edit ChatGPT text: change backgrounds, swap people, add effects with a single sentence. 4. Upload an image and apply any style: make claymation, visualize protein folding, whatever you imagine. 5. Video isn’t a dead file anymore: change angles, lighting, objects, even storylines just by chatting. This isn’t a competitor to Sora. This is the first time a world model has truly entered a consumer-facing product. It’s not just generating pixels; it’s simulating a coherent physical and semantic world. Open the Gemini app right now and try Omni Flash. Go try it. You’ll thank me later.

Summary: Google has launched Gemini Omni, described as the first consumer-facing world model. Through natural-language interaction it combines Gemini's intelligence with generative media systems and shows a deep grasp of physics, history, biology, and more. Users can edit video with single-sentence instructions, the way they would edit ChatGPT text, with character consistency, style transfer, and angle changes. Rather than just generating pixels, it simulates a coherent physical and semantic world, marking a shift in AI video generation from a stitching tool to an intelligent creative system.

View original post ↗

Chubby♨️@kimmonismus · May 20 · 81

The real „wow“ moment is Gemini Omni. A world model towards AGI. It can create anything from any input. This is insane.

View original post ↗

Google AI Developers@googleaidevs · May 20 · 84

✨ Introducing Gemini 3.5, our latest family of models combining frontier intelligence with action. The series sets a new standard for agentic models that don't just reason, they execute.

View original post ↗

May 22

09:39

karminski-牙医@karminski3

66

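Tencent releases translation-specific LLMs that outperform DeepSeek-V4-Pro

Tencent has released three translation-specific large models: Hy-MT2-1.8B, Hy-MT2-7B, and Hy-MT2-30B-A3B. The 30B-A3B version outperforms the general-purpose DeepSeek-V4-Pro across the board on DomainMTBench, a translation benchmark focused on specific domains such as finance, law, medicine, and technology. This points to a clear advantage for specialized models in vertical domains.

Model release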

08:13

Berryxia.AI@berryxia

60

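Stable Audio 3 runs locally on Mac with striking music-generation speed

Stable Audio 3 has been officially released with a version optimized for Apple's MLX framework, so the music generation model runs efficiently on a local Mac. The headline figure is 59x real-time generation on an M5 Pro chip. The tool also supports LoRA fine-tuning in under an hour and offers a fast (Sm) mode and a higher-quality (Medium) mode. The developers are encouraging the community to push it hard, and the release marks a new high for local music-creation tools.

dadabots: 🥳 Announcing Stable Audio 3 🍕 🏆 fastest music models ever 💻 runs on MacBookPro M-series 🧪 break it plz 🧠 LoRA fine...

Open source/Repos · Model release · On-device · Voice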

02:38

Runway@runwayml

84

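Aleph 2.0 is here. Now you can edit a single frame in your video, preview the change, and Aleph 2.0 then carries that edit across the rest of your video. Try it now in the new Edit Studio on the web at the link below.

Image generation · Model release · Video

3 related discussions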

02:13

Alibaba Cloud@alibaba_cloud

75

Qwen3.7-Max is live on @OpenRouter https://x.com/OpenRouter/status/2057500097206976983?s=20

OpenRouter: The new Qwen3.7-Max from @Alibaba_Qwen is live on OpenRouter. The flagship of the Qwen3.7 series, built for agent-centri...

Agents · Model release · Coding

8 related discussions

01:56

Rohan Paul@rohanpaul_ai

84

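Alibaba releases flagship model Qwen3.7-Max, built for the agent era

Alibaba has launched its latest flagship model, Qwen3.7-Max, positioned as a production-grade foundation model for the agent era. It scores 56.6 on the Artificial Analysis Intelligence Index, up markedly from its predecessor and roughly on par with GPT-5.4. Its main strength is agent reliability: it can plan, call tools, correct mistakes, and keep executing through complex tasks. Deep low-level optimization delivered a 10x inference speedup, and the model supports hours-long autonomous runs and multi-tool collaboration. It is now available on Alibaba Cloud Model Studio and is compatible with mainstream development frameworks such as Claude Code and OpenClaw.

Qwen: 📣Meet Qwen3.7-Max - our latest flagship, made for the Agent Era. A versatile foundation for agents that actually get th...

Agents · MCP/Tools · Reasoning · Model release

8 related discussions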

00:36

OpenRouter@OpenRouter

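Featured · 78

The new Qwen3.7-Max from Alibaba's Qwen team is live on OpenRouter. It is the flagship of the Qwen3.7 series, built for agent-centric work: coding, office and productivity tasks, and long-horizon autonomous execution. It improves markedly over Qwen3.6 on coding and agent benchmarks and supports explicit prompt caching for repeated context.

Agents · Model release · Coding

8 related discussions

Why it's recommended: Alibaba's flagship iteration shifts its focus to agents and long-horizon tasks, and the benchmark jump this time is more than an incremental bump; worth a serious try if you build coding agents.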

May 21

22:12

Alibaba Cloud@alibaba_cloud

76

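Alibaba has launched its latest closed flagship model, Qwen3.7 Max, which scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points above the previous-generation preview, making it Alibaba's closest model yet to the international frontier. The gain comes mainly from stronger scientific reasoning, agentic, and coding capability, with a sharp drop in hallucination rate (from 44.2% to 22.9%) as a major contributor. The context window has grown to 1 million tokens, the model still handles text input and output only, and pricing has not been announced.

Artificial Analysis: Alibaba's new Qwen3.7 Max model scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points higher than Qwen3....

Agents · Reasoning · Model release · Coding

8 related discussions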

21:56

Chubby♨️@kimmonismus

66

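Alibaba Cloud releases Qwen3.7 Max: generalizing agent capability may matter more than the performance gains

Alibaba Cloud has released its new flagship model, Qwen3.7 Max, positioned as a foundation model for the "agent era" and emphasizing execution on practical tasks such as end-to-end coding and office automation. In a kernel optimization task, the model ran autonomously for 35 hours without human intervention and made more than 1,000 tool calls. This was not general self-improvement, however, but iterative refinement toward a specific optimization target. More notable is Qwen's claim that agent capability generalizes from diverse training environments, much as language ability generalizes from text. If that holds, it would matter far more than any benchmark score.

Qwen: 📣Meet Qwen3.7-Max - our latest flagship, made for the Agent Era. A versatile foundation for agents that actually get th...

Agents · Model release · Coding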

21:42

Alibaba Cloud@alibaba_cloud

85

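Qwen flagship model Qwen3.7-Max released

Alibaba Cloud has released Qwen3.7-Max, the new flagship of the Qwen series, positioned as a general-purpose foundation model for the agent era. It is meant to underpin agents that "can actually complete tasks". Its core capabilities include end-to-end complex coding tasks, serving as an office assistant with multi-agent collaboration, and long autonomous tasks running more than 35 hours. It is framework-agnostic and works with toolchains such as Claude Code and OpenClaw. Users can already call its API through Model Studio.

Agents · MCP/Tools · Model release · Coding

8 related discussions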

21:40

Qwen@Alibaba_Qwen

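Featured · 82

Qwen3.7-Max: a flagship model for the agent era

Qwen3.7-Max is the Qwen series' latest flagship, built for the agent era and intended as a strong foundation for agents that complete real tasks. Its core capabilities: acting as an end-to-end coding agent for front-end prototypes and multi-file refactors; serving as a reliable office assistant that works through MCP integrations and multi-agent orchestration; and running autonomously for very long stretches (more than 35 hours) to carry out complex task chains. It is compatible with mainstream development frameworks such as Claude Code and OpenClaw and is available on Alibaba Cloud Model Studio and Qwen Studio.

Agents · MCP/Tools · Model release

8 related discussions

Why it's recommended: The highlight of Qwen 3.7-Max is not its leaderboard score but its sustained execution in agent scenarios; 35 hours of uninterrupted kernel optimization gives developers with long-running tasks a direction they can explore right away.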

21:40

Qwen@Alibaba_Qwen

76

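Alibaba recently launched its new closed flagship model, Qwen3.7 Max. It scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points above its predecessor Qwen3.6 Max Preview, the closest an Alibaba model has come to the global frontier. The upgrade shows up mainly in scientific reasoning, agentic capability, and code generation, along with a markedly lower hallucination rate. Notably, part of the score gain comes from the model declining to answer more often rather than purely from better factual accuracy. The context window has grown to 1 million tokens, and the weights remain closed. Even so, the model still trails comparable offerings from OpenAI, Anthropic, and Google overall.

Artificial Analysis: Alibaba's new Qwen3.7 Max model scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points higher than Qwen3....

Agents · Reasoning · Model release · Coding

8 related discussions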

19:29

🚨 AI News | TestingCatalog@testingcatalog

72

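Alibaba has released its latest proprietary model, Qwen 3.7 Max, built for agentic coding. Qwen 3.7 Max scores 56.6 on the Artificial Analysis Intelligence Index, ahead of the recently released Gemini 3.5 Flash and Kimi K2.6.

Alibaba Group: Qwen3.7-Max is live! 🚀 Introducing the latest proprietary model, built for advanced agentic coding, complex reasoning, ...

Agents · Reasoning · Model release · Coding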

16:56

Tencent Hy@TencentHunyuan

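Featured · 74

Tencent open-sources the Hy-MT2 multilingual translation models

Tencent has open-sourced the Hy-MT2 multilingual translation models, which support translation among 33 languages. The 7B and 30B-A3B versions reach state-of-the-art translation quality among open-source models, beating many models with tens of times more parameters. More striking, the lightweight 1.8B version outperforms mainstream commercial APIs such as Microsoft's and, with Tencent's AngelSlim 1.25-bit extreme quantization, needs only 440 MB of storage to run locally on mainstream phone chips, with inference 1.5x faster than the previous generation. This substantially lowers the barrier to deploying high-quality AI translation.

Open-source ecosystem · Model release · On-device

2 related discussions

Why it's recommended: Translation is not the hottest area, but Tencent's open-source 1.8B model runs directly on phones with 1.25-bit quantization and still beats Microsoft's commercial API; worth watching if you build local translation tools.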
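
As a rough sanity check on the storage figure above, the size of quantized weights can be estimated as parameter count times bits per weight. The following is a minimal Python sketch, assuming exactly 1.8 billion parameters all stored at the stated bit width and decimal megabytes; the helper name `weight_size_mb` is hypothetical. The post does not explain the gap between this estimate and the reported 440 MB; it may reflect parts of the model kept at higher precision or packaging overhead, but that is a guess.

```python
def weight_size_mb(num_params: float, bits_per_weight: float) -> float:
    """Size of the weights in decimal megabytes (1 MB = 1e6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


if __name__ == "__main__":
    params = 1.8e9  # assumption: "1.8B" taken as exactly 1.8 billion parameters
    # Compare a 16-bit baseline, a common 4-bit setting, and the 1.25-bit figure.
    for bits in (16, 4, 1.25):
        print(f"{bits:>5} bits/weight -> {weight_size_mb(params, bits):,.0f} MB")
```

Under these assumptions the 1.25-bit weights alone come to about 281 MB, against 3,600 MB at 16 bits.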

16:28

Artificial Analysis@ArtificialAnlys

70

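Alibaba Cloud releases Qwen3.7 Max, scoring 56.6 in evaluation

Alibaba Cloud has released the closed flagship model Qwen3.7 Max, which scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points above its predecessor Qwen3.6 Max Preview, narrowing the gap with international frontier models. The progress is mainly in scientific reasoning, agentic, and coding capability. Notably, much of the score gain comes from the model choosing "no answer" more often on the "AA-Omniscience" benchmark, which cut the hallucination rate from 44.2% to 22.9%. The context window has also grown to 1 million tokens, while the model continues the Max series' closed-weights approach.

Reasoning · Model release · Coding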

05:35

Chubby♨️@kimmonismus

64

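OpenAI is about to release a general-purpose large language model, stressing that it was not trained specifically for a particular problem or for mathematics. Its performance improves markedly with more test-time compute, showing how general models can benefit from scaling compute. OpenAI says the current priority is shipping quickly so users can explore on their own, and that it has not yet pushed for maximum results on open problems. This marks a new path for large-model development.

Noam Brown: This is a general-purpose LLM. It wasn't targeted at this problem or even at mathematics. Also, it's not a scaffold. We ...

OpenAI · Reasoning · Model release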

04:17

Google DeepMind@GoogleDeepMind

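Featured · 84

Gemini 3.5 Flash has been officially released.

Google · Multimodal · Model release

2 related discussions

Why it's recommended: With Gemini 3.5, Google keeps extending the Flash line. A lightweight model like this matters for cost- and latency-sensitive workloads; if you have been waiting for a cheap Gemini API, take a look.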

00:44

Google Gemini@GeminiApp

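Featured · 74

Gemini 3.5 Flash delivers organized results fast, no matter how messy the input. See how Gemini turns client chats and texts into documents your small business can use.

Google · Reasoning · Model release

12 related discussions

Why it's recommended: The point of Gemini 3.5 Flash is not topping leaderboards but handling the real-world problem of information that arrives as a pile of junk. Generating documents straight from messy input is more useful to small businesses and freelancers than SOTA scores.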

00:36

Rohan Paul@rohanpaul_ai

63

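SenseTime open-sources the unified multimodal model SenseNova U1

SenseTime has open-sourced SenseNova U1, whose main innovation is architectural. The model drops the conventional split between a vision encoder and a variational autoencoder and handles images and text natively in a single shared representation space, greatly reducing the information lost in conversions between modules. This lets it generate text and images together coherently, with a clear edge on dense visual content that demands consistency, such as infographics, posters, and comics. On performance, it generates infographics about twice as fast as Qwen-Image-2.0/Seedream-4.5 at comparable quality.

Image generation · Multimodal · Open source/Repos · Model release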

00:06

Artificial Analysis@ArtificialAnlys

69

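Cohere launches open-weights model Command A+, scoring 37 on the Intelligence Index

Cohere has released the open-weights model Command A+, whose score on the Artificial Analysis Intelligence Index matches Claude 4.5 Haiku. Its main strength is a very low hallucination rate: it leads the relevant leaderboard at 86%, reflecting a model that "knows what it doesn't know". On speed, its API output beats several models including GPT-5.4 nano but is slightly behind Gemini 3.1 Flash-Lite. The model is somewhat weaker on hard tasks such as scientific reasoning and code generation, but it has visual reasoning capability, with performance between Claude 4.5 Haiku and GPT-5.4 nano.

Open source/Repos · Model release · Evals/Benchmarks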

May 20

18:46

SenseTime@SenseTime_AI

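Featured · 68

Turn your ideas into visuals that spark stories 🧨 [Quoting @Adamaestr0_]: Most AI tools can write or generate images. But this one does both at once. Introducing SenseNova U1. An AI that thinks in text and images at the same time. This changes everything 🧵

Adam: La mayoría de las herramientas de IA pueden escribir o generar imágenes. Pero esta hace AMBAS cosas a la vez. Te present...

Multimodal · Model release

Why it's recommended: SenseTime has released SenseNova U1, pitched as an "omni-modal" model that generates text and images together, but the announcement is too thin, with no performance figures or technical details. It is unclear for now whether this is a real breakthrough or a routine iteration, so it is flagged to watch.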

12:36

Kling AI@Kling_ai

72

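Kling AI launches the world's first native 4K video generation model

On April 23, Kling AI launched what it calls the world's first native 4K video generation model, built for professional content creation. It generates true 4K video in one click, with notably better image detail and production efficiency. The model has been adopted by Hollywood teams, animation studios, and others. A Hollywood producer noted it is the first native 4K foundation model used in their workflow; Wonder Studios stressed that generating natively in 4K avoids the character distortion seen with conventional upscaling and keeps the picture consistent; an animation director found it better than comparable products at preserving artistic tone and complex effects textures.

Multimodal · Model release · Video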

11:34

Rohan Paul@rohanpaul_ai

73

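SenseNova U1 open-sourced, rethinking native multimodal generation

SenseTime's SenseNova U1 has been open-sourced. Its main innovation is native unified multimodal modeling, which treats vision, language, and image generation as one problem rather than a chain of separate modules, reducing information loss. The model uses a MoT architecture (38B-Active 3B MoE) and stays highly consistent when generating structurally complex, dense text-and-image content such as infographics, posters, and comics. A detailed technical report lays out the full recipe, including a near-lossless visual interface and a joint training strategy.

SenseTime: 🔥 New week, New SenseNova-U1 Drop - and this one goes Deep!🔥 📄 The full Technical Report is OUT - the most detailed d...

Image generation · Multimodal · Open-source ecosystem · Model release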

11:05

Berryxia.AI@berryxia

73

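Google DeepMind releases Gemini 3.5 Flash: big performance gains, but much higher cost

Google DeepMind's newly released Gemini 3.5 Flash breaks new ground in balancing performance and speed. It scores 55 on the Intelligence Index, well above the previous generation and ahead of Grok 4.3 and Claude Sonnet 4.6. It improves notably on agentic tasks and hallucination rate, with output speed above 280 tokens/s. However, API pricing is about 3x the previous model's, and the cost of running the benchmark suite is 5.5x. Gemini 3.5 Flash is "faster and smarter", but it also substantially changes the Flash line's traditional low-cost positioning.

Artificial Analysis: Google's new Gemini 3.5 Flash is the clear leader on the Intelligence vs Speed Pareto frontier and makes large gains on ...

Agents · DeepMind · Multimodal · Model release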

10:04

Rohan Paul@rohanpaul_ai

74

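Google releases Gemini 3.5 Flash: a faster, more cost-effective agent model

Google has launched Gemini 3.5 Flash, with output speed up to 4x and results ahead of Gemini 3.1 Pro on several demanding tasks, including terminal benchmarks. Its speed and low cost make it a capable agent tool for everyday work. It is available in the Gemini app, AI Mode in Search, enterprise products, and other surfaces. Paired with the upgraded Antigravity tool, Gemini 3.5 Flash can drive collaborating sub-agents that handle code review, rewrites, and testing in parallel at scale for efficient automated workflows.

Rohan Paul: Gemini 3.5 in few more hours. 🔥

Agents · Google · Reasoning · Model release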

09:14

meng shao@shao__meng

64

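Gemini Omni is here! Google's edge really is in multimodal models, isn't it?!

Google has released the natively multimodal model Gemini Omni. Unlike conventional models that need frame-by-frame descriptions, it is multimodal from the ground up, generates video from intent, and supports editing over multi-turn conversation, with each step building on the previous result to keep things consistent. The model combines Gemini's world knowledge with physical intuition and can combine any references (images, text, audio, video) for cross-modal storytelling. Its goal is to "create anything from anything", starting with video generation.

Google DeepMind: We're dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video....

DeepMind · Google · Multimodal · Model release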

09:08

Demis Hassabis@demishassabis

81

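Gemini 3.5 Flash is amazing! - Performs better than 3.1 Pro on coding & agentic tasks - 4x faster than other frontier models - 12x faster in @antigravity - 800 tokens/sec! - Often at less than half the cost And Pro to come… Try it in @antigravity, @GeminiApp & more - enjoy!

Agents · Google · Model release · Coding

12 related discussions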

09:04

Rohan Paul@rohanpaul_ai

69

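Google Gemini 3.5 Flash is a super strong model for its class. It beats Gemini 3.1 Pro on many benchmarks. An agent model with 4x faster tokens per second. And @aimlapi just added Gemini 3.5 Flash to their API and is keeping it FREE for 24 hours. Setup instructions in comment.

AI/ML API: .@Google : "We're releasing Gemini 3.5 Flash" Us: *We're offering it for free* free for 24hrs via our API find instructi...

Google · Multimodal · Model release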

08:05

Berryxia.AI@berryxia

71

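Google DeepMind releases Gemini Omni, a step toward "generating anything"

At I/O, Google DeepMind released the Gemini Omni model, intended as a first step toward "generating anything from anything". The model deeply integrates Gemini's intelligence with generative media systems and makes a leap in world understanding, multimodality, and editing. Its key feature is that generated video keeps characters, lighting, and other elements logically consistent, and it supports real-time editing and style changes through natural language, turning video into dynamically evolving "world material". The model is live in some apps and the API is coming soon, though its real-world quality, particularly for Chinese-language generation, is still debated.

Google DeepMind: We're dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video....

Google · Multimodal · Model release · Video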

06:42

karminski-牙医@karminski3

61

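Gemini-3.5-flash is out! Did the price just triple?

At Google I/O, the Gemini-3.5-flash model was officially released, with pricing up sharply from the previous generation's $0.5/$3 to $1.5/$9. Hands-on testing puts its performance between Gemini-3.0-Pro and Gemini-3.1-Pro, but with weaker stability. The move is read as Google borrowing Anthropic's tiering strategy, with flash-lite, the new flash, and Pro forming a lineup. The new flash offers non-tiered pricing within a million-token context to absorb users spilling over from Pro. The price change may also be meant to complement the newly released Antigravity CLI, positioning flash like the Sonnet model in Claude Code and building out a developer ecosystem.

Google · Reasoning · Model release · Coding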

06:36

Orange AI@oran_ge

77

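Gemini flash 3.5 was released last night and is now available. - Far better than 3.1 Pro, with metrics close to gpt 5.5; it beats gpt5.5 on agentic and multimodal tasks. - Costs only a third as much as gpt5.5, and the cached price only a sixth. - API pricing: $1.50 / $9.00 per 1M tokens (input/output), cached input $0.15. Context window: 1M tokens. - Extremely fast, 4x the speed of other flagship models, which makes it a good fit for agents. Official announcement: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/

Agents · Google · Multimodal · Model release

12 related discussions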
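
The per-token prices quoted above turn into a per-request cost with simple arithmetic. Below is a minimal Python sketch, assuming the figures from the post ($1.50 input, $9.00 output, $0.15 cached input, all per 1M tokens) and assuming cached tokens are billed at the cached rate in place of the regular input rate. The function name `estimate_cost` and the example token counts are hypothetical and not part of any Google SDK.

```python
# Prices in USD per 1M tokens, as quoted in the post (not fetched from any API).
INPUT_PER_M = 1.50
OUTPUT_PER_M = 9.00
CACHED_INPUT_PER_M = 0.15


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from cache
    and is billed at the cached rate instead of the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    return (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent turn: 200k-token context, 150k of it cached, 4k tokens out.
    print(f"${estimate_cost(200_000, 4_000, cached_tokens=150_000):.4f}")
```

With output priced at six times input, long generations dominate the bill quickly, while caching mostly helps agents that resend a large, stable context on every turn.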

06:03

Rohan Paul@rohanpaul_ai

67

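Google releases Gemini Omni, an all-purpose AI model with multimodal input and precise video editing

Google has launched Gemini Omni, an all-purpose video AI model that accepts video, image, audio, text, and sketch inputs. Through natural-language instructions, users can add characters to existing video, replace objects, adjust actions, change the style, sync sound, and move the camera, with the scene staying consistent across repeated edits. The model has stronger world understanding and simulates gravity, fluids, and other physical interactions more realistically, bringing video editing closer to directing. Outputs carry SynthID watermarks and C2PA Content Credentials to mark them as AI-generated.

Google · Model release · Video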

06:03

Jeff Dean@JeffDean

81

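Google released the new Gemini 3.5 Flash model, which emphasizes a strong combination of speed and capability. Compared with Gemini 3.1 Pro, 3.5 Flash does better on nearly every benchmark, with especially large gains in coding. Its main advantage is very fast inference, 4x faster than other frontier models. On a chart of intelligence versus output speed, the model sits alone in the top-right region, setting a new mark for speed and capability.

Sundar Pichai: Just off stage at #GoogleIO, some highlights from this morning 🧵 Gemini 3.5 Flash is available today for everyone in @a...

Google · Model release · Coding

12 related discussions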

04:38

Demis Hassabis@demishassabis

79

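Gemini Omni is a major leap in world understanding & multimodal editing! It can take photos, video & audio and build entirely new scenes. Over time it’ll be able to handle any input & any output - starting w/ video. You can even give it your own videos & iterate on your ideas:

Google · Multimodal · Model release · Video

8 related discussions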

03:40

Google AI@GoogleAI

74

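By now, you've probably heard about Gemini Omni, our new model designed to create anything from any input, starting with video. But... what's the big deal? Let’s break it down 🧵👇

Google · Multimodal · Model release · Video

8 related discussions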

03:29

Sundar Pichai@sundarpichai

79

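Gemini Omni doesn't just build scenes that look real, it reasons about what should happen next. It combines an intuitive understanding of physics with Gemini's knowledge of history, science, and cultural context. Rolling out today starting with video outputs to Google AI Plus, Pro and Ultra subscribers globally through the @Geminiapp + Google Flow, and @YouTube Shorts this week.

Google · Multimodal · Model release · Video

8 related discussions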

03:08

Google Gemini@GeminiApp

81

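Meet Gemini Omni, our new model that can create anything from any input, starting with video. With Gemini Omni, you can combine images, videos and text as inputs and generate high-quality videos grounded in Gemini's real-world knowledge. #GoogleIO

Google · Multimodal · Model release · Video

8 related discussions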

03:03

OpenRouter@OpenRouter

82

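Gemini 3.5 Flash from @GoogleDeepMind is live on OpenRouter! Beats Gemini 3.1 Pro on coding, agentic work, and tool use at Flash-tier price and speed. 1M context, 65K max output, multimodal. $1.50/M input, $9/M output.

Google · Multimodal · Model release

12 related discussions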

02:55

AYi@AYi_AInotes

80

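Google Gemini Omni redefines video generation

Google has launched Gemini Omni, described as the first consumer-facing world model. Through natural-language interaction it combines Gemini's intelligence with generative media systems and shows a deep grasp of physics, history, biology, and more. Users can edit video with single-sentence instructions, the way they would edit ChatGPT text, with character consistency, style transfer, and angle changes. Rather than just generating pixels, it simulates a coherent physical and semantic world, marking a shift in AI video generation from a stitching tool to an intelligent creative system.

Google DeepMind: We're dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video....

DeepMind · Google · Image generation · Multimodal

8 related discussions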

02:30

Chubby♨️@kimmonismus

81

The real „wow“ moment is Gemini Omni. A world model towards AGI. It can create anything from any input. This is insane.

Logan Kilpatrick: Introducing Gemini Omni 🔮........ Omni is our new model that can create anything from any input - starting with video (...

Google · Multimodal · Model release · Video

1 related discussion

02:29

Google AI Developers@googleaidevs

84

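✨ Introducing Gemini 3.5, our latest family of models combining frontier intelligence with action. The series sets a new standard for agentic models that don't just reason, they execute.

Agents · Google · Reasoning · Model release

12 related discussions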