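Berryxia.AI@berryxia · 6月3日64 Microsoft's new model MAI-Image-2.5 took second place in image editing.
So GPT-Image-2 is clearly still the strongest, at number one!
Google's Nano Banana model has now been overtaken by Microsoft's MAI...
Can big brother Google come up with something new? My Pro membership is about to expire...
Summary: Microsoft released a new model, MAI-Image-2.5, which placed second in the Image Edit Arena (single-image editing) evaluation with a score of 1401. According to the evaluation data, it scores 10 points higher than Nano Banana 2, Grok Imagine Image Quality, and ChatGPT-Image-Latest-High Fidelity. Despite the progress, GPT-Image-2 still holds first place. Source: X user @berryxia.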
meng shao@shao__meng · 6月3日72 Microsoft Build released 7 models in one go!
Microsoft, I'll trust you one last time (1)(1)(1)(1)(1)(1)(1) 😄
MiniMax (official)@MiniMax_AI · 6月3日74We wrapped a live session on M3 yesterday with the @togethercompute team & our researchers @zpysky1125 and @HaohaiSun
A few highlights 🧵
1. MSA (MiniMax Sparse Attention) is the star ⭐️. Unlike CSA/HCA, which compress the KV cache, MSA keeps the real, uncompressed KV and does block-level selection with a small top-K. That's how the 1M context window stays tractable.
2. The efficiency win is huge. In our previous generation, ~30% of per-decode wall-clock time went to the attention kernel. With MSA that now drops to ~5%. Big gains for long-context generation.
3. M3 isn't just a coding model. Natively multimodal (image + video in), ability to handle long-horizon agentic tasks, and even operate a desktop computer. People are already throwing game-dev + Minecraft-style builds at it (Unity included) and it's holding its own.
4. M3 can self-evaluate on vision-coding tasks: it builds a website or SVG, browses and inspects its own rendered output, judges it, and iterates - grading work visually.
5. We're also seeing junior-analyst-level performance on finance tasks; something we haven't even showcased publicly yet.
6. What's next: harder long-horizon / multi-file tasks in future releases, scaling data + post-training (RL) compute toward pre-training scale, and going deeper into finance, legal & bio.
Thanks to everyone who joined 🙏
Try M3 link in the comments👇
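Summary: MiniMax shared key details about M3 in a live session. Its MSA technique uses block-level top-K selection while keeping the real, uncompressed KV cache, which keeps the 1M-token context window tractable. The attention kernel's share of per-decode wall-clock time drops from about 30% to about 5%, a significant efficiency gain for long-context generation. M3 is natively multimodal, accepts image and video input, handles long-horizon agentic tasks and desktop operation, and can visually self-evaluate and iterate. It shows junior-analyst-level performance on finance tasks. Future releases will target harder long-horizon tasks and go deeper into finance, legal, and bio. Together AI provides inference for the model.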
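To make the "block-level selection with a small top-K over uncompressed KV" idea concrete, here is a minimal single-query sketch in plain Python. MiniMax has not published the selection rule in this thread, so scoring each block by the query's dot product with the block's mean key is purely an illustrative assumption, as are the function names and the toy sizes.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def block_topk_attention(q, keys, values, block_size, top_k):
    """Attend over only the top_k highest-scoring KV blocks for one query.

    Assumption (not from the source): a block is scored by the dot product
    of the query with the block's mean key. The keys and values themselves
    are never compressed; they are only skipped or kept per block.
    """
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    scores = []
    for blk in blocks:
        mean_key = [sum(keys[i][d] for i in blk) / len(blk) for d in range(len(q))]
        scores.append(dot(q, mean_key))
    # Keep the top_k blocks by score, then restore positional order.
    chosen = sorted(range(len(blocks)), key=lambda b: scores[b], reverse=True)[:top_k]
    idx = [i for b in sorted(chosen) for i in blocks[b]]
    # Exact scaled dot-product attention, restricted to the selected tokens.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, keys[i]) * scale for i in idx])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx))
           for d in range(len(values[0]))]
    return out, idx

# Toy example: 8 cached tokens in blocks of 2; only block 2 matches the query.
keys = [[0.0, 1.0]] * 4 + [[5.0, 0.0]] * 2 + [[0.0, 1.0]] * 2
values = [[float(i)] for i in range(8)]
out, idx = block_topk_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
print(idx, out)
```

With `top_k` equal to the number of blocks the function reduces to ordinary full attention, which is a handy sanity check; the savings come from the final softmax touching only `top_k * block_size` tokens instead of the whole cache.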
MiniMax (official)@MiniMax_AI · 6月3日80MiniMax-M3 #6 overall on @ValsAI
the new open-weight SOTA 🚀
Rohan Paul@rohanpaul_ai · 6月3日81Microsoft unveiled MAI-Thinking-1.
So Microsoft now has a full in-house pipeline for building stronger reasoning models again and again.
Microsoft calls this system a “hill-climbing machine,” meaning it keeps improving the data, training setup, rewards, safety tests, and evaluations as one connected process.
Strong for its size, including 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro.
MAI-Thinking-1 is the first model from that process, using 35B active parameters inside a 1T total parameter mixture-of-experts model, where only part of the model runs for each token.
The base model was trained from scratch on 30T mostly human-generated tokens, with Microsoft saying it avoided third-party model distillation during pre-training.
After that, the team used reinforcement learning, which means the model practiced tasks and improved from feedback, to teach math reasoning, coding, tool use, helpfulness, and safety.
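Summary: Microsoft released MAI-Thinking-1, a mixture-of-experts (MoE) model with 35B active parameters and 1T total parameters. It was pre-trained from scratch on 30T tokens without third-party model distillation. Microsoft calls its iterative improvement process a "hill-climbing machine." On benchmarks, the model scores 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro.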
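As a rough illustration of "only part of the model runs for each token," the sketch below computes the active-parameter share from the figures quoted above and shows a generic top-k expert router. The router is a textbook MoE gating pattern, not Microsoft's published design; the expert count and gate values are made up for the example.

```python
import math

def active_fraction(active_params, total_params):
    """Share of parameters that participate in one token's forward pass."""
    return active_params / total_params

def route_top_k(gate_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Generic top-k gating; assumed for illustration only.
    """
    top = sorted(range(len(gate_logits)), key=lambda e: gate_logits[e], reverse=True)[:k]
    m = max(gate_logits[e] for e in top)
    exps = {e: math.exp(gate_logits[e] - m) for e in top}
    total = sum(exps.values())
    return {e: exps[e] / total for e in top}

# 35B active out of 1T total, as reported for MAI-Thinking-1.
print(f"{active_fraction(35e9, 1e12):.1%} of parameters active per token")

# Hypothetical gate scores for 4 experts; only 2 are run for this token.
print(route_top_k([0.1, 2.0, -1.0, 2.0], k=2))
```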
Chubby♨️@kimmonismus · 6月3日63Mai-1 thinking: Mid size model, 45b active parameter, MoE, side by side with sonnet 4.6
0 distillation
„Microsoft’s first reasoning model“
Artificial Analysis@ArtificialAnlys · 6月3日64Microsoft has released MAI-Transcribe-1.5: an exceptionally fast speech transcription model at a speed factor of ~276x, while still achieving 2.4% on AA-WER (#3), leading the accuracy-speed Pareto frontier
MAI-Transcribe-1.5 is Microsoft AI (MAI)’s latest speech transcription model, coming in at 3rd overall on the Artificial Analysis Word Error Rate (AA-WER) leaderboard, behind Alibaba’s Fun-Realtime-ASR-preview (1.7% WER), and ElevenLabs Scribe v2 (2.2% WER). The model stands out as the fastest STT model in the top 10 for accuracy, processing audio at ~276x real-time - this is more than double the speed of the second fastest model in the top 10 for accuracy.
The new model supports keyword biasing (improved recognition of rarer vocabulary such as names and medical terminology), in addition to support for 43 languages including English, French, Arabic, Japanese, and Chinese.
See more details below ⬇️
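Summary: Microsoft AI released the MAI-Transcribe-1.5 speech transcription model. It ranks third on the AA-WER leaderboard with a 2.4% word error rate (WER), behind Alibaba's Fun-Realtime-ASR-preview (1.7%) and ElevenLabs Scribe v2 (2.2%). Its standout feature is speed: it processes audio at about 276x real time, more than double the speed of the second-fastest model among the top 10 for accuracy, which puts it at the front of the accuracy-speed Pareto frontier. The model also supports keyword biasing and covers 43 languages, including English, French, Arabic, Japanese, and Chinese.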
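For readers unfamiliar with the two numbers in this post, here is a small sketch of what they measure. The WER function is the standard word-level edit-distance definition; AA-WER is Artificial Analysis's own aggregate and may be computed differently, so treat this only as the textbook metric. The sample sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)

def processing_seconds(audio_seconds, speed_factor):
    """Wall-clock time to transcribe audio at a given multiple of real time."""
    return audio_seconds / speed_factor

# One substitution and one deletion against a 6-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))

# One hour of audio at the reported ~276x speed factor.
print(round(processing_seconds(3600, 276), 1), "seconds")
```

At ~276x, an hour of audio takes roughly 13 seconds to process, which is what makes the speed figure notable next to a 2.4% error rate.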
🚨 AI News | TestingCatalog@testingcatalog · 6月3日70MICROSOFT 🔥: New MAI Code 1 Flash and MAI Thinking 1 models have been revealed on the official MAI website!
Also, MAI Image 2.5, MAI Voice 2, and MAI Transcribe 1.5 are there too.
> MAI-Code-1-Flash plans and reasons through complex coding tasks from start to finish, so you spend less time debugging and more time building.
> MAI-Thinking-1 (35B active, ~1T total parameters, MoE) has a smaller inference footprint than much larger models, yet is competitive with Claude Opus 4.6 on SWE-Bench Pro.
h/t @MeetPatelTech
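Summary: Microsoft updated the MAI model lineup on its official website, headlined by MAI Code 1 Flash and MAI Thinking 1. MAI Thinking 1 has 35B active and about 1T total parameters in an MoE architecture; its inference footprint is smaller than that of much larger models, yet it is competitive with Claude Opus 4.6 on SWE-Bench Pro. MAI Code 1 Flash focuses on planning and reasoning through complex coding tasks end to end. MAI Image 2.5, MAI Voice 2, and MAI Transcribe 1.5 are listed as well.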
Artificial Analysis@ArtificialAnlys · 6月3日62Krea 2 Medium debuts at #6 on the Artificial Analysis Text to Image Leaderboard, trailing only models from OpenAI, Google, and NVIDIA!
Krea 2 is @krea_ai's first image model family trained entirely from scratch (Krea 1 was developed in collaboration with Black Forest Labs). Krea 2 is available in two variants: Krea 2 Medium, and Krea 2 Large, which is more comparable to FLUX.2 [pro] in our arena.
Notably, Krea 2 Medium outranks the larger, more expensive Krea 2 Large in our arena. Krea describes Medium as smaller and faster, with extensive post-training that makes its outputs especially stable and consistent across generations. While Large is positioned as the more capable model, our leaderboard results align with Krea's view that Medium "handles the broadest range of use cases reliably."
Both models generate at 1K resolution and share a distinct set of generation controls via the API:
➤ Style transfer: Krea can extract the style of up to 10 reference images, with each image being able to be weighted in terms of importance
➤ Creativity Setting: A configurable API parameter (raw, low, medium, high) that sets how closely the model follows the prompt versus reinterpreting it
➤ Moodboards: A collection of images that can be collected in the application to apply a style transfer onto the image (separate from individual style reference images)
At $30 per 1k images via Krea's API, Krea 2 Medium is priced below comparable models such as Nano Banana Pro at $134/1k images or grok-imagine-image-quality at $50/1k images. Krea 2 Large is priced at $60 per 1k images, and both models' prices increase with the use of the Style Transfer and Moodboard features.
Both models are available in the Krea app, via Krea's API, and on official third-party launch partners. Congratulations to @krea_ai on the launch!
See below for comparisons between Krea 2 and other leading models in our Artificial Analysis Image Arena 🧵
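Summary: Krea 2 Medium, a text-to-image model trained from scratch by Krea AI, ranks #6 on the Artificial Analysis leaderboard, behind only models from OpenAI, Google, and NVIDIA. Notably, the smaller, faster Medium variant outranks the Large variant, which is positioned as more capable. Both models generate 1K-resolution images and support style transfer and creativity controls through the API. Krea 2 Medium is priced at $30 per 1k images and Krea 2 Large at $60 per 1k images.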
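The per-1k prices quoted in the post are easier to compare as per-image and per-batch costs. The sketch below uses only the base prices stated above; it ignores the Style Transfer and Moodboard surcharges, whose amounts the post does not give, and the batch size of 250 is an arbitrary example.

```python
# Base list prices in USD per 1,000 images, as quoted in the post.
PRICE_PER_1K = {
    "Krea 2 Medium": 30.0,
    "Krea 2 Large": 60.0,
    "grok-imagine-image-quality": 50.0,
    "Nano Banana Pro": 134.0,
}

def batch_cost(model, n_images):
    """Base cost in USD for n_images, excluding any feature surcharges."""
    return PRICE_PER_1K[model] * n_images / 1000

for model in PRICE_PER_1K:
    ratio = PRICE_PER_1K[model] / PRICE_PER_1K["Krea 2 Medium"]
    print(f"{model}: ${batch_cost(model, 250):.2f} per 250 images "
          f"({ratio:.2f}x Krea 2 Medium)")
```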
StepFun@StepFun_ai · 6月2日73Open weights are moving from model cards into real coding workflows.
Step 3.7 Flash is designed for fast agentic coding, reliable tool calling, and multimodal understanding.
Big thanks for the blog from the @kilocode team:
https://blog.kilo.ai/p/new-models-from-stepfun-and-minimax
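Summary: StepFun released Step 3.7 Flash, an open-weight model designed for fast agentic coding with reliable tool calling and multimodal understanding. MiniMax released its open-weight M3 model in the same period, and both are now available in Kilo. The launch highlights open-weight models moving from model cards into real coding workflows.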
MiniMax (official)@MiniMax_AI · 6月2日72Watch M3 reach the frontier 🚀
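Summary: MiniMax released M3, billed as the first open-weight model to combine three frontier capabilities: coding and agentic ability, a 1M context length, and native multimodality. Reported coding and agentic results: SWE-Bench Pro 59.0%, Terminal Bench 2.1 66.0%, SWE-fficiency 34.8%, KernelBench Hard 28.8%, and MCP Atlas 74.2%. The 1M context is supported by MiniMax Sparse Attention. MiniMax offers API access and a new MiniMax Code service; the model weights and technical report are expected in about 10 days.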
StepFun@StepFun_ai · 6月2日74We probably don’t talk enough about “usable.”
Summary: When Flash models bring speed, cost, and intelligence into the "usable" range all at once, the way intelligence is supplied changes structurally.
SenseTime@SenseTime_AI · 6月2日73Thanks for using our model to create these complex charts and diagrams.
It's great to see challenging information transformed into clear, accurate, and readable visuals. That's what we aim for. 😄
SenseTime@SenseTime_AI · 6月2日71Turning complex information into accurate charts and diagrams. That's 𝗦𝗲𝗻𝘀𝗲𝗡𝗼𝘃𝗮‐𝗨𝟭‐𝟴𝗕‐𝗠𝗼𝗧‐𝗜𝗻𝗳𝗼𝗴𝗿𝗮𝗽𝗵𝗶𝗰. Learn more: https://x.com/SenseTime_AI/status/2061465029959209106?s=20
StepFun@StepFun_ai · 6月2日69This is exactly the philosophy: don't bolt on efficiency, design for it from day one.
MFA + AFD aren't tricks. They're what lets Step 3.7 Flash serve at a fraction of the KV-cache cost.
Huge thanks to @FireworksAI_HQ for making Step 3.7 Flash one-click to run.
Go build something agentic with it.
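Summary: StepFun released Step 3.7 Flash, its inference-optimized model. It is a 196B MoE model designed for inference efficiency from day one. Its Multi-matrix Factorization Attention (MFA) brings KV-cache cost to about 22% of a DeepSeek model's, and Attention-FFN Disaggregation (AFD) enables hardware-optimized, efficient serving. The model is available through Fireworks AI under the Apache 2.0 license and can be used to build agentic applications.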
MiniMax (official)@MiniMax_AI · 6月2日78Watch open source reach the frontier. 🚀
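Summary: MiniMax announced M3, billed as the first open-weight model to combine three frontier capabilities: coding and agentic performance, with reported scores on SWE-Bench Pro and other benchmarks; a context window extended to 1M tokens through MiniMax Sparse Attention; and native multimodality from the start of training. The weights and technical report are due in about 10 days.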
Alibaba Cloud@alibaba_cloud · 6月2日82👏👏 Introducing Qwen3.7-Plus — a multimodal agent model that unifies vision and language into one versatile agent foundation.
✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks
✅ Versatile coding agent & productivity assistant with full-modality input
✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA
✅ Cross-harness generalization across diverse agent frameworks
One model. Sees, thinks, codes, acts.🙌🙌
Now available via API on Alibaba Cloud Model Studio. Try it — let us know what you build.😎
🔗🔗⬇️⬇️
Blog:https://qwen.ai/blog?id=qwen3.7-plus
Qwen Studio:https://int.alibabacloud.com/m/1000413837/
API:https://int.alibabacloud.com/m/1000413829/
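Summary: Alibaba Cloud launched Qwen3.7-Plus, a multimodal agent model that unifies vision and language. It is positioned as a versatile coding agent and productivity assistant with full-modality input that can carry out tasks across GUI and CLI. Its visual agent capabilities cover perception, reasoning, grounding, and search-augmented QA, and it generalizes across agent frameworks. It is available now via API on Alibaba Cloud Model Studio.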
MiniMax (official)@MiniMax_AI · 6月2日74🚀 M3 is live on Vercel's AI Gateway!
Our first long-context model with 1M tokens, multimodal input.
AND 50% off for the week 🎉
Love to see what everyone builds with M3 and @vercel_dev ✨
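ginobefun@hongming731 · 6月2日71 #BestBlogs Morning Brief 06-02
MiniMax released M3, the first open-source model in China to combine frontier coding, a 1M ultra-long context, and native multimodality. It autonomously completed 145 CUDA kernel iterations in 24 hours, turning abstract benchmarks into verifiable engineering capability.
Meanwhile, a former xAI lead offers a counterintuitive take: the ceiling for video models tracks LLMs, and the next Sora will be a video agent rather than a better video model.
Today's BestBlogs Morning Brief also features 10 curated picks, including an AI coding guidelines system for Chromium's 35-million-line codebase, production engineering practices for voice agents, and "RAG is not machine learning." Happy reading.
Summary: MiniMax open-sourced M3, the first model in China to integrate frontier coding, a 1M ultra-long context, and native multimodality; it autonomously completed 145 CUDA kernel iterations within 24 hours. Separately, a former xAI lead argues that the ceiling for video models will be set by LLMs, and that the next Sora-like product should be a video agent rather than a pure video generation model.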
Alibaba Cloud@alibaba_cloud · 6月2日83👏👏 Introducing Qwen3.7-Plus — a multimodal agent model that unifies vision and language into one versatile agent foundation.
✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks
✅ Versatile coding agent & productivity assistant with full-modality input
✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA
✅ Cross-harness generalization across diverse agent frameworks
One model. Sees, thinks, codes, acts.🙌🙌
Now available via API on Alibaba Cloud Model Studio. Try it — let us know what you build.😎
🔗🔗⬇️⬇️
Blog:https://qwen.ai/blog?id=qwen3.7-plus
Qwen Studio:https://chat.qwen.ai/?models=qwen3.7-plus
API:https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&url=2840914_2&modelId=qwen3.7-plus&serviceSite=international
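Summary: Alibaba Cloud released Qwen3.7-Plus, a multimodal agent model that unifies vision and language capabilities. Intended as a general-purpose agent foundation, it supports GUI and CLI operation, handles visual and text tasks, and serves as a coding agent and productivity assistant. Its capabilities span visual perception, reasoning, grounding, and search-augmented QA, and it generalizes across agent frameworks. The model is available via API on Alibaba Cloud Model Studio.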
MiniMax (official)@MiniMax_AI · 6月2日81M3 on Cloudflare AI Gateway, day one ⚡
Frontier coding, 1M context, and native multimodal, now just one fetch away.
It is time to build something. 🦞
Chubby♨️@kimmonismus · 6月2日79Qwen3.7 plus released. Looks good, but why do they compare their models to GPT-5.4 and Opus 4.6?
Anyways, multimodal as well
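Summary: Alibaba Cloud's Qwen3.7-Plus has been released. It is a multimodal agent foundation model that unifies vision and language. Core features include an interactive hybrid agent with GUI and CLI operation, a versatile coding agent and productivity assistant, and a visual agent with perception, reasoning, grounding, and search-augmented capabilities; it also generalizes across mainstream agent frameworks. The model is available via API on Alibaba Cloud Model Studio. The launch post's comparisons against GPT-5.4 and Opus 4.6 prompted discussion among users about its choice of baselines.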
MiniMax (official)@MiniMax_AI · 6月2日55napkin sketch → playable game for $0.028 😳
this is the kind of thing M3 was built for @atomic_chat_hq
xAI@xai · 6月2日67Composer 2.5 is now available inside Grok Build.
Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions.
MiniMax (official)@MiniMax_AI · 6月2日69messy, multimodal, too large for a normal chat? M3 handles it 🫡 @happycapyai
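Summary: MiniMax M3 is now available on Happycapy. The main upgrade is its ability to handle messy, multimodal, large-scale tasks. The model accepts native multimodal input, including PDFs, video, images, screenshots, and long documents, and is strong on coding and agentic tasks such as repository-level debugging and issue tracking. M3 is open-weight and priced at roughly one third of Sonnet.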
Qwen@Alibaba_Qwen · 6月2日83👏👏 Introducing Qwen3.7-Plus — a multimodal agent model that unifies vision and language into one versatile agent foundation.
✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks
✅ Versatile coding agent & productivity assistant with full-modality input
✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA
✅ Cross-harness generalization across diverse agent frameworks
One model. Sees, thinks, codes, acts.🙌🙌
Now available via API on Alibaba Cloud Model Studio. Try it — let us know what you build.😎
🔗🔗⬇️⬇️
Blog:https://qwen.ai/blog?id=qwen3.7-plus
Qwen Studio:https://chat.qwen.ai/?models=qwen3.7-plus
API:https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&url=2840914_2&modelId=qwen3.7-plus&serviceSite=international
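Summary: Qwen introduced Qwen3.7-Plus, a multimodal agent model that unifies vision and language. It supports hybrid GUI and CLI operation, serves as a versatile coding agent and productivity assistant, and offers visual perception, reasoning, grounding, and search-augmented QA. The model is designed to generalize across agent frameworks and is available now via API on Alibaba Cloud Model Studio.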
MiniMax (official)@MiniMax_AI · 6月2日54 26% improvement on BU Bench 👀 more to come
MiniMax (official)@MiniMax_AI · 6月2日78this is what model-and-agent alignment looks like 🤝 @SimularAI
MiniMax (official)@MiniMax_AI · 6月2日76day 0 launch partner energy 🔥 @Qubrid_AI is offering 50% off for early adopters. go run it!
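Summary: MiniMax's M3 model is now available on Qubrid AI. It offers a 1M-token context, native multimodality, frontier coding performance, and support for long-horizon agentic workflows, and is described as one of the most technically interesting open-weight models of the year. As a day-0 launch partner, Qubrid AI is offering early adopters 50% off.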
Artificial Analysis@ArtificialAnlys · 6月2日77NVIDIA's Cosmos 3 lands at #1 among open weights models in both Text to Image and Image to Video on the Artificial Analysis Leaderboards!
Cosmos 3 is a family of omnimodal world models for Physical AI from @nvidia, unifying language, image, video, audio and action in a single Mixture-of-Transformers architecture that pairs an autoregressive reasoner with a diffusion generator.
The family comes in four variants: base Nano (16B: 8B reasoner tower + 8B generator tower) and Super (64B: 32B reasoner tower + 32B generator tower) models, with the Super model also having Text2Image and Image2Video fine-tuned variants, which are the versions listed in the Artificial Analysis Arena Leaderboards.
Cosmos3-Super-Text2Image (agentic) runs through an agentic prompt-upsampling harness, and takes the #1 open weights spot in Text to Image, surpassing HiDream-O1-Image-Dev-2604, Alibaba's Qwen Image Max 2512 and Black Forest Labs' FLUX.2 [dev].
Cosmos3-Super-Image2Video takes #1 open weights in Image to Video (No Audio), ahead of Lightricks' LTX-2, and Alibaba's Wan 2.2 A14B.
Cosmos 3 generators take structured JSON prompts rather than plain text, so prompt upsampling is needed to reproduce these results. This upsampling can be handled by an external harness or by the model's own reasoner branch, so it can also run self-contained.
Cosmos 3 is fully open under the OpenMDW 1.1 license, shipping with weights, code, curated datasets and fine-tuning recipes available on @huggingface. First-party and third-party APIs are expected over the next few weeks, with pricing to follow.
See the thread below for example generations and a link to try Cosmos 3 in our arena 🧵
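Summary: NVIDIA's Cosmos 3 omnimodal world models take first place among open-weight models in both Text to Image and Image to Video on the Artificial Analysis leaderboards. The family uses a Mixture-of-Transformers architecture that pairs an autoregressive reasoner with a diffusion generator, and comes in variants including the 16B Nano and 64B Super. Cosmos3-Super-Text2Image surpasses HiDream-O1-Image-Dev-2604, Qwen Image Max 2512, and FLUX.2 [dev], while Cosmos3-Super-Image2Video leads LTX-2 and Wan 2.2 A14B. Cosmos 3 generators take structured JSON prompts, with prompt upsampling handled by an external harness or by the model's own reasoner branch. The release is fully open under the OpenMDW 1.1 license, with weights, code, curated datasets, and fine-tuning recipes.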
Chubby♨️@kimmonismus · 6月2日82MiniMax just dropped M3! It hits 59% on SWE-Bench Pro, edging out GPT-5.5 (58.6%) and beating Gemini 3.1 Pro (54.2%).
Trails Opus 4.7 on coding, but leads it on autonomous browsing at 83.5% on BrowseComp. First open model to pack frontier coding, a 1M-token context, and native multimodality into one system.
I mean, let that sink in: Roughly 12x cheaper per token than GPT-5.5, with weights and a full tech report promised in about 10 days.
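Summary: MiniMax released M3, described as the first open model to integrate frontier coding, a 1M-token context window, and native multimodality in one system. M3 scores 59.0% on SWE-Bench Pro, ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%), and leads Opus 4.7 on autonomous browsing with 83.5% on BrowseComp. It also does well on Terminal Bench 2.1 (66.0%) and MCP Atlas (74.2%). Its per-token cost is roughly one twelfth of GPT-5.5's, and the weights and technical report are expected in about 10 days.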
Rohan Paul@rohanpaul_ai · 6月2日74 Nemotron 3 Ultra will be available from Nvidia in a few days.
Hybrid SSM (state-space models) + mixture-of-experts architecture.
The SSM part is built for long sequences, so the model can keep reasoning or using tools for longer without getting crushed by the usual attention cost.
Jensen Huang at NVIDIA GTC Taipei 2026
----
From 'NVIDIA' YT channel (link in comment)
🚨 AI News | TestingCatalog@testingcatalog · 6月1日58MiniMax M3 is now live inside Atomic Chat 👀
Atomic tested M3 on a task to read a hand-drawn napkin sketch, write the game logic, build the UI, and ship a playable HTML platformer in one pass.
All this for $0.028 🤖
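Summary: MiniMax M3 is now integrated into Atomic Chat. In one test, Atomic Chat had M3 read a hand-drawn, doodle-style sketch of a platformer and, in a single pass, write the game logic, build the UI, and deliver a runnable standalone HTML game. The task consumed 6,920 input tokens and produced 9,933 output tokens, for a total cost of just $0.028. MiniMax also plans to publish M3 on Hugging Face next week.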
SenseTime@SenseTime_AI · 6月1日67𝗚𝗲𝘁𝘁𝗶𝗻𝗴 𝗰𝗵𝗮𝗿𝘁𝘀 𝗮𝗻𝗱 𝗱𝗶𝗮𝗴𝗿𝗮𝗺𝘀 𝗿𝗶𝗴𝗵𝘁 𝘄𝗶𝘁𝗵 #𝗔𝗜 📊
Most AI models still struggle with these data visuals — negatives shown as positives, bar positions off, element relationships scrambled.
𝗦𝗲𝗻𝘀𝗲𝗡𝗼𝘃𝗮‐𝗨𝟭‐𝟴𝗕‐𝗠𝗼𝗧‐𝗜𝗻𝗳𝗼𝗴𝗿𝗮𝗽𝗵𝗶𝗰 breaks through that barrier.
Generate accurate visuals, then tweak the design and layout on the fly. See the difference and try it yourself:
🤗 https://huggingface.co/sensenova/SenseNova-U1-8B-MoT-Infographic
🖼️ Showcases: https://github.com/OpenSenseNova/SenseNova-U1/blob/main/docs/u1_infographic_showcases.md
👾 Discord: https://discord.gg/BuTXPHmQub
@huggingface @github
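Summary: Most AI models make errors when generating charts, such as negative values shown as positive, misplaced bars, and scrambled relationships between elements. SenseNova-U1-8B-MoT-Infographic (SenseNova-U1) is built to address these problems: it generates accurate charts and supports adjusting design and layout on the fly. The model is available on Hugging Face, with showcases on GitHub.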
Chubby♨️@kimmonismus · 6月1日83 1/ NVIDIA just open-sourced Cosmos 3 at GTC Taipei!
It's the first fully open "omnimodel" for physical AI - one model that understands the real world, predicts what happens next, and generates the actions a robot should take.
Weights, code, datasets. All open. And this is really big. Let's dig into everything: 🧵
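Summary: NVIDIA announced at GTC Taipei that Cosmos 3 is fully open source. Described as the first "omnimodel" for physical AI, it has native visual reasoning and can understand the real world, predict what happens next, and generate the actions a robot should take. The release includes two variants, Super (32B) and Nano (8B). Weights, code, and datasets are all open.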
SiliconFlow@SiliconFlowAI · 6月1日79Coding like Opus4.7 / 1M context window / Native multimodal
@MiniMax_AI M3 is now on SiliconFlow with day-0 support 🔥
🎉 Limited-time 50% off for 7 days
Cache / Input / Output: $0.06 / $0.30 / $1.20 per 1M tokens
(Regular: $0.12 / $0.60 / $2.40)
M3 is the first open-source model combining all three frontier capabilities:
→ Coding & Agentic: beats GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro
→ 1M context via MiniMax Sparse Attention
→ Native multimodal from step zero — image, video & computer use
Try it on SiliconFlow ⬇️
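Summary: MiniMax M3 is now available on SiliconFlow with a limited-time 50% discount for 7 days. Promotional pricing per 1M tokens is $0.06 cache, $0.30 input, and $1.20 output. M3 is described as the first open-source model to combine three frontier capabilities: coding and agentic ability, beating GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro; a 1M-token context window via MiniMax Sparse Attention; and native multimodality covering image, video, and computer use.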
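A quick way to read the per-1M-token prices above is to price one concrete request. The sketch below uses the token counts reported elsewhere in this feed for the Atomic Chat napkin-sketch test (6,920 input, 9,933 output); that the test was billed at SiliconFlow's rates is an assumption made only for illustration, and the function name is hypothetical.

```python
def request_cost(input_tokens, output_tokens, input_per_m, output_per_m,
                 cached_tokens=0, cache_per_m=0.0):
    """Cost in USD for one request, given prices per 1M tokens.

    cached_tokens is the part of the input billed at the cache rate.
    """
    uncached = input_tokens - cached_tokens
    return (uncached * input_per_m
            + cached_tokens * cache_per_m
            + output_tokens * output_per_m) / 1_000_000

# Token counts from the Atomic Chat test; prices from the SiliconFlow post.
regular = request_cost(6920, 9933, input_per_m=0.60, output_per_m=2.40)
promo = request_cost(6920, 9933, input_per_m=0.30, output_per_m=1.20)
print(f"regular: ${regular:.4f}, promo: ${promo:.4f}")
```

At the regular rates this comes to about $0.028, in line with the figure quoted for the napkin-sketch demo, and to about $0.014 at the promotional rates. Output tokens dominate the bill because they cost four times as much as input tokens.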
MiniMax (official)@MiniMax_AI · 6月1日73 1. Video control + gaming + M3
2. Open weights + massive context + strong coding
3. Canceling my weekend plans now
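[Quoting @MinLiBuilds]: Say bye bye to the legacy 20K context.
MiniMax M3 is out, with three highlights:
1M context, native multimodality, and agentic capability.
This time I ran a full evaluation using a CC workflow, @ZenMuxAI, and MiniMax M3:
Given one screenshot, build a "mortal cultivation sword-formation duel" gesture game.
Requirements: support two-player duels, use a workflow to break down the task, and add a rock-paper-scissors mechanic.
Two hours later, the game was actually running.
Now I know the winning formula for this generation of LLMs:
1M context + multimodal + agent mode.
1M context is the foundation for reasoning depth, while multiple agents handle task decomposition and execution.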
🚨 AI News | TestingCatalog@testingcatalog · 6月1日55NVIDIA announced an upcoming release of Nemotron 3 Ultra later this week, a 550B-parameter open-weight model.
According to Artificial Analysis, it is positioned as the most intelligent open-weight model from a US lab.
Soon 👀
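karminski-牙医@karminski3 · 6月1日79 Please, everyone, take a break. I really can't keep up with the testing 🥲
Summary: MiniMax released a new model, MiniMax M3, which it says is the first open-weight model to combine three frontier capabilities: frontier coding and agentic ability, with reported scores on SWE-Bench Pro and other benchmarks; MiniMax Sparse Attention, which extends context length to 1M; and native multimodality. The weights and technical report are expected in about 10 days.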
MiniMax (official)@MiniMax_AI · 6月1日64It truly is 😎 #M3