Xiaomi MiMo-V2.5-Pro-UltraSpeed breaks 1,000 tokens/s on a single 8-GPGPU node

Xiaomi MiMo @XiaomiMiMo


2026-06-08 22:37

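Why it was featured

Xiaomi reaches 1,000+ tokens/s on a 1T-parameter MoE model using a single node with eight standard GPUs, without resorting to wafer-scale or specialized chips, which substantially lowers the cost barrier for inference. Teams building real-time chat or agents can apply for the free chat access to try it first-hand.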

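AI summary

Xiaomi MiMo, together with TileRT_AI, has released MiMo-V2.5-Pro-UltraSpeed, reaching an output speed above 1,000 tokens/s on a 1-trillion-parameter MoE model for the first time, using only a single standard 8-GPGPU node (not a Cerebras or Groq approach). A limited-time free chat experience is available; the UltraSpeed API is priced at 3x, for roughly a 10x improvement in output experience. Applications are open from June 8 to June 23 (PDT), and enterprises can email business-mimo@xiaomi.com.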

🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀

We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI, breaking the 1,000 tokens/s output speed on a 1 Trillion parameter model for the FIRST TIME!

Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE.

Read the full technical deep dive: https://mimo.xiaomi.com/blog/mimo-tilert-1000tps

Want to experience the future of real-time AI?
👉 Apply for UltraSpeed now: https://platform.xiaomimimo.com/ultraspeed
⏳ Limited-Time Access: Application-based · Jun 8 - Jun 23 (PDT)
💬 Chat Experience: Completely FREE for a limited time - try the blazing-fast web chat now.
⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience.
🤝 Enterprise & Large-Scale Needs: business-mimo@xiaomi.com
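
The "3x the price for a ~10x boost" trade-off can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: it treats the "~10x" figure as a pure output-speed ratio, which implies a baseline of about 100 tokens/s, and it uses a made-up price of 1 unit per output token. The function and constant names are hypothetical and are not part of any Xiaomi API.

```python
# Back-of-the-envelope comparison of latency and cost.
# Figures taken from the post: 1,000 tokens/s, "~10x", "3x the price".
# Assumptions (not from the post): "~10x" is read as a speed ratio, and the
# baseline price is an arbitrary 1 unit per output token.

ULTRASPEED_TPS = 1_000.0                 # output tokens per second
SPEEDUP = 10.0                           # assumed pure speed ratio
PRICE_MULTIPLIER = 3.0                   # UltraSpeed price relative to baseline
BASELINE_TPS = ULTRASPEED_TPS / SPEEDUP  # implied baseline: 100 tokens/s


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream a response at a steady output rate."""
    return output_tokens / tokens_per_second


def compare(output_tokens: int, baseline_price_per_token: float = 1.0) -> dict:
    """Return time and cost for the baseline and UltraSpeed tiers."""
    baseline_cost = output_tokens * baseline_price_per_token
    return {
        "baseline_seconds": generation_seconds(output_tokens, BASELINE_TPS),
        "ultraspeed_seconds": generation_seconds(output_tokens, ULTRASPEED_TPS),
        "baseline_cost": baseline_cost,
        "ultraspeed_cost": baseline_cost * PRICE_MULTIPLIER,
    }


if __name__ == "__main__":
    # A 2,000-token answer: 20 s at the assumed baseline vs 2 s at 1,000 tokens/s.
    for key, value in compare(2_000).items():
        print(f"{key}: {value}")
```

Under these assumptions, a 2,000-token response drops from 20 seconds to 2 seconds while costing three times as much, so the tier pays off mainly where waiting time matters more than per-token cost, such as interactive chat or multi-step agents.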

Inference · Model release · Deployment/Engineering

