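Xiaomi reached 1,000+ tokens/s on a 1T MoE model using a single node of eight standard GPUs, without resorting to wafer-scale or specialized chips, which substantially lowers the cost barrier for inference. Anyone building real-time chat or agents can apply for the free chat access to try it firsthand.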
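Xiaomi MiMo, together with TileRT_AI, has released MiMo-V2.5-Pro-UltraSpeed, which for the first time exceeds 1,000 tokens/s output speed on a 1-trillion-parameter MoE model, using only a single standard 8-GPGPU node (not a Cerebras or Groq approach). A limited-time free chat experience is available; the UltraSpeed API costs 3x the price for a roughly 10x improvement in output experience. Applications are open June 8 to June 23 (PDT), and enterprises can email business-mimo@xiaomi.com.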
🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀
We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI, breaking the 1,000 tokens/s output speed barrier on a 1 Trillion parameter model for the FIRST TIME!
Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE.
Read the full technical deep dive: https://mimo.xiaomi.com/blog/mimo-tilert-1000tps
Want to experience the future of real-time AI?
👉 Apply for UltraSpeed now: https://platform.xiaomimimo.com/ultraspeed
⏳ Limited-Time Access: Application-based · Jun 8 - Jun 23 (PDT)
💬 Chat Experience: Completely FREE for a limited time - try the blazing-fast web chat now.
⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience.
🤝 Enterprise & Large-Scale Needs: business-mimo@xiaomi.com