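SenseTime has published the full training details of its native multimodal model: no visual encoder, a native Mixture-of-Transformers (MoT) backbone, and open-sourced 38B-A3B MoE weights. Anyone building multimodal models can use the technical report as a guide for reproducing it.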
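The SenseNova-U1 technical report describes in detail how to build a frontier native multimodal model. Its core components are native multimodal unified modeling, a near-lossless visual interface (no visual encoders, no VAEs), joint autoregressive and pixel-space flow matching training, and a native Mixture-of-Transformers backbone. The report also lays out a six-stage training recipe together with practical guidance on RL post-training and distillation. The open-source release, SenseNova-U1-A3B-MoT, is a 38B-A3B MoE model that activates only 3B parameters, which keeps it efficient and fast. The technical report, model weights, code, and a demo are all publicly available.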
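To make "joint autoregressive and pixel-space flow matching training" more concrete, here is a minimal sketch of what such a combined objective could look like. It is not taken from the report: the function names, the linear noise-to-data path, and the `fm_weight` balance term are all illustrative assumptions, and the actual SenseNova-U1 loss may differ in every one of these choices.

```python
import math
import random


def cross_entropy(logits, targets):
    """Mean next-token cross-entropy. `logits` is a list of per-position score lists."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[target]
    return total / len(targets)


def flow_matching_loss(velocity_fn, pixels, rng):
    """MSE between predicted and target velocity on a linear noise-to-data path.

    Assumption: x_t = (1 - t) * noise + t * pixels, so the target velocity
    is (pixels - noise). `pixels` is a flat list of raw pixel values,
    i.e. the loss is computed in pixel space rather than in a VAE latent space.
    """
    t = rng.random()
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    x_t = [(1.0 - t) * n + t * p for n, p in zip(noise, pixels)]
    target = [p - n for n, p in zip(noise, pixels)]
    pred = velocity_fn(x_t, t)
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pixels)


def joint_loss(logits, targets, velocity_fn, pixels, rng, fm_weight=1.0):
    """Hypothetical joint objective: text cross-entropy plus weighted flow matching."""
    text_term = cross_entropy(logits, targets)
    image_term = flow_matching_loss(velocity_fn, pixels, rng)
    return text_term + fm_weight * image_term
```

In a real model, `logits` and `velocity_fn` would both come from the same backbone, so one backward pass would update shared weights for understanding and generation. Here they are plain Python callables so the arithmetic can be checked in isolation.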
🔥 New week, New SenseNova-U1 Drop - and this one goes Deep!🔥
📄 The full Technical Report is OUT - the most detailed disclosure yet of how to build a frontier Native Multimodal Model.
Inside:
✨ Near-lossless visual interface (no VEs, no VAEs)
✨ Native Multimodal Unified Modeling
✨ Joint AR + pixel-space flow matching training
✨ Native Mixture-of-Transformers backbone
✨ 6-stage training recipe + RL post-training + distillation
If you work on NMM, this is the playbook.
🤗 One more thing: SenseNova-U1-A3B-MoT (38B-A3B MoE) weights are now open-sourced - a RARE native unified model on an MoE backbone (Only 3B active! Lightning Fast⚡)
📄 Tech Report: https://arxiv.org/abs/2605.12500
🤗 Daily Papers (Vote & Discuss): https://huggingface.co/papers/2605.12500
🤗 Models: https://huggingface.co/collections/sensenova/sensenova-u1
💻 Code: https://github.com/OpenSenseNova/SenseNova-U1
🎮 Demo: https://unify.light-ai.top
👾 Discord: https://discord.com/invite/BuTXPHmQub