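SenseTime's SenseNova U1 has been released as open source. Its core innovation is native unified multimodal modeling: it treats vision, language, and image generation as a single problem rather than a chain of separate modules, which reduces information loss. The model uses a MoT architecture (38B-Active 3B MoE) and stays highly consistent when generating structurally complex, dense image-text content such as infographics, posters, and comics. The detailed technical report discloses the full construction recipe, including a near-lossless visual interface and a joint training strategy, giving the industry a cutting-edge reference.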
Chinese AI labs are increasingly releasing serious open-source work.
SenseNova U1 just dropped on Hugging Face: native multimodal modeling, MoT architecture (38B-Active 3B MoE).
It attacks the hardest part of image generation: readable, structured, consistent image-text output.
The most interesting part of SenseNova U1 is that it treats multimodal generation as one native modeling problem, not a chain of separate vision, language, and image modules.
That means less handoff between modules, less information loss, and better consistency when creating dense visual content like infographics, guides, posters, comics, and image-text workflows.
It also comes with ComfyUI support and fast A3B inference, and it is absolutely brilliant for dense visuals like infographics, posters, comics, and guides.