DECO: A Sparse Mixture-of-Experts Model for On-Device Deployment That Matches Dense-Model Performance

2026-05-11 08:00

AI Summary

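To overcome the storage and memory-access bottlenecks caused by the large parameter count of Mixture-of-Experts models, the research team proposes DECO, a sparse MoE architecture built for end-side devices that need high performance, low computational cost, and small storage overhead. DECO uses differentiable, flexible ReLU-based routing with learnable expert-wise scaling to adaptively balance the contributions of routed and shared experts, and introduces the NormSiLU activation function to improve routing stability and sparsity. Experiments show that, with the same total parameter count and the same amount of training data, DECO matches a dense Transformer while activating only 20% of its experts, and outperforms existing MoE baselines; its dedicated acceleration kernel achieves a 3.00× speedup over dense inference on real hardware. Code and models will be open-sourced.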
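The routing idea can be illustrated with a minimal single-token sketch. The abstract does not give the exact formula, so this assumes a gate of the form gate_e = s_e · ReLU(r_e · x), with a learnable scalar s_e per routed expert, and a layer output of shared_expert(x) + Σ_e gate_e · expert_e(x). Experts whose gate is zero are skipped, which is where compute is saved, and the returned ratio is the fraction of routed experts that fired. All function names and the toy weights below are hypothetical, not taken from the DECO code.

```python
from collections.abc import Callable

Vector = list[float]
Expert = Callable[[Vector], Vector]


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def matvec(w: list[Vector], x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu_route(x: Vector, router_w: list[Vector], expert_scale: Vector) -> Vector:
    """Assumed gate: gate_e = s_e * ReLU(r_e . x); a zero gate means expert e is inactive."""
    logits = matvec(router_w, x)
    return [s * relu(z) for s, z in zip(expert_scale, logits)]


def moe_forward(x: Vector, router_w: list[Vector], expert_scale: Vector,
                experts: list[Expert], shared_expert: Expert) -> tuple[Vector, float]:
    """Shared expert always runs; a routed expert runs only when its gate is nonzero."""
    gates = relu_route(x, router_w, expert_scale)
    out = list(shared_expert(x))
    for g, expert in zip(gates, experts):
        if g == 0.0:
            continue  # skipped expert: no compute spent on it
        y = expert(x)
        out = [o + g * yi for o, yi in zip(out, y)]
    active = sum(1 for g in gates if g != 0.0)
    return out, active / len(experts)


# Toy example with hand-picked weights (hypothetical, not from the paper).
router_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
expert_scale = [0.5, 1.0, 1.0, 1.0, 2.0]  # hypothetical learned s_e values
experts = [lambda v, k=k: [(k + 1) * vi for vi in v] for k in range(5)]


def shared_expert(v: Vector) -> Vector:
    return list(v)


out, ratio = moe_forward([1.0, -1.0], router_w, expert_scale, experts, shared_expert)
print(out, ratio)  # [11.5, -11.5] 0.4
```

Unlike top-k softmax routing, the number of active experts here is not fixed per token: it depends on how many router logits are positive, so the activation ratio can drift during training.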

Original Abstract

While Mixture-of-Experts (MoE) scales model capacity without proportionally increasing computation, its massive total parameter footprint creates significant storage and memory-access bottlenecks, which hinder efficient end-side deployment that simultaneously requires high performance, low computational cost, and small storage overhead. To achieve these properties, we present DECO, a sparse MoE architecture designed to match the performance of dense Transformers under identical total parameter budgets and training tokens. DECO uses differentiable, flexible ReLU-based routing enhanced by learnable expert-wise scaling, which adaptively balances the contributions of routed and shared experts. Furthermore, we introduce NormSiLU, an activation function that normalizes inputs prior to the SiLU operator, producing a more stable trend in the routed-expert activation ratio and a higher intrinsic sparsity level. We also identify an empirical advantage of using non-gated MLP experts with ReLU-based routing, suggesting that MoE architectures could be simplified. Experiments demonstrate that DECO, activating only 20% of experts, matches dense performance and outperforms established MoE baselines. Our specialized acceleration kernel delivers a 3.00× speedup on real hardware compared with dense inference. Code and checkpoints will be released.
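The abstract describes NormSiLU only as normalizing inputs before the SiLU operator, and contrasts non-gated MLP experts with the usual gated ones. The sketch below assumes the normalization is a parameter-free standardization (zero mean, unit variance) over the hidden vector and pairs it with a non-gated expert; the actual normalization in DECO, and exactly where it is applied, may differ. `norm_silu`, `non_gated_expert`, and `gated_expert` are hypothetical names.

```python
import math

Vector = list[float]


def silu(z: float) -> float:
    """SiLU(z) = z * sigmoid(z), written to avoid overflow for large |z|."""
    if z >= 0.0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def norm_silu(h: Vector, eps: float = 1e-6) -> Vector:
    """Hypothetical NormSiLU: standardize h (zero mean, unit variance), then apply SiLU."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [silu((v - mean) * inv_std) for v in h]


def matvec(w: list[Vector], x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def non_gated_expert(x: Vector, w_in: list[Vector], w_out: list[Vector]) -> Vector:
    """Non-gated MLP expert (assumed to use NormSiLU): W_out . NormSiLU(W_in . x)."""
    return matvec(w_out, norm_silu(matvec(w_in, x)))


def gated_expert(x: Vector, w_gate: list[Vector], w_in: list[Vector],
                 w_out: list[Vector]) -> Vector:
    """Gated (SwiGLU-style) expert for comparison: W_out . (SiLU(W_gate x) * W_in x)."""
    g = [silu(v) for v in matvec(w_gate, x)]
    u = matvec(w_in, x)
    return matvec(w_out, [a * b for a, b in zip(g, u)])


# Standardization makes the activation nearly insensitive to a positive rescaling of h.
h = [1.0, 2.0, 3.0, 4.0]
a = norm_silu(h)
b = norm_silu([10.0 * v for v in h])
print(max(abs(p - q) for p, q in zip(a, b)) < 1e-4)  # True
```

The non-gated expert needs two weight matrices instead of three, which is one concrete sense in which dropping the gate simplifies each expert.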

Tags: Open Source/Repository · Inference · On-Device · Paper/Research

HuggingFace Daily Papers (trending community papers)

Read the original: arxiv.org