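AI summary
Microsoft's MAI technical report discloses the model's details: 1T total parameters, 35B active parameters, trained on 33.5T tokens. Its most notable feature is zero synthetic data and zero knowledge distillation; reasoning, agentic behavior, and tool use are all learned from scratch in post-training. The report is highly transparent: for the first time at this scale, it publishes the MFU of each iteration and the complete scaling recipe. The stated goal is to become a frontier lab.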
Fantastic in-depth guide to Microsoft MAI by @eliebakouch
tl;dr about the model: Respect where respect is due.
- zero synthetic data or distillation from previous models
- 1T model with 35B active, trained on 33.5T tokens
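For a rough sense of scale, here is a back-of-the-envelope sketch using only the three numbers quoted above. The function names are hypothetical, and the `6 * N * D` compute estimate is a common rule of thumb for dense-equivalent training FLOPs, not a figure taken from the report.

```python
# Figures as quoted in the post (rounded there, so results are approximate).
TOTAL_PARAMS = 1e12      # 1T total parameters
ACTIVE_PARAMS = 35e9     # 35B parameters active per token
TRAIN_TOKENS = 33.5e12   # 33.5T training tokens


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per parameter."""
    return tokens / params


def approx_train_flops(active: float, tokens: float) -> float:
    """Assumption: the ~6 * N * D rule of thumb, with N = active parameters.

    This ignores attention cost and any report-specific details.
    """
    return 6 * active * tokens


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    per_active = tokens_per_param(TRAIN_TOKENS, ACTIVE_PARAMS)
    flops = approx_train_flops(ACTIVE_PARAMS, TRAIN_TOKENS)
    print(f"active fraction:         {frac:.1%}")
    print(f"tokens per active param: {per_active:.0f}")
    print(f"approx training FLOPs:   {flops:.3e}")
```

Under these assumptions, about 3.5% of the parameters are active per token, and the estimate lands on the order of 7e24 FLOPs; treat both as order-of-magnitude guides only.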
microsoft MAI tech report is a gold mine, one of the most transparent for a model at this scale. this model uses zero synthetic data or distillation from previo...