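Ant Group's InclusionAI lab has released Ling 2.6 1T, an open-source non-reasoning model. The model has 1 trillion parameters and scores 34 on the Artificial Analysis Intelligence Index, 15 points above its predecessor Ling-1T, which puts its intelligence close to comparable models such as DeepSeek V3.2. It performs solidly on scientific reasoning and knowledge tasks, scoring 75% on GPQA. It is also token-efficient, needing only ~16M output tokens to run the index, and cost-effective: running the full index through the first-party API costs about $95. Its factual reliability is weaker, however: it scores -51 on the AA-Omniscience benchmark, mainly because of a 92% hallucination rate. The model weights are publicly available on Hugging Face.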
Ant Group has just released Ling 2.6 1T, an open weights, non-reasoning model with high cost efficiency and a reasonable intelligence tradeoff. Ling 2.6 1T scores 34 on the Artificial Analysis Intelligence Index, a 15-point jump from Ling-1T.
Ling 2.6 1T is the latest model from Ant Group's @TheInclusionAI lab. Ant Group recently released Ling 2.6 Flash, a 104B total parameter non-reasoning model. Ling 2.6 1T's weights have been publicly released on Hugging Face.
Key takeaways:
➤ Comparable intelligence to similarly sized non-reasoning models: At 1T total parameters, Ling 2.6 1T sits near DeepSeek V3.2 (non-reasoning, 32) and Kimi K2.5 (non-reasoning, 37) in intelligence. This is a marked improvement from Ling-1T, which scores 19 on the Intelligence Index. However, there remains a ~10-point gap to frontier non-reasoning open weights models such as GLM-5.1 (non-reasoning, 44) and Kimi K2.6 (non-reasoning, 43).
➤ Strong performance in scientific reasoning and knowledge: Ling 2.6 1T scores 75% on GPQA and 8% on Humanity's Last Exam (HLE), indicating solid performance on graduate-level reasoning and knowledge recall tasks. This is comparable to DeepSeek V3.2 (non-reasoning), which achieves 75% on GPQA and 11% on HLE.
➤ Efficient token usage: Ling 2.6 1T uses ~16M output tokens to run the Artificial Analysis Intelligence Index, making it more efficient than MiMo V2 Flash (non-reasoning, ~17M), and significantly more efficient than GLM-5.1 (non-reasoning, ~75M) and Kimi K2.6 (non-reasoning, ~27M).
➤ Strong cost-to-intelligence positioning: At $0.30 per million input tokens and $2.50 per million output tokens on InclusionAI's first-party API, Ling 2.6 1T costs only ~$95 to run the full Artificial Analysis Intelligence Index. This positions it competitively for large-scale workloads relative to models in a similar intelligence tier.
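As a rough sanity check on the cost figure, the sketch below splits the ~$95 total into output and input spend using only the numbers quoted above. The post does not state how many input tokens the index consumes, so the input figure here is an implied estimate under the assumption that the total is simply input cost plus output cost at list price (no caching discounts or other fees). The function and variable names are illustrative only.

```python
# Figures quoted in the post (all approximate).
INPUT_PRICE_PER_M = 0.30   # USD per million input tokens
OUTPUT_PRICE_PER_M = 2.50  # USD per million output tokens
OUTPUT_TOKENS_M = 16       # ~16M output tokens to run the full index
TOTAL_COST = 95            # ~USD 95 to run the full index


def output_cost(output_tokens_m, price_per_m=OUTPUT_PRICE_PER_M):
    """Cost in USD of the output tokens alone."""
    return output_tokens_m * price_per_m


def implied_input_tokens_m(total_cost, output_tokens_m,
                           input_price_per_m=INPUT_PRICE_PER_M):
    """Input tokens (in millions) implied by the total cost.

    Assumption: total cost = input cost + output cost at list price.
    The post does not report this number directly.
    """
    remaining = total_cost - output_cost(output_tokens_m)
    return remaining / input_price_per_m


out_cost = output_cost(OUTPUT_TOKENS_M)
implied_input = implied_input_tokens_m(TOTAL_COST, OUTPUT_TOKENS_M)

print(f"Output spend: ~${out_cost:.0f}")
print(f"Implied input tokens: ~{implied_input:.0f}M")
```

Under these assumptions, output tokens account for roughly $40 of the total, and the remaining ~$55 would correspond to an input volume on the order of 180M tokens. Treat that input figure as a back-of-the-envelope estimate, not a reported measurement.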