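AI summary
In early May, top superforecasters predicted that the longest METR 80% task time horizon would reach 3-4 hours by the end of 2026. In late May, however, Anthropic's Claude Mythos model reached 3 hours 6 minutes at an 80% success rate in a METR benchmark preview, landing directly within the range of the median forecasts that experts and superforecasters had made for the end of 2026 (3-4 hours). The previous baseline was 1.5 hours. This breakthrough suggests that AI capabilities are advancing much faster than expected.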
In early May, the best superforecasters predicted that, by the end of the year, the longest METR 80% task horizons would reach 3-4 hours.
In late May, Claude Mythos achieved that number.
We also asked forecasters to predict the longest 80% success time horizon achieved by the end of 2026. All three groups had medians between 3 and 4 hours, up fr...