GPT-5.5 First-Pass Review Flags Fatal Errors in About a Third of FrontierMath Problems

Noam Brown (@polynoamial)

2026-05-12 09:34 · 35 days ago

AI summary

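Fun fact: these fatal errors were initially flagged using @OpenAI's GPT-5.5. [Quoting @EpochAIResearch]: We are conducting an AI-assisted review of FrontierMath: Tiers 1-4. This has flagged fatal errors in about a third of problems, and we believe most of these flags are valid. Once human review is complete, we will publish updated scores on the corrected dataset.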

Fun fact: the fatal errors were initially flagged using @OpenAI's GPT-5.5

Epoch AI: We are conducting an AI-assisted review of FrontierMath: Tiers 1-4. This has flagged fatal errors in about a third of problems, and we believe most of these fla...

Tags: OpenAI · Papers/Research · Evaluation/Benchmarks

View the original post on X
