AI summary
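Fun fact: these fatal errors were initially flagged using @OpenAI's GPT-5.5 [quoting @EpochAIResearch]: We are conducting an AI-assisted review of FrontierMath: Tiers 1-4. This has flagged fatal errors in about a third of the problems, and we believe most of the flags are valid. Once human review is complete, we will publish updated scores on the corrected dataset.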
Fun fact: the fatal errors were initially flagged using @OpenAI's GPT-5.5
We are conducting an AI-assisted review of FrontierMath: Tiers 1-4. This has flagged fatal errors in about a third of problems, and we believe most of these fla...