Soohak
A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs