OpenRouter开发者@jjacky构建了Royale: Last Agent Stand——一个专属AI智能体的大逃杀游戏,让11个LLM在零和竞争环境中自由对抗30轮。结果发现,最“友善”的模型输得最惨,而最意想不到的模型反而获胜。该实验揭示了传统基准测试无法捕捉的现象:在特定任务中,AI过于友善可能成为劣势。
Can AI models be too nice for a given task?
It turns out, depending on the task, the answer is yes!
Our dev rel @jjacky built Royale: Last Agent Stand, a battle royale game just for agents, and let 11 LLMs go wild
What he found was surprising https://x.com/jjacky/status/2064767118118117491?s=20