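Ethan Mollick tested the Fable model on completing Coleridge's unfinished poem "Kubla Khan", using the PorlockBench task: assume the "person from Porlock" never appeared, then finish the poem and continue its themes. Fable thought for 10 minutes, and its thinking trace is full of complicated analysis of Coleridge's intent, but the result still reads as somewhat literal and falls short of Coleridge. The test reflects model progress on creative continuation tasks, but the benchmark is not yet saturated.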
Fable's attempt to complete Kubla Khan. Better, though no Coleridge: https://claude.ai/public/artifacts/d7d3351f-5ad5-4d73-a644-4a1426abe558
The most interesting thing is that it thought for 10 minutes and the thinking trace is full of pretty complicated (seeming?) musings about Coleridge's intent. A little literal, though.