Andrej Karpathy 称 Claude Fable 5 与 Mythos 同源但加入安全措施,是一次值得大版本号提升的跃进,定性表现与 11 月发布的 Claude 4.5 同级。模型在几乎所有基准测试上达 SOTA,长任务和高难度问题领先明显;@claudeai 指出其在软件工程、知识工作、科学研究和视觉方面表现卓越。Karpathy 认为开发者可尝试比以往更具雄心的任务,模型能理解并自主推进。不过模型仍有小问题,安全机制在发布时过于敏感,有待后续调优。
This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time.
I feel a lot of things changing as working software increasingly comes out on a tap. The Jevon's paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!