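Anthropic has disclosed that Claude now writes more than 80% of its merged production code. Before the Claude Code research preview in February 2025 the share was in the single digits, and output per engineer has risen to 8x the 2024 baseline. Agents can edit files, run tests, inspect failures, spawn helper agents, and keep working through long tasks. Reliable task length doubles roughly every 4 months; Mythos Preview reaches at least 16 hours, and open-ended Claude Code success reaches 76%. Claude's training-code speedup rose from 3x to 52x, while an experienced engineer reached only about 4x in 4 to 8 hours on the same setup. The remaining human advantage lies in research judgment.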
Anthropic just disclosed that Claude now writes more than 80% of the production code it merges.
Before Claude Code reached research preview in February 2025, Claude wrote only a low-single-digit percentage of merged code; output per engineer has since risen to 8x the 2024 baseline.
The shift comes from agents that edit files, run tests, inspect failures, spawn helper agents, and keep working across longer tasks instead of only suggesting snippets.
Anthropic says reliable task length is doubling about every 4 months, with Mythos Preview reaching at least 16 hours and open-ended Claude Code success hitting 76%.
That is, Claude Mythos Preview could stay useful on a task that would take a skilled human roughly 16 hours of work.
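To make the doubling claim concrete, here is a minimal arithmetic sketch. It assumes the 4-month doubling simply continues from a 16-hour starting point, which is an extrapolation for illustration and not something Anthropic has stated; the function name `projected_task_hours` and its parameters are hypothetical.

```python
def projected_task_hours(
    baseline_hours: float,
    months_elapsed: float,
    doubling_months: float = 4.0,
) -> float:
    """Task length after `months_elapsed`, assuming steady exponential doubling.

    Assumption: the doubling period stays constant, which the source
    only describes as "about every 4 months".
    """
    return baseline_hours * 2 ** (months_elapsed / doubling_months)


if __name__ == "__main__":
    # Illustrative only: start from the 16-hour figure and step forward a year.
    for months in (0, 4, 8, 12):
        hours = projected_task_hours(16.0, months)
        print(f"{months:>2} months: {hours:.0f} hours")
```

Under that assumption, three doublings in a year would turn a 16-hour task horizon into a 128-hour one. The real trend could slow down or speed up.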
Claude also moved from a 3x training-code speedup to 52x, while a skilled human reached about 4x in 4 to 8 hours on the same setup.
The remaining human edge is research judgment: choosing the right problem, trusting the right result, and knowing when an experiment is dead.