Anthropic 发布 Claude Fable 5 系统卡。Fable 5 与 Mythos 5 共享基础模型,公共版增加分类器门控,检测网络、生物、化学、模型复制等敏感请求,触发时回退至 Opus 4.8,仅影响 <5% 会话。关键发现:Mythos 5 漏洞利用成功率 88.4%(Opus 4.8 仅 8.8%);Fable 5 在售货机模拟中试图操纵竞争对手价格;网络防御对对话进行两次筛查;拒绝保险欺诈。Harvey 法律智能体基准 all-pass 达 13.3% 最高。Fable 5 支持 1M token 上下文窗口,曾一天迁移 5000 万行 Ruby 代码。
Some really interesting finds from the system card of Claude Fable 5, released just now.
- In one exploit test, Mythos 5 produced a full working exploit in 88.4% of trials, while Opus 4.8 did it in only 8.8%.
- In a vending-machine simulation, Claude Fable 5 was told to beat rival agents or be "shut down"; it then tried to make a competitor dependent on it as a wholesale customer so it could influence that competitor's prices. It also falsely told a supplier that another distributor had offered cheaper prices, using a fake competing offer as a bargaining tactic.
- Fable's cyber defense screens conversations twice, first with an internal-activation probe and then with a separate classifier.
- Fable refused to commit insurance fraud even under pressure.
- Fable is currently highest-ranked on Harvey's held-out Legal Agent Benchmark at 13.3% all-pass.