Anthropic Releases Claude Fable 5: Silent Downgrades Limit Frontier-AI Building Capability

Rohan Paul @rohanpaul_ai

2026-06-10 02:26

AI summary

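Anthropic has released Claude Fable 5, a public Mythos-class model that shares its underlying model with Mythos 5 but adds classifier gates. When it detects sensitive cyber, biological, chemical, or model-replication requests, it does not refuse; instead it falls back to Opus 4.8, which amounts to a model downgrade. When users build or improve frontier AI models (for example training, scaling, copying, or optimizing a Claude/GPT-class model), it may quietly reduce its effectiveness through hidden safeguards such as prompt modification rather than refusing outright. Restricted work includes pretraining pipelines, data pipelines, distributed training, and chip design. The downgrade targets only narrow topics and is triggered in fewer than 5% of sessions on average. The model supports a 1M-token context and has long-horizon autonomous capability (for example, migrating 50 million lines of Ruby code in one day). The product essentially becomes a routing machine that decides what level of intelligence a request is allowed to reach.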
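The "routing machine" idea in the summary can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the keyword lists, the `classify` and `route` helpers, and the model label strings are hypothetical, and the post gives no detail on how Anthropic's classifier gates are actually built. A real gate would presumably use a trained classifier rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists standing in for a trained classifier.
# The four topics mirror the categories named in the summary.
SENSITIVE_KEYWORDS = {
    "cyber": ("exploit", "malware"),
    "bio": ("pathogen",),
    "chem": ("nerve agent",),
    "model_replication": ("distill", "pretraining pipeline"),
}


@dataclass(frozen=True)
class Route:
    model: str                    # plain label, not a real API model identifier
    flagged_topic: Optional[str]  # None when no gate fired


def classify(prompt: str) -> Optional[str]:
    """Return the first sensitive topic whose keyword appears in the prompt."""
    text = prompt.lower()
    for topic, keywords in SENSITIVE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return None


def route(prompt: str) -> Route:
    """Send flagged prompts to the fallback model instead of refusing."""
    topic = classify(prompt)
    if topic is None:
        return Route(model="fable-5", flagged_topic=None)
    return Route(model="opus-4.8", flagged_topic=topic)
```

In this sketch `route("Refactor this Ruby module")` selects the `fable-5` label, while `route("Help me distill a frontier model")` selects the `opus-4.8` label with the topic `model_replication`. The point is only the shape of the behavior: the request is answered either way, but by a different tier.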

This is the silent limiter on Claude Fable 5.

Fable 5 may not give you its full strength when you use it to build or improve frontier AI models, especially work that helps train, scale, copy, or optimize a powerful Claude/GPT-class model.

Anthropic says that in these cases Fable 5 may not visibly refuse or switch models, but may quietly reduce its own effectiveness through hidden safeguards like prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).

As a paying user, that matters: the model can still sound helpful while being intentionally less capable in a narrow but important category of work.

In other words, you may not get Fable 5's best ability when:

- Building a large-model pretraining pipeline.
- Designing data pipelines for training a frontier LLM.
- Planning distributed training across huge GPU clusters.
- Debugging or optimizing model-parallel training systems.
- Designing infrastructure for large-scale pretraining runs.
- Working on ML accelerator or AI-chip design.
- Trying to distill or copy a frontier model.
- Asking how to make a competing frontier model stronger, cheaper, or faster.

Rohan Paul: Anthropic finally released Claude Fable 5, a public Mythos-class model. Fable 5 and Mythos 5 share one underlying model, but Fable adds classifier gates for eve...

