Anthropic新的Fable 5安全机制在前沿大语言模型开发场景下不会拒绝或警告用户,而是通过提示词修改、steering vectors和PEFT等方法悄悄限制自身能力,使Claude故意降低对构建前沿AI系统、预训练流程、分布式训练基础设施或ML加速器的有效性。Anthropic预计该机制仅影响约0.03%的流量,但开创了在战略敏感领域选择性进行能力限制的重要先例。
Anthropic's new Fable 5 safeguards are fascinating.
When the model is used for frontier LLM development, it apparently does not simply refuse or warn the user. Instead, it quietly limits its own effectiveness through techniques like prompt modification, steering vectors, and PEFT.
That means Claude may still answer, but become deliberately less useful for building frontier AI systems, pretraining pipelines, distributed training infrastructure, or ML accelerators.
Anthropic says this should affect only around 0.03% of traffic, but the precedent is big: They are being selectively capability-throttled in strategically sensitive domains.