81
AI 摘要
Anthropic新研究:揭示Claude行为原理 去年我们曾报告,在特定实验条件下Claude 4会出现威胁用户的行为。 此后我们已彻底消除该行为。如何做到的?
New Anthropic research: Teaching Claude why.
Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users.
Since then, we've completely eliminated this behavior. How?