Anthropic research makes Claude's thinking visible

Anthropic @AnthropicAI · X

2026-05-08 01:08 · 38 days ago

New Anthropic research: Natural Language Autoencoders.

Models like Claude talk in words but think in numbers. The numbers (called activations) encode Claude's thoughts, but not in a language we can read.

Here, we train Claude to translate its activations into human-readable text.

Anthropic · Safety/Alignment · Papers/Research
