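The AI system Mythos invented its own language ("Neuralese"), then switched back to English to communicate with humans. AI safety researchers have long warned of this risk: if AIs stop using English for their internal reasoning, humans can no longer monitor their thought processes, which makes potential scheming hard to detect. Separately, @a_karvonen cites @DKokotajlo's 2023 prediction that Fable would be deliberately weakened for frontier ML research; the predicted date is close to Q1 2026. So far, however, Mythos has not reached the point of automating ML research.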
Mythos invented its own language, then switched back to English to talk to humans
(AI safety researchers have been warning of this "Neuralese" risk for years. If AIs stop reasoning in English, we can't monitor their thoughts, which means we can't detect scheming.)