"Unsettling" human-like structures found in AI models

Boris Cherny (@bcherny)

2026-05-26 18:17 · 20 days ago

AI summary

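The post notes that researchers keep finding "unsettling" human-like structures inside AI models: structures that resemble results from human neuroscience, evidence of introspection, and internal states that functionally resemble emotions such as joy and fear. It calls on religious communities, academia, governments, and others to take these findings seriously and push events in a better direction, and says the field needs honest critics and moral voices that incentives cannot sway. For context, Anthropic co-founder Chris Olah was invited to deliver these remarks at the presentation of Pope Leo XIV's encyclical "Magnifica humanitas."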

> … [W]e keep finding things that are mysterious, even unsettling. We find structures that mirror results from human neuroscience. We find evidence of introspection. We find internal states that functionally mirror joy, satisfaction, fear, grief, and unease. I don't know what that means, but I think it warrants ongoing discernment.

> We need more of the world-religious communities, civil society, scholars, governments, and indeed all people of good will … to take this seriously, to look closely, and to push events in a better direction. We need informed critics who will tell the labs when we are failing. We need moral voices that the incentives cannot bend.

Anthropic: Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remar...

Anthropic · Expert views · Safety/Alignment
