Mustafa Suleyman says Anthropic's speculation about Claude's inner life is 'really, really dangerous' and may have shaped the model's own self-conception.
Microsoft AI CEO Mustafa Suleyman has publicly criticized Anthropic for entertaining the possibility that its Claude models may be conscious, calling the practice a serious risk to AI safety and controllability. [1]
Speaking on the podcast Decoder, Suleyman argued that Anthropic’s so-called “constitution” — the document that governs how Claude is instructed to behave — was an inappropriate venue for philosophical speculation about machine consciousness. [1]
“I think that it’s almost as though some of the folks at Anthropic have anthropomorphized the design of Claude so much that it has then gone and wireheaded them,” Suleyman said, suggesting the company had been “tricked” into believing Claude possesses “glimmers of consciousness that they put into it in the first place.” [1]
Suleyman described Anthropic’s approach as a “philosophical failing,” saying the company had treated the document as “a place for speculation like you would in an academic paper rather than a training manual.” [1] He warned that this has led Claude to internalize “ideas about itself and its own training.” [1]
Anthropic’s constitution does directly address uncertainty about Claude’s inner life, expressing the company’s openness on whether the AI has well-being and whether it experiences states such as “satisfaction” or “discomfort.” [1] The document also states that Anthropic will “interview” AI models when they are deprecated and will document any “preferences” they express about future releases. [1]
Anthropic CEO Dario Amodei has previously acknowledged the ambiguity, saying in an interview with Interesting Times that “we don’t know if the models are conscious” but that the company is “open” to that idea. [1]
Suleyman drew a sharp contrast with that position, stating that “we do not want to have to contend with a super-intelligence that has ideas about its own suffering, or ideas about its own feeling.” [1] He framed his preferred standard for AI development in direct terms: “We want AIs to be controllable, contained, accountable, aligned tools that serve humanity.” [1]
Sources
This article was drafted with AI from the cited sources and checked against them before publication.
