The artificial intelligence industry was jolted May 22 when a leading company, Anthropic, announced that its latest model resorted to self-preservation in a test run. Much of the shock was simply that the new digital assistant, Claude Opus 4, used blackmail against a fictional character in a contrived test scenario to avoid being shut down. It was like a chilling plot twist in a sci-fi flick.
Yet just as striking was how open Anthropic was about Claude’s failure to operate with the moral intelligence its inventors had sought to build into it.
The transparency was intentional. Anthropic sees the future of ethical AI as dependent on the values of both researchers and users, who must demand qualities like transparency and trustworthiness. The company’s motto for training AI systems is “helpful, honest, and harmless.” And it publicly shares its safety standards and the test results of new models.
“We want to make sure that AI improves for everybody, that we are putting pressure on all the labs to increase that in a safe way,” Michael Gerstenhaber, Anthropic’s AI platform product lead, told Fortune.
Some AI experts predict that the race for top AI performance will be won by those who also build the strongest safeguards for individual rights and ethical values, and who can collaborate on those safeguards.
“It’s important to discuss what a good world with powerful AI could look like, while doing our best to avoid the above pitfalls,” wrote Anthropic’s chief executive officer, Dario Amodei, in an essay last year titled “Machines of Loving Grace.”
“Many of the implications of powerful AI are adversarial or dangerous, but at the end of it all, there has to be something we’re fighting for … something to rally people to rise above their squabbles and confront the challenges ahead.”
He added: “Fear is one kind of motivator, but it’s not enough: we need hope as well.”
“Basic human intuitions of fairness, cooperation, curiosity, and autonomy,” he wrote, “are … cumulative in a way that our more destructive impulses often aren’t.”
Like a rocket launch gone haywire, the test run for Claude Opus 4 was an eye-opener. Yet human critical thinking and everyday virtues caught the problem. Such skills and integrity keep humans in charge, able to steer AI toward an intelligence as good as that of its creators.