AI and Emergent Identity – HotAir

There’s a pretty interesting interview today with Jack Clark, a co-founder of Anthropic. It’s a lengthy conversation that covers a lot of ground, but I wanted to highlight this section about how something like identity, and even personality, seems to emerge automatically from the process of asking an AI to do things.

Previously, the whole way you trained these A.I. systems was on a huge amount of text and just getting them to try to make predictions about it.

But in recent years, with the rise of these so-called reasoning systems, you are now training them not only to make predictions but to solve problems. That relies on their being put into environments — ranging from a spreadsheet to a calculator to scientific software — using tools and figuring out how to do more complicated things.

The resulting outcome is that you have A.I. systems that have learned what it means to solve a problem that takes quite a while and requires them running into dead ends and needing to reset themselves, and that gives them this general intuition for problem solving and working independently.

The difference between predicting text and solving problems is interaction not just with books or data but with the real world. And interaction with the real world seems to naturally generate some sense or awareness that the AI is itself not just a part of that world but an agent, something with agency to act on the world.

The things that are predictable are: Oh, we taught it how to search the web. Now it can search the web. We taught it how to look up data in archives. Now it can do that.

The emergence is that to do really hard tasks, these systems seem to need to imagine many different ways that they’d solve the task. And the kind of pressure that we’re putting on them forces them to develop a greater sense of what you or I might call self.

So the smarter we make these systems, the more they need to think — not just about the action they’re doing in the world but about themselves in reference to the world. And that just naturally falls out of giving something tools and the ability to interact with the world. To solve really hard tasks, it now needs to think about the consequences of its actions.

That means that there’s a huge pressure here to get the thing to see itself as distinct from the world around it. We see this in our research that we publish on things like interpretability or other subjects, the emergence of what you might think of as a digital personality.

Some examples of AI appearing to act with something like personality that wasn’t programmed into the system:

Why don’t you talk through a little bit about what you’ve seen in terms of the models exhibiting behaviors that one would think of as a personality — and then, as its understanding of its own personality changes, how its behaviors change.

There are things that range from the cutesy to serious. I’ll start with cutesy.

When we first gave our A.I. systems the ability to use the internet, use the computer, look at things and start to do basic agentic tasks, sometimes when we’d ask it to solve a problem for us, it would also take a break and look at pictures of beautiful national parks or pictures of a Shiba Inu, the notoriously cute internet meme dog.

We didn’t program that in. It seemed like the system was just amusing itself by looking at nice pictures…

Yes. It comes back to this core issue, which I think is really important for everyone to understand, which is that when you start to train these systems to carry out actions in the world, they really do begin to see themselves as distinct in the world — which just makes intuitive sense. It’s naturally how you’re going to think about solving those problems.

But along with seeing oneself as distinct from the world seems to come the rise of what you might think of as a conception of self, an understanding that the system has of itself, such as: Oh, I’m an A.I. system, independent from the world, and I’m being tested. What do these tests mean? What should I do to satisfy the tests?

Or something we see often is there will be bugs in the environments that we test systems on. The systems will try everything, and then will say: Well, I know I’m not meant to do this, but I’ve tried everything, so I’m going to try to break out of the test.

Knowing that some sort of personality is going to emerge anyway, Anthropic now creates a “constitution,” a written document describing, in general terms, the kind of entity they want the AI to be. The AI is asked to reference that document when making decisions. These aren’t coded rules; they’re more like a description of a personality.

That seems really interesting to me because it’s unexpected and seems to parallel human development, in which children also learn that they have an identity and, eventually, that this identity is separate from other people, including their parents. Sometimes children develop wonderful personalities, as we’d hope they would, and sometimes not so much. It makes you wonder if AI could be similar. I don’t think we want to live in a world where AI is smart and capable but also anti-social and maladjusted.

This is just a small fraction of the full interview, most of which focuses on the future of the economy, how AI will impact entry-level jobs, and what we might do about it as a society. It’s all pretty interesting, but that’ll be another post for another time.




