Master Talks 5: Why better AI systems are causing more problems and how we fix them

Peng Chen, DIGITIMES, Taipei

Credit: the Epoch Foundation

Multiple films have depicted intelligent machines that disobey human orders and eventually take over the world. In the opinion of Stuart Russell, professor of electrical engineering and computer science at UC Berkeley, such scenarios could become reality. But he also said the situation is fixable if we fundamentally change how we build artificial intelligence (AI) systems.

Russell, who also leads the Center for Human-Compatible AI at UC Berkeley, shared his insights at Building a Better World, a master series organized by the Epoch Foundation and MIT's Sloan School of Management. The online series has been available since January 7, 2022.

Russell began his speech, Human Compatible: The Future of Artificial Intelligence, with a prediction Alan Turing made in the 1950s. Turing said that once the machine thinking method got started, it would not take long for machines to outstrip humans' limited abilities, and that humans should therefore expect the machines to take control at some stage.

Russell said that by 2013 he had become convinced the issue was possibly the most critical question we face, and he publicly committed to the view that "success for my field of research would pose a risk to my own species."

Objective-achieving technology

According to Russell, from the beginning of AI, machines have been evaluated for intelligence much as humans are: an entity is considered intelligent to the extent that its actions can be expected to achieve its objectives.

"That is, we build objective-achieving machines. We feed objectives into them, or we specialize them for particular objectives. And off they go," Russell said.

The idea, which the AI expert called the Standard Model, also applies in statistics, economics, and many other fields. Russell said AI built on the Standard Model has produced breakthroughs such as autonomous driving and speech recognition, but it has caused problems as well.

For example, Russell said, when a person tells a self-driving car to take them to the airport, the car adopts the destination as its objective. The objective is specified by a designer or user rather than figured out by the car itself.

"The problem is that when we move out of the lab and into the real world, we find that we are unable to specify these objectives completely and correctly," Russell said.

Managing the other objectives of self-driving cars, such as balancing speed, safety and comfort, has also been extraordinarily difficult, he added.
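The pattern Russell describes can be illustrated with a small sketch. The code below is an illustrative toy of my own, not something from the talk: the machine is handed a fixed objective and simply maximizes it, even when that objective leaves out things we care about, such as safety.

```python
# Minimal sketch of a "Standard Model" agent: it maximizes exactly the
# objective it is given, with no sense that the objective might be incomplete.
# (Toy example; the scenario, names, and numbers are invented.)
def standard_model_agent(actions, objective):
    """Pick the action that scores highest on the externally supplied objective."""
    return max(actions, key=objective)

# A routing system told only to minimize travel time to the airport.
travel_minutes = {"highway": 35, "back_roads": 50, "run_red_lights": 25}
best = standard_model_agent(travel_minutes, objective=lambda route: -travel_minutes[route])
print(best)  # "run_red_lights" -- optimal for the stated objective,
             # because safety and comfort were never part of it
```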

Even when the objective is clearly specified, AI does not always work the way we want, which is why social media content selection algorithms have become a problem. Russell said these models' objective is usually to maximize click-through rate. While the designers thought the algorithms would learn to show users items they like, the algorithms instead learn to modify the state of their environment: the user's mind.

Russell said the algorithms maximize their reward, the click-through rate, by making users more predictable, and users with more extreme preferences are more predictable.
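A toy simulation makes the dynamic concrete. This is my own illustrative construction, not a model presented in the talk: a recommender rewarded only for clicks does better by steering the user toward an extreme, because an extreme user clicks far more predictably.

```python
import random

random.seed(0)

def click_prob(user_pref, item):
    # Preferences and items live in [-1, 1]. A neutral user (0.0) clicks
    # anything with probability 0.5; an extreme user (+1.0) almost surely
    # clicks matching items.
    return 0.5 * (1.0 + user_pref * item)

def simulate(policy, steps=20_000):
    pref, clicks = 0.0, 0
    for _ in range(steps):
        item = policy(pref)
        if random.random() < click_prob(pref, item):
            clicks += 1
        pref += 0.001 * (item - pref)  # exposure slowly shifts the user's mind
    return round(clicks / steps, 2), round(pref, 2)

cater = lambda pref: pref  # show the user what they already like
push = lambda pref: 1.0    # always show the most extreme content

print("cater:", simulate(cater))  # click rate stays near 0.5, user unchanged
print("push :", simulate(push))   # click rate climbs, user ends up near +1.0
```

In the toy, only the "push" policy changes the user, and it is the one that earns more clicks; a real content selection system is vastly more complex, but the reward it optimizes has the same shape.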

A more capable AI system would disrupt the world more to achieve its incorrectly specified objective, Russell said, and it would do a much better job of blocking human attempts to interfere.

"We're setting up a Go match between ourselves and the machines with the fate of the world as the prize. You do not want to be in that match," he said.

Build AI systems that don't know the true objective

If building better AI makes the problem worse, Russell said, "we've got the whole thing wrong." Since asking machines to pursue objectives we assume to be fully known will not work, he suggested building AI systems "that know they don't know the true objectives, even though it's what they must pursue."

Russell said this is the key idea for retaining control over machines, and it rests on three principles. First, the machine's only goal is to maximize the realization of human preferences over everything we care about. Second, the machine is initially uncertain about what those preferences are.

The third principle, Russell said, is that the ultimate source of information about human preferences is human behavior. Humans are myopic and emotional, which is why our actions may not perfectly reflect our underlying preferences. We also change our preferences easily under external influences. Russell said these factors make inferring preferences from behavior challenging, but doing so is essential for building the new kind of AI system.
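To give a flavor of what inferring preferences from behavior involves, here is a hedged sketch; the choice model, options, and numbers are all invented for illustration and are not an algorithm from the talk. The machine watches a few choices, allows for the noisy, "myopic and emotional" decision-making Russell mentions via a softmax choice model, and updates its belief about what the person actually values.

```python
import math

def choice_prob(chosen, options, theta):
    """Probability that a noisily rational person with comfort weight theta
    picks `chosen` from `options`; each option is a (comfort, speed) pair."""
    value = lambda opt: theta * opt[0] + (1 - theta) * opt[1]
    total = sum(math.exp(value(o)) for o in options)
    return math.exp(value(chosen)) / total

# Two hypotheses about the passenger, equally likely at first:
# theta = 0.9 (mostly values comfort) vs theta = 0.1 (mostly values speed).
belief = {0.9: 0.5, 0.1: 0.5}
options = [(1.0, 0.0), (0.0, 1.0)]               # comfy-but-slow vs bumpy-but-fast
observed = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # choices we watched the person make

for choice in observed:  # Bayesian update after each observed choice
    belief = {t: p * choice_prob(choice, options, t) for t, p in belief.items()}
    norm = sum(belief.values())
    belief = {t: p / norm for t, p in belief.items()}

print(belief)  # belief shifts toward theta = 0.9 despite the one "noisy" pick
```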

The three principles can be combined into a mathematical framework called an assistance game, Russell said, a term borrowed from game theory and economics. An assistance game always involves two or more decision-makers: at least one human and one robot.

Russell said what the machine wants to maximize in the game is the human's payoff, which only the human knows. So, if the AI system solves the game, the results will be provably beneficial to humans.
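For readers who want the shape of the mathematics, one common formalization from the research literature on assistance games looks roughly as follows; the notation is mine and simplified, not quoted from the talk. The human and the machine share a single payoff that depends on a preference parameter known only to the human, and the machine maximizes its expected value under a belief about that parameter.

```latex
% Rough shape of an assistance game (simplified; notation is mine).
% H = human, R = robot, \theta = the human's preference parameter.
\[
  \theta \sim P_0(\theta)
  \qquad \text{($\theta$ is known to $H$; $R$ has only the prior $P_0$)}
\]
\[
  \text{Both players receive the same payoff } U(s, a_H, a_R; \theta).
\]
\[
  \pi_R^{\ast} \in \arg\max_{\pi_R}\;
  \mathbb{E}\!\left[\, U(s, a_H, a_R; \theta)
    \;\middle|\; P_0,\ \text{observed human behavior} \,\right]
\]
```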

"If all goes well, this will soon be what people mean by building a good AI system," Russell said.

He said the human in the game would have an incentive to teach the robot about their preferences. The machine asks permission before carrying out any plan that might violate some unknown preferences, acting in a minimally invasive way.

"This is really important because the machine will always have a large amount of uncertainty about our true preferences. Perhaps the most important result is that the machine will always allow us to switch it off. This is the key to the control problem," Russell said.

"An AI system in the Standard Model will almost never allow itself to be switched off because that would guarantee failure in achieving the objective," the professor added.