Human-compatible AI: Design principles to prevent war between machines and men

Stuart Russell, Professor of Electrical Engineering and Computer Sciences at UC Berkeley, onstage at the AI for Good Global Summit

AI could write more convincing ‘fake news’ than humans, launch more sophisticated cyberattacks, power lethal autonomous weapons (‘killer robots’), and leave a growing number of people competing for a shrinking number of jobs.

These possibilities are still relatively remote, leading some to think it premature to pay them serious attention. “This would not be a wise strategy,” said Stuart Russell, Professor of Electrical Engineering and Computer Sciences at UC-Berkeley, speaking at the AI for Good Global Summit in Geneva, 7-9 June 2017.

Stuart Russell has pioneered efforts to understand the potential of AI and the long-term relationship likely to form between AI and humanity. He is also a leading authority on robotics and bioinformatics.

Russell believes that the actions we take today will have great influence on society’s ability to contend with challenges such as misinformation, malware, autonomous weapons and unemployment. But failing to prepare for the arrival of AI, warns Russell, may see these challenges become insurmountable.

Dealing with challenges before AI amplifies their severity

“Humans are defenseless in information environments that are grossly corrupted,” says Russell. How to protect the integrity of our information environment is already a pressing problem, and one likely to become far harder to solve as AI advances.

“The use of malware has been a catastrophe,” says Russell, highlighting that direct theft from bank accounts now exceeds 100 billion dollars a year. “I think by solving malware we will set up a paradigm for how we can start to think about controlling the misuse of AI. But we have to do this soon.”

Lethal autonomous weapons are expected to be among the weapons prohibited under the next iteration of the UN Convention on Certain Conventional Weapons. Russell stresses that “we need more public awareness … and our professional societies need to agree that we are not going to do this. Building machines that can decide to kill humans is not okay.”

Leading economists are calling employment the biggest challenge facing the world economy in the next 20 years. The desired shape of future labour markets is a subject for today’s decision-makers, says Russell: “We all talk about having transition plans to help our populations through this big transition, but you can’t have a transition plan unless you have a destination.”

But is AI really smart enough to pose a threat to the future of humanity?

Moving forward on the assumption that AI is incapable of threatening human life is “not a wise strategy,” repeats Russell.

“We are going to see, for example, technologies that can read; that can read text and understand it, extract information in a useful way. Once that happens, very soon after that machines will have read everything that the human race has ever written.”

Today’s AI poses little threat to the future of humanity, but this threat is certain to increase as AI advances.

“Now you have systems that can look further ahead into the future … they know more about the world than we do and if they can combine those two capabilities [knowledge and foresight], it seems inevitable that AI systems are going to be able to make better decisions than humans across a wide range of activities.”

Provably beneficial ‘human-compatible AI’

“There is a version of AI which I’m calling human-compatible AI which is actually different from the way we have conceived of AI up to now.”

“We have a long history of failing to specify the purpose correctly,” says Russell, making an example of the unintended consequences of legendary King Midas’ wish that everything he touched would turn to gold. “This is a technical problem and this is the main source of risk from AI.”

“You are finding the AI community actually in denial about this question,” says Russell, speaking to the possibility of intelligent machines waging war on humanity. Mitigating this risk, Russell argues, will require absolute certainty that we are giving AI the purposes humanity actually desires.

How does Russell suggest we go about that?

“Change the very definition of AI and the way we pursue the research in AI, to make it provably beneficial. So this is an oxymoron – ‘beneficial’ is a very vague notion; ‘provably’ is a mathematician’s notion.”

‘Provably beneficial’ human-compatible AI will be built on two key design principles, says Russell:

  • The only objective of the AI should be to maximize the realization of human values.
  • The AI must be initially uncertain about what those human values are.

This second principle is the most important. “It is precisely the single-minded pursuit of a definite objective that causes the problem,” says Russell. “When the machine is uncertain about the objective but is obliged to help the human achieve it, then it is amenable to correction.” In this way, “the source of information about what humans want will be what humans do.”
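The logic of this second principle can be illustrated with a toy decision problem (a minimal sketch inspired by this line of research, not Russell's own formalism; the specific utility numbers are hypothetical). A robot that is uncertain whether a proposed action is good or bad compares acting unilaterally against deferring to a human who knows the true utility and will block the action when it is harmful:

```python
def expected_value_act(utilities):
    # Robot acts unilaterally: it receives whichever true utility
    # turns out to hold, averaged over its belief.
    return sum(utilities) / len(utilities)

def expected_value_defer(utilities):
    # Robot proposes the action and remains correctable: the human,
    # who knows the true utility, permits the action only when it is
    # positive, so negative outcomes are replaced by zero.
    return sum(max(u, 0.0) for u in utilities) / len(utilities)

# Hypothetical belief: the action is equally likely to be
# mildly good (+1) or disastrous (-5).
belief = [1.0, -5.0]

print(expected_value_act(belief))    # -2.0: acting unilaterally looks bad
print(expected_value_defer(belief))  # 0.5: deferral screens out the bad case
```

Because deferring can never do worse than acting (the human only blocks harmful outcomes), an objective-uncertain robot has a positive incentive to stay correctable; a robot certain of its objective has none.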

But do we really know what we want?

“Humans are incredibly complicated … reverse engineering all of that to figure out what it is we really want on our good days is a really difficult problem. And I think in the process of doing this we will actually learn a little bit more about how we should be. We’ll not just be that way on our best days, but maybe we’ll be that way more often.”

Watch Stuart Russell’s talk from 56:40 to 1:11:12 in the archived webcast of Plenary 3: Future Roadmap – Collaborating for Good.
