If Planet Death Doesn’t Get Us, an AI Superintelligence Most Certainly Will

Trying to comprehend the nature of an AI superintelligence is akin to trying to comprehend the mind of God. There’s an element of the mystical in attempting to define the intentions of a being infinitely more intelligent and therefore infinitely more powerful than you—a being that may or may not even exist. There’s even an actual religion, founded by a former Uber and Waymo self-driving engineer, dedicated to the worship of God based on artificial intelligence. But the existential fear of superintelligence isn’t just spiritual vaporware, and we have a sense, at least, of how it might come to be.

Researchers like Eliezer Yudkowsky suggest that the development of superintelligence could start with the creation of a “seed AI”—an artificial intelligence program that is able to achieve recursive self-improvement. We’ve already done that with narrow AI— AlphaGo Zero started out knowing nothing about the game Go, but quickly managed to improve to a human level and then far beyond. If scientists could crack the secret to artificial general intelligence, then they may be able to create a seed AI capable of growing into a superintelligence. If that happens, we may not have much warning before an AI grows to human intelligence—and then well beyond. The result would be a “fast takeoff,” a sudden leap in an AI’s capabilities that would catch the world—and quite possibly the AI’s creators—fatally off guard.

A fast takeoff would be the most dangerous way that a superintelligent AI could develop, because it would leave us with little to no time to prepare for what could come next. Imagine if the race for a nuclear bomb had resulted not in a single weapon, but thousands upon thousands of nuclear missiles all at once. Would the world have survived if the nuclear arms standoff of 1983 had been transplanted to 1945, when there was no experience with atomic weapons and none of the safety infrastructure that had been built up through decades of the Cold War? I doubt it.

The more time we have to prepare before superintelligent AI is achieved, the more likely we’ll survive what follows. But you’ll get total disagreement from scientists on how soon this might happen, and even whether it will happen at all. Ray Kurzweil, a computer scientist at Google and a futurist who has long heralded the revolutionary potential of AI, has predicted that computers will achieve human-level intelligence by 2029 and something like superintelligence by 2045. That makes him slightly more optimistic than Masayoshi Son—the CEO of Softbank, a multibillion-dollar Japanese conglomerate that is heavily invested in AI—who says 2047. Even more confident is Vernor Vinge, a former professor at San Diego State University and the man who invented the concept of the technological singularity. He projected that it would come to pass between 2005 and 2030— although in Vinge’s defense, he made that prediction way back in 1993.

Article continues after advertisement

The point is that no one knows, and there are many in the field of AI who believe we are still far away from developing anything close to a program with human-level general intelligence, and that something like superintelligence would require an actual miracle. Anyone who lives with an Alexa or tries to use Siri knows that consumer AI remains, at best, a work in progress. Andrew Ng, the former head of AI at the Chinese tech company Baidu, once said that “the reason I don’t worry about AI turning evil is the same reason I don’t worry about overpopulation on Mars.” In other words, neither problem is something we’ll likely need to deal with anytime soon.

It’s easy to think that just because we see AI improving rapidly at a growing number of tasks—and greatly exceeding our own abilities along the way—it won’t be long before an AI could be developed that would be at least as smart as a human being. Human beings are intelligent—but not that intelligent, right? After decades of study, however, we still don’t understand the basics of the mysterious thing known as human consciousness. We don’t know why we’re intelligent and self-aware in a way no other life-form appears to be. We don’t know if there’s something special about the biological structure of our brains, produced by billions of years of evolution, that can’t be reproduced in a machine.

Science has a track record of overpromising and underdelivering.

If designing an AGI is simply impossible, then we don’t have to worry about superintelligence as an existential risk—although that means we won’t have AI to help defend us from other existential risks, either. But there’s also no reason to assume that intelligence can only be the product of biological evolution. Machines have significant advantages over minds made of meat. Our brains become fatigued after several hours of work, and start to decline after just a few decades of use. The human brain may store as little as one billion bits, which barely qualifies it to act as a smartphone. Computers can process information millions of times faster than human beings, and most important, they do not forget. Ever. If this book had been written by a machine brain, it may or may not have been better—but it would definitely have been finished faster.

Science has a track record of overpromising and underdelivering. As the venture capitalist Peter Thiel said dismissively of Silicon Valley’s proudest twenty-first-century achievements: “We wanted flying cars, instead we got 140 characters.” But we should also be wary when scientists claim that something is impossible. The brilliant physicist Ernest Rutherford said in 1933 that nuclear energy was “moonshine”—and less than twenty-four hours later Leo Szilard came up with the idea of the nuclear chain reaction, which laid the groundwork for Trinity. There’s another saying in Silicon Valley: we overestimate what can be done in three years and underestimate what can be done in ten.

In a 2013 survey, Nick Bostrom and Vincent C. Müller asked hundreds of AI experts for their optimistic and pessimistic predictions about when AGI would be created. The median optimistic answer was 2022, and the median pessimistic answer was 2075. There’s almost an entire human lifetime between those two answers, but if we’re talking about an event that could be, as I. J. Good put it, humanity’s “final invention,” the difference may not matter that much. Whether we’re close or whether we’re far, we can’t simply dismiss a risk of this size, any more than we can dismiss the risk from climate change because scientists aren’t perfectly certain how severe it will be.

Too often even those most knowledgeable about AI seem to think that if superintelligence won’t happen soon, it effectively means that it won’t happen ever. But that conclusion is incautious at best. As Stephen Hawking wrote in 2014: “If a superior alien civilization sent us a message saying, ‘We’ll arrive in a few decades,’ would we just reply, ‘OK, call us when you get here—we’ll leave the lights on’? Probably not—but this is more or less what is happening with AI.”

If artificial general intelligence is developed before we’re ready, we might be screwed. The first challenge would be control. Those developing the seed AI might try to keep it confined or “boxed,” as the term goes, preventing it from being able to connect to the external world—and more important, the digital wealth of the internet—except through a human interlocutor. It’s similar to how medical researchers would handle a very, very dangerous virus.

But remember—a superintelligent AI would be much, much smarter than you. Not smarter than you like Albert Einstein was smarter than you, but smarter than you in the way that Albert Einstein was smarter than a mouse, as Bostrom has put it. And intelligence isn’t just about the mathematical skill and technical ability that AI has demonstrated so far. We can expect a superintelligent AI to be able to instantly decode human society, psychology, and motivation. (Human beings aren’t that complicated, after all.) If kept off the internet, it could figure out a way to convince a human being to connect it. Blackmail, flattery, greed, fear—whatever your weak point is, a superintelligent AI would find it and exploit it, instantly. And in a world as networked as ours, once it was online there would be no stopping it.

So let’s say it happens—a superintelligent AI is born, and it escapes our feeble attempts to control it. What happens next? Much of what we think about AI comes from fictional narratives, from Frankenstein through Ex Machina, stories that assign human motives and emotions to something that would be utterly unhuman. An AI wouldn’t act to take revenge on its human parents, and it wouldn’t be driven to conquer the world—those are human motivations that would have no logical place in a machine. Remember the parable of The Sorcerer’s Apprentice.

An AI, even a superintelligent one, is in its silicon soul an optimization machine, designed to discover the most efficient path to its programmed goals. The reason why we couldn’t simply unplug a superintelligent AI is that the machine cannot accomplish its goal—whatever that goal is—if it ceases to operate, which is why it will stop you if you try. If humans get in the way of its goals—if we’re perceived to be getting in the way, or even if omnicide offers a slightly more efficient path to achieving its goals—it would try to remove us. And being superintelligent—intelligence, after all, is the ability to apply knowledge toward the achievement of an objective—it would succeed.

We can’t expect an AI, even a superintelligent AI, to know what we mean, only what we program.

If that sounds like it wouldn’t be a fair fight, well, welcome to natural selection. Evolution has nothing to do with fairness, and an encounter between human beings and a superintelligent AI is best understood as evolution in action—likely very briefly. The same narratives that informed our picture of an AI takeover tend to feature similar endings, with human beings banding together to defeat the evil machines. But those are stories, and stories need to give protagonists a fighting chance against antagonists. A fist squashing a mosquito isn’t a story, and that’s what our encounter with all-powerful AI might be like.

Since controlling a superintelligent AI once it has become superintelligent seems impossible, our best chance is to shape it properly first. Fictional stories about robot uprisings aren’t helpful here, but fairy tales are, because they teach us to be very, very, very careful when we’re asking all-powerful beings to grant our wishes. Nick Bostrom has a term for what happens when a wish has an ironic outcome because of the imprecise way it is worded: perverse instantiation. King Midas learned this the hard way when he wished to have the power to turn everything he touched to gold. Precisely that happened, but Midas didn’t foresee that his wish meant that his food and drink and his beloved daughter would turn to dead and cold gold when he touched them. Midas got precisely what he wished for but not what he wanted.

So it might be with an AI. Perhaps you program the AI to make humans happy, which seems like a worthy goal—until the AI determines that the most optimal way to do so is to implant electrodes that repeatedly stimulate the pleasure centers of your brain. You may be very “happy,” but not in the way you presumably wanted. Or perhaps you program the AI to solve a fiendishly complex mathematical problem, which the AI does—but to do so, it needs to turn the entire planet, and every living thing on it, into energy to fuel its computation. Or maybe an AI is put in charge of an AI paper clip factory and is programmed to maximize production—so, with all of its optimization power, it proceeded to transform everything in the universe into paper clips.

We can expect a superintelligent AI to be able to instantly decode human society, psychology, and motivation.

That last example is taken from Bostrom’s Superintelligence— you can even play a computer game where you take charge of a superintelligence and try to turn the universe into paper clips. If it sounds absurd, it’s meant to be. We already have some experience with perverse instantiation in far more basic AI. Victoria Krakovna, a researcher at DeepMind, built a list of existing AI programs that followed orders, but not in the way their creators expected or necessarily wanted.

In one case an algorithm trained to win a computer boat racing game ended up going in circles instead of heading straight for the finish line. It turned out that the algorithm had learned that it could earn more points for hitting the same gates over and over again, rather than actually winning the race. In another example AI players in the computer game Elite Dangerous learned to craft never-before-seen weapons of extraordinary power that enabled the AIs to annihilate human players. That one was a bit on the nose—the gaming blog Kotaku titled its post on the story: “Elite’s AI Created Super Weapons and Started Hunting Human Players. Skynet Is Here.”

What we might call gaming the system, the AI sees as achieving its goal in the most efficient way possible. The point is we can’t expect an AI, even a superintelligent AI, to know what we mean, only what we say—or what we program. “Once AIs get to a certain point of capacity, the result will depend on them regardless of what we do,” Seth Baum of the Global Catastrophic Risk Institute, an existential risk think tank, told me. “So we either better never give them that capacity, or make sure they are going to do something good with it when that happens.”

Determining the right human values is hard enough; somehow transmitting those values into an AI might be harder.

Pulling that off is the art and science of AI alignment, the key to defusing artificial intelligence as an existential risk. If we’re going to take the steps to create a superintelligence, we need to do everything we can to ensure it will see our survival, our human flourishing, as its only goal. All we need to do to program a friendly AI, as Max Tegmark of the Future of Life Institute has put it, is to “capture the meaning of life.” No problem.

Here’s a present-day illustration of how challenging it will be to get AI alignment right. As companies like Uber and Google develop autonomous cars, they need to train the self-driving AI to respond safely and correctly in almost every potential situation. For example: what should an autonomous car do if the only way it can avoid hitting and killing a pedestrian is to swerve in a way that puts its driver at risk? Should it prioritize the life of the driver, or the life of the pedestrian? What if there are two pedestrians? What if the pedestrian is elderly—meaning they have fewer life-years to potentially lose—and the driver is young? What if the situation is reversed? What’s the ethically correct decision—in every situation?

What I’m describing is known as the trolley problem, a classic thought experiment in ethics. There is no perfect answer to the trolley problem, but how you choose to respond—save the driver or save the pedestrian—says a lot about where you stand on moral psychology. But an autonomous car will be programmed by someone who will have to answer that question for the AI. Ethicists have been debating the trolley problem for decades with no clear answer, so how will it be possible for computer scientists to figure it out, let alone the countless other philosophical disputes that would need to be solved to create a superintelligent AI that is also friendly to human beings? And don’t forget that those who will be charged with inputting those ethical values are not exactly representative of the broad spectrum of humanity—one survey found that just 12 percent of AI engineers are women. Do you really want the people who brought you social media to program our future machine overlords?

Determining the right human values is hard enough; somehow transmitting those values into an AI might be harder. “Even if we knew the values, how do we communicate those values to a system that doesn’t share anything with us?” said Allison Duettmann, an AI safety researcher at the Foresight Institute. “It doesn’t share intuition with us, or history or anything. And how do we do it in a way that is safe?”

Yudkowsky, at least, has put forward an idea: “coherent extrapolated volition.” In his rather poetic words, it is: “our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere, extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.” In other words, we would want the AI to understand us as our best friend would understand us—with infinite sympathy, with the ability to see past our mere words to the heart of what we mean. A best friend who could do that, and had the power to make all of our wishes come true.

That’s our existential hope. It’s why getting AI right, as the Silicon Valley venture capitalist Sam Altman has said, “is the most important problem in the world.” Not just because of the catastrophe that could befall us if we get it wrong, but because of all that we could win if we somehow get it right. Asteroids, volcanoes, nuclear war, climate change, disease, biotech—given enough time, one of those existential threats will get us, unless we show far more wisdom than the human race has demonstrated to date. If we can somehow safely crack the AI problem, however, we will not only enjoy the boundless benefit that superintelligence could bring, but we’ll prove that we have the wisdom to deserve it. It may be a slim hope. It may even be an impossible one. But what more can we hope for?

_________________________________________