The Real Alignment Problem

If you’d be willing to die for anything, you should die for this — and you probably will.

[Note: I’ve been working on something longer & more rigorous, but here I offer some brief arguments & perspectives that I hope might provoke those interested in existential risk to elevate nukes to number one on the agenda. Putin’s threat of nukes this week forced me to quickly write something. Please excuse the slapdash nature.]

IMAGINE that instead of preventing existential threats you wanted to destroy human civilisation. What’s your best option?

For people involved in EA/longtermist/rationalist circles, it might seem like rogue AGI is most promising.

I’m sceptical.

A vast amount of resources needs to be deployed to get from where we are now to that hoped-for goal of killing most humans. You would need an intensification of the already significant funding being directed to AI by large corporations like Google and Facebook. You would need to install AGI danger officers in some firms, who would ensure that safety concerns didn’t come to dominate the corporate culture. You might recognise the limitations of deep learning and need to invent whole new approaches to AI. Perhaps the route to a world-conquering AGI is via robotics, or human-computer interfaces, and these need dedicated research programs with talented people working on the dozens of hopefully surmountable mini-problems along the way.

This is all very expensive. It might take a lot of time. It might be technically infeasible, even with blank cheques allocated to the problem. Let’s say there are 1000 steps between now and then. (It’s not clear what counts as a “step”. There might be several million steps if each small action is counted. But this is just a round number, a way to index things. I think it’s conservative.) 

Now consider a dark horse candidate, one that has been written off by many as the way civilisation will end.

Nuclear weapons.

I’d like to recommend them as a good off-the-shelf method for destroying humanity, because if you wanted to use them to create an apocalypse, you would at least know precisely how many steps there are between now and then.

The Cold War superpowers overcame daunting technical barriers. They invented new processes to split atoms, cause chain reactions, contain blasts, and send rockets across the globe, and they built a command and control infrastructure that included the precursor to the internet. Thousands of specialists were employed in dozens of facilities over decades. All of it was expensive, requiring successive decades of leadership and commitment from the highest levels of government and large corporations. And although this method has never been used at full scale, it is well tested and anchored in physical mechanisms that definitely exist, are well understood, and are currently operational.

If you happen to be Vladimir Putin, you are precisely one step away from achieving your goal with a single order. And even this final step is comprehensible, epistemically uncontroversial, and easily within the current bounds of human competence.

I hope this thought experiment renders vivid the extraordinary status quo bias currently afflicting so many pundits who make predictions about nuclear weapons. We need a hard reset in our expectations and assessments of this risk. I think nuclear war is an order of magnitude more likely than any other catastrophic risk. And I believe it is an under-served area.

I’m not an AI sceptic. I have strong doubts about certain pathways to catastrophe that are premised, I think, on misapprehensions about human intelligence, evolution, and sentience. But I’m certain that human-like AI is possible because humans are a human-like intelligence and are made of nonmagical molecules assembled in a nonmagical process that could be emulated given enough know-how. I also think the dangers of any powerful and untested technology are worth preparing for. So I salute all those working in AI safety. They are doing something almost totally unheard of: preparing for a new kind of catastrophe before it has happened and before it is even feasible. This attitude is vanishingly rare in human history. Adopting it requires serious devotion to reasoning beyond one’s own limited experience and a recognition of various biases and blind spots that make us almost totally unable to envision futures that are anything other than lazy copy/pastes of the past.

But everything I read about AI risk, about the alignment problem, about scenarios that could cripple our infrastructure, just raises my relative fear of nuclear war.

How should we interpret the status quo?

Most of all, I hope the thought experiment makes clear that it is not hypothetical.

Billions of dollars have actually been devoted to creating an interlocking system designed to destroy the world. That is the actual, deliberate purpose of these weapons. The US government has a plan for how you will die. They are ready to enact it. Tellingly, the plan does not include any information about what happens after an all-out nuclear exchange with Russia. Implicit within the official war plans of the constitutional republic of the United States of America is that the nation and its citizens will be annihilated.

Something I ponder is what signals are sent by the relative lack of attention given to the aftermath of a nuclear war. Even during the Cold War, government agencies made only tokenistic attempts to prepare — continuity of government, fallout shelters, communications, etc. — for an all-out attack by the Soviets. (This article gives a flavour, but the whole thing is covered in Garrett Graff’s excellent book Raven Rock.)

One way of interpreting these signals is to assign a low probability to the event. If we conceive of people’s non-preparation for doomsday as a betting market, then they (who have their own lives as skin in the game) clearly don’t think it’s going to happen. 

This still seems to be the general mentality not only among ordinary civilians, but among government officials and even among forecasters considering nuclear war. But consider another interpretation of those signals: people are systematically biased toward thinking the worst cannot happen.

People in the 1960s, in the wake of the Cuban missile crisis, surely had cause to take the threat of nuclear war seriously; perhaps more seriously than today. But they didn’t. It is hard to know what additional signals they would have needed to update their model of the world such that nuclear war was a clear and present danger, one that warranted immediate and significant action (personal preparations for themselves and their families, and political agitation to lower the risk). I suspect the only signal that would have actually rammed home this threat would have been the use of nuclear weapons by the Soviet Union.

Ordinarily, that is a reasonable way to respond to feedback from the world in repeat games. In the case of a catastrophic but unprecedented risk, however, it is inadequate. People need to use signals from merely hypothetical events to inform their actions. This is hard. It is unnatural. It’s even a bit epistemologically dubious. But for large unprecedented risks it is the only way to proceed. This is why I’m genuinely impressed by the discourse on AI safety, which leverages science fiction scenarios and other as-yet-unreal events in order to avoid ruin.

Again, comparing nukes, the case is even stronger because their use is barely hypothetical. Several times in recent history nuclear war almost happened, with that final step almost being taken. (The most famous example is Stanislav Petrov, but there are many others, catalogued in Eric Schlosser’s book Command and Control.)

Add to this the possibility of technical error — a “normal accident” — causing an inadvertent launch, or a false warning followed by a retaliatory launch. Add the further concern of how reliance on AI systems might lead to nuclear weapons being used. The most plausible AI doomsday scenarios, to me, are the ones that involve AI using nukes to destroy us. (See this study by RAND.) And a point worth remembering is that catastrophic climate change, the rise of fascism, and several other catastrophic risks all entail conflicts, which in turn heighten the chance of nukes being used, leading to an all-out nuclear war and subsequent winter.

All this should make us question whether the only reason nuclear war is not assigned a higher probability is that it hasn’t happened yet. This is the ultimate case of the proverbial turkey who, on the day before Thanksgiving, happily concludes from all past experience that turkeys are always fed and never have their throats cut.

This is a subtle point but I hope I’m being clear. Normally I’d take other people’s — especially experts’ — sanguine attitudes as information supporting the low probability of nuclear war. But the fact that they don’t even seem to act according to their own assessments makes me question their overall informational value. I am led to believe, instead, that they are in denial and are systematically (for very human reasons) underestimating the odds of nukes being used. At all times during the Cold War and since, almost everyone has acted as though they think there’s no chance of nuclear weapons being used, even when by their own assessments there has been, say, a 5% or 10% chance or higher.
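To make the denial concrete, here is a minimal sketch of the arithmetic, assuming (purely for illustration) a few possible annual probabilities; none of these figures come from any official assessment:

```python
# Illustrative arithmetic only: how an annual probability of nuclear
# war compounds over decades. The annual figures below are assumptions
# chosen for the example, not estimates from any source.
def cumulative_risk(annual_prob: float, years: int) -> float:
    """P(at least one war within `years` years), assuming independent years."""
    return 1 - (1 - annual_prob) ** years

for p in (0.001, 0.01, 0.05):
    print(f"annual {p:.1%} -> over 50 years: {cumulative_risk(p, 50):.0%}")
# annual 0.1% -> over 50 years: 5%
# annual 1.0% -> over 50 years: 39%
# annual 5.0% -> over 50 years: 92%
```

Even a steady 1% a year, held privately by a calm expert, implies over a lifetime something closer to a coin flip than an impossibility. Acting as though the risk is zero is not consistent with holding that number.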

Alignment

We worry that an AGI may not be aligned with our own interests. But the incumbent situation is that we have leaders of nations with authority to launch, whose interests are pointedly not our own. Putin’s interest, for example, is… what? Dominance, maintaining power, expanding Russia — whatever it is, it is not my interest, nor those of most humans. And if he’s willing to gamble the possibility of nuclear war on whatever his interest is, then he is pathologically unaligned with my interests. 

Daniel Ellsberg’s Doomsday Machine is a generally terrifying work exposing the insanities of official US nuclear war plans. He also provides his account of the Cuban missile crisis. He notes that Castro was pushing hard for a superpower conflict, apparently happy to sacrifice his own nation provided it meant an ideological black eye and millions of casualties for America.

And why should this lack of alignment with anything approaching normal human decency or pacifism or liberal values surprise us? It’s part of the normal span of human behaviour to sometimes spitefully sacrifice oneself to punish an enemy. And whether this trait is particularly rare among leaders of nations, I leave as an exercise for the reader.

Ellsberg also notes that, at one point in the crisis, JFK assessed the odds of a nuclear war with the USSR at 10%. But because of domestic political reasons and the exigencies of reelection, he judged that a fine gamble. I think JFK had one of the more humane approaches to foreign policy of any recent US president, but I consider the wagering of millions of his own citizens’ (or anyone’s) lives on a political campaign to be psychopathic.

And millions of lives were written off for more banal reasons, too. For at least a couple of decades, the SIOP (the US Single Integrated Operational Plan) provided for a retaliatory strike on the USSR and China regardless of whether or not China was involved in the conflict. This was merely because it was simpler to have one plan. In the case of Soviet aggression, Chinese cities would have been nuked and perhaps a hundred million humans murdered. This was an official American plan to achieve what would have been the largest and most flippant genocide in history.

And we have military contractors lobbying representatives to warp and misalign the interests of senators and congresspeople against those of their electors. Lockheed Martin, Boeing, Northrop Grumman, Raytheon — I name them here because I think what they are doing is prima facie evil. The divestment campaigns that have helped stigmatise fossil fuel companies should be tripled to raise awareness of these Judas companies who, for those in the back, make money by keeping the threat of nuclear war elevated.

(It’s also worth asking why other corporations aren’t lobbying for nuclear disarmament, even though their share prices will surely suffer as their customers and stockholders are vaporised in a nuclear war. The interests of capital, save for the specific corporations involved in the nuclear arsenal, should be aligned with disarmament. That they’re not suggests denial.)

If any of this outrages people, I again point out that it is mainly the status quo and virtually no one protests it. Most people in the world currently have their lives in the hands of men they did not and could not vote for. Few register even a verbal complaint. A tiny fraction occasionally take to the street. 

But I don’t think this should be taken as a strong signal of the low probability of nukes being launched. People signal larger distress over comparatively lower impact, lower likelihood events. Our nuclear quiescence must be taken as a hint at a fundamental misunderstanding of the risk landscape. It justifies a fresh assessment.

US–Russian roulette

Here’s another thought experiment that I call US–Russian roulette.

We currently walk around with the possibility (even if you think it’s <1%) of nuclear war happening. It is completely out of our hands and could happen at any moment, and yet it would kill us and our loved ones. It is as though we have loaded guns rigged up to our heads, remotely activated according to the decisions of people few of us elected, or by a malfunction. How many of us would allow this to continue with the only justification offered being that, because we’re all fitted with the guns and they go off in unison, no one would be insane enough to activate them?

Relatedly, what if nuclear weapons had not been invented yet, but you found out that the US and Russia were planning on building them and amassing world-ending stockpiles? The leaders calmly explain that the very fact of mutually assured destruction will actually be a point in favour of the proposal. Honestly, try to imagine what probability you would assign to nuclear war if it were a new thing.

You might attach higher probability to Putin using these new weapons, because the actual manufacture and deployment of these weapons would be interpreted as unambiguous signals of intent to use.

But of course, this is the incumbent situation: Putin has weapons, has maintained them, has threatened their use, and has even partially mobilised them. The fog of MAD and deterrence theory has turned everything we would normally interpret as signals of intent, perversely, into signals of non-intent or at least neutral signals. 

Nukes run on physics, not beliefs

Nuclear weapons are belief-proof. They really work, according to uncontroversial physics and hard-nosed engineering. This is unlike the method used to prevent their use, i.e., the doctrine of mutually assured destruction and the larger body of work known as deterrence theory. These require a whole host of beliefs about other people’s beliefs to be true in order to work.

The supreme assumption underwriting the whole enterprise is that the people making decisions over whether or not to use nukes are governed by the kind of rational self-interest that the deterrence theorists have ascribed to them.

To put it mildly, this is a low-confidence domain of knowledge. And yet one thing we know for certain is that leaders of nations sometimes act in ways that are judged irrational by outsiders and indeed in ways that were not predicted beforehand. Hitler. Kim Jong-un. Stalin. Castro. Mugabe. Assad. Putin. I could name many more. It is striking, therefore, that our lives are held in thrall to a theory of human behaviour that is, if nothing else, definitely wrong in many cases.

If I were pressed to give a prediction of my own, I still wouldn’t.

I’m no superforecaster, but I’m a fan of Tetlock’s work, I’m familiar with the basic methods, and I’ve even done the calibration training. I can put a number on my prediction and have one in mind. It’s higher than most other people’s. I see Tegmark has also just given higher odds than other forecasters. I realise this is the best way he knows to convince people to take the threat more seriously.

But I don’t want to incrementally move people’s credences. There needs to be a figure/ground shift in taking nukes way more seriously than any other existential threat.

Rodriguez, in her report, aggregates existing forecasts and comes up with a fairly low number. A recent update based on Samotsvety predictions is slightly higher, but the forecasters’ notes are revealing: smart people trying to grapple with a difficult domain. I think it is actually an impossible one. Not only do they lack some of the crucial information, they don’t even know what that information would be, because there is no well-defined theory for predicting individual human actions in a complicated scenario. Even retrospectively, we often don’t know how to explain people’s decisions in historical events that have actually happened. We have no ability to predict them in a future that hasn’t yet occurred.
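For the curious, here is a minimal sketch of one standard way such forecasts get pooled (the geometric mean of odds). The input probabilities are invented for illustration and are not the figures from Rodriguez’s report or the Samotsvety update:

```python
import math

# Pool probability forecasts using the geometric mean of odds, a
# standard aggregation method. The inputs are invented placeholders,
# NOT the actual figures from either report.
def pool_geo_mean_odds(probs: list[float]) -> float:
    odds = [p / (1 - p) for p in probs]
    geo_mean = math.exp(sum(math.log(o) for o in odds) / len(odds))
    return geo_mean / (1 + geo_mean)

hypothetical_forecasts = [0.002, 0.01, 0.05]   # three imagined forecasters
print(f"pooled estimate: {pool_geo_mean_odds(hypothetical_forecasts):.2%}")
# pooled estimate: 1.01%
```

My point stands regardless of the method: the machinery for combining numbers is sound, but the numbers going in rest on a theory of individual human behaviour that doesn’t exist.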

Nukes are set up in an intricate system that took years and billions to build, thousands of people to maintain, and which represents an immense store of potential energy. All it will take is a very small shove to get the ball rolling down the hill to a more stable basin in the energy landscape. Until we dismantle this system, the only things preventing the sudden increase in global entropy caused by a nuclear holocaust are some airy things called beliefs.

The strongest arguments against nuclear war happening are also decidedly weak. 

  1. It hasn’t happened yet. 
  2. MAD ensures it won’t happen.
  3. Even in a crisis, cooler heads prevail. 
  4. There is a nuclear taboo that applies even to Putin, Kim, Trump, et al. 

Hasn’t happened yet. This is a foolish inference from past experience that obviously doesn’t hold up. 

MAD. The more sophisticated version of “it hasn’t happened yet” posits a causal relationship between the advent of nukes and their non-use, and even credits them with deterring conventional wars. I do take this seriously. If the default is enemies using their weapons to destroy one another, then it is notable that this never happened in the Cold War, and the reason might be MAD. But again, it might not. MAD requires all the players to be following the same strategies. And MAD does nothing to prevent accidental launches.

Cooler heads prevail. In the few crises we have records of, slightly “cooler” heads prevailed. But in all the cases I know about, there were hotheads pushing for nuclear war. If Curtis LeMay had had launch authority, if the political officer on the submarine with Vasili Arkhipov had had his way, etc., then a hothead would have prevailed. It seems that we have been very lucky that there hasn’t yet been a hothead in the key decision-making role during any crisis. Some interpret this to mean that we must be robustly if mysteriously insulated from hotheads, otherwise it likely would have happened already. If I were playing poker and had gotten five nice cards on the river in five successive hands, I would not conclude that good cards must always come up. I would stop gambling. Because my life is in the hands of these decision-makers, I am not content to observe a weak trend and conclude that things generally work out. I demand a much more risk-averse epistemology.

Nuclear taboo. Our safety relies on nebulous norms that have apparently held thus far. Another historical correlation. But even granting the taboo an actual causal role in the non-use of nukes: it’s true that a lot of human history involves adherence to norms and conventions, and much of history is the creation of new norms; but that same history is also the breaking of old norms, or the dissolution into smoke of what you thought was a norm. Taboos are unstable.

There is something to be said for deterrence theory. It has deterred us from investigating the gravest threat to civilisation, deterred us from acknowledging the cracks in the dam, deterred us from sneaking a look at our executioner’s face.

Once again, the mismatch between the gravity of this risk and the lack of solutions and effort put into reducing it is stark. I see another signal of status quo bias and collective denial. 

Very briefly, what to do

Nuclear disarmament needs to happen. I don’t necessarily advocate complete abolition, because, as many have pointed out, that probably leaves the world open to defectors and subsequent arms races. A stable level of disarmament might see six states with 15 nukes each, all deployed in submarines. Even this might be unstable in the long term, but the crucial point is that the total number of nukes in operation would be less than the amount needed to cause a significant nuclear winter event. That should be the key number.
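As a back-of-the-envelope check on that key number (to be clear, the nuclear-winter threshold is genuinely contested; the figure below is an assumption loosely motivated by climate modelling of regional exchanges, not an established constant):

```python
# Back-of-the-envelope check on the disarmament target. The winter
# threshold is an assumption: climate models of a regional exchange
# (roughly 100 detonations) suggest serious global cooling, so 100
# is used here as an illustrative ceiling, not an established constant.
STATES = 6
NUKES_PER_STATE = 15
ASSUMED_WINTER_THRESHOLD = 100       # contested, illustrative figure

total_deployed = STATES * NUKES_PER_STATE    # 90 warheads worldwide
print(f"{total_deployed} warheads vs. assumed threshold of {ASSUMED_WINTER_THRESHOLD}")
print("under the threshold" if total_deployed < ASSUMED_WINTER_THRESHOLD
      else "over the threshold")
```

The exact threshold matters less than the principle: keep the world’s total operational arsenal below whatever that number turns out to be.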

Getting to disarmament is hard. I don’t know how to achieve it and I’ve been reading about it for years. There are proposals, roadmaps to disarmament, in the literature, written by analysts with much more knowledge than me of all the history, diplomacy, and international relations stuff. They strike me as possible but unlikely and often based on the same notions of taboo and belief that underpin deterrence theory. Regardless, way more effort/resources should be put into them. 

But it may be that the way to achieve disarmament is something that hasn’t even been thought of yet. Or something that depends on new technologies or new institutional norms. Or it might involve some unpredictable Hail Mary play. 

Using history as a guide, it seems to me that history is no guide to what the solution will be, but it does suggest there will be unimagined solutions. It’s always worth remembering that the nearest we came to serious disarmament was the Reykjavik summit of 1986. According to Richard Rhodes’ reconstruction of events, Gorbachev and Reagan were on the brink of agreeing to it, except that Reagan couldn’t bear to totally junk Star Wars. What got the two leaders to this stage just two years after the most hawkish Cold War rhetoric and rearmament? Human actions are almost impossible to cleanly attribute to causes. But it looks as though the leaders’ minds were changed by Gorbachev personally witnessing the desolation of Chernobyl and by Reagan watching the TV movie The Day After.

If true, who would have advocated, as the best method of achieving nuclear disarmament, a reactor meltdown in Ukraine or a compelling movie starring Jason Robards, to personally and emotionally influence the specific men in charge of the relevant nations?

I’m no dewy-eyed optimist. Maybe, for game-theoretic reasons, it is simply impossible to ever escape the threat of world-ending technologies as long as there are defectors who see they can achieve their strategic aims by possessing them. If so, the space colonisation enthusiasts may be right that the only way to spread civilisational risk is to go interplanetary. And if we don’t have time to make a copy of humanity on another planet, we may at least be able to create something of a technological and civilisational backup, so that, following an inevitable catastrophe, a reboot is at least more feasible.

I’m no defeatist either. I find these plans to be sane but maddening. We have to inject our ingenuity into preventing apocalypse at least as much as into insuring against it — partly because our insurance policies aren’t ready yet, and also because I don’t want to die thanks to some creep in the Kremlin or a flock of geese in Canada.

New ideas are urgently needed in this space from people smarter than me and less dead than Thomas Schelling. 

I implore longtermists to reassess their priorities and dedicate efforts to mitigating the threat of nuclear war. Imagine nukes were invented today and an arms race ensued: where would you rank them among existential risks? Imagine you wanted to destroy the world: which is by far the easiest, ready-to-go option?

Our interests are unacceptably, outrageously unaligned with those who control and threaten the use of nuclear weapons. That is the real alignment problem that exists right now.