
Beyond Human-in-the-Loop: Managing AI Risks in Nuclear Command-and-Control

On Nov. 16, U.S. and Chinese leaders met on the margins of the Asia-Pacific Economic Cooperation summit in Lima, Peru, jointly affirming “the need to maintain human control over the decision to use nuclear weapons.” This declaration echoes a joint document submitted by France, the United Kingdom, and the United States during the Nuclear Nonproliferation Treaty review process in 2022.

With countries increasingly prioritizing military applications of AI, integrating AI into nuclear weapons systems is becoming a distinct possibility, especially as nuclear arsenals undergo modernization. While some nuclear-weapon states have emphasized the importance of maintaining human oversight and control over decisions to employ nuclear weapons, it is too early to take a victory lap. Avoiding a “Skynet” scenario, where AI takes independent control of nuclear weapons, does little to reduce the real risks of unintended nuclear launches.

AI holds the promise of enhancing the performance and capabilities of nuclear command, control, and communications systems, which form the backbone of nuclear decision-making. However, if integrated with haste and without adequate risk assessment, safeguards, and redundancies, such integration could dramatically heighten the risk of unintended nuclear escalation. Escalation risks can arise from altered decision-making dynamics, accelerated processing speeds that outpace human supervision, or insidious errors that can propagate undetected through complex systems — regardless of whether humans remain in the decision-making loop. 

To prevent nuclear calamity and ensure the responsible use of AI in nuclear command-and-control, states should move beyond mere prescriptive commitments to human oversight. Reducing the risk of unintended nuclear escalation requires a governance framework that establishes a quantitative threshold for the maximum acceptable probability of an accidental nuclear launch as a uniform safety benchmark. Valuable governance lessons can be drawn from civil nuclear safety regulation, in particular what regulators refer to as the “risk-informed” and “performance-based” safety governance approach. Applying these principles to nuclear command-and-control systems requires moving beyond the simplistic human-in-the-loop prescription to focus on assessing the system’s safety performance. The objective is to quantify the likelihood of an accidental nuclear launch under a particular configuration of AI and non-AI subsystems and to ensure that this likelihood remains safely below an acceptable threshold.

AI’s Impact on Nuclear Risks 

Assessing how AI can impact the nuclear domain and contribute to unintended escalation is no easy task. The current limited understanding of the behavior of AI models, their rapid and unpredictable advancement, and the complexity and opacity of nuclear systems and subsystems that feed into the decision-making process make this discussion largely speculative. Despite this, it is still possible to foresee how states might consider implementing AI as part of broader efforts to modernize aging nuclear arsenals based on existing nuclear postures and states’ desire to gain a strategic advantage.

For instance, Gen. Anthony J. Cotton, commander of U.S. Strategic Command, has pointed to AI’s potential to automate data collection, streamline processing, and accelerate data sharing with allies. Similarly, official statements and documents from other nuclear powers often frame AI as a tool to assist human decision-makers in making faster and more informed decisions, including beyond the nuclear domain.

In principle, AI’s ability to analyze vast amounts of data from diverse sources is well-suited to identify threats quickly, analyze sensor data, automate the identification of objects, and evaluate potential courses of action. However, AI introduces a number of significant risks due to the inherent limitations of today’s advanced AI models. 

First, AI is unreliable. Today’s AI can confidently generate false information, a phenomenon known as “hallucination,” which can lead to flawed predictions and recommendations that ultimately skew decision-making. Examples include a large language model generating incorrect facts about historical events, or a vision model “seeing” objects that are not there. Second, the opacity of AI systems — known as the “black box” problem — makes it difficult to fully understand how an AI system reaches its conclusions. This lack of transparency undermines trust and reduces the utility of AI in high-stakes environments like nuclear decision-making, where transparency is crucial. Third, AI systems are susceptible to cyberattacks, creating opportunities for adversaries to compromise the integrity of nuclear command-and-control systems. Finally, current AI models struggle to align their outputs with human goals and values, potentially deviating from strategic objectives. The high-pressure environment of nuclear decision-making, combined with limited response time, exacerbates these dangers, as decisions may rely on inaccurate, opaque, compromised, or misaligned information.

Despite the declarations of some nuclear-armed states to maintain human control in nuclear decision-making, not all of them have explicitly committed to this, leaving room for grave consequences due to misunderstandings or misinterpretations of countries’ intent. But even if all nuclear states made similar declarations, there is no simple way to verify these commitments. Moreover, human–machine interaction itself can introduce severe risks. Operators may place excessive trust in an AI system, relying on its outputs without sufficient scrutiny, or they may distrust it entirely, hesitating to act when speed is critical. Both situations can skew decision-making processes even when AI systems function as intended. All of these limitations persist even when states maintain human oversight.

Further compounding these risks is the uncertainty surrounding AI’s future advancements. While current limitations may eventually be resolved, new risks could also emerge that remain unpredictable at this stage.

The Precedent of Civil Nuclear Safety Regulation

While the risks of AI-integrated command-and-control may seem novel, the management of nuclear risks with severe consequences for public health and safety is not a new challenge for governments. Indeed, the principles of risk-informed, performance-based, and technology-neutral regulation — drawn from the governance of civil nuclear safety — may usefully apply to the nexus of AI and nuclear command-and-control. 

In the United States, the process of “risk-informing” the regulation of nuclear safety began with the 1975 Reactor Safety Study, which quantified the risks of accidents and radioactive releases associated with nuclear power generation using probabilistic-risk-assessment techniques such as event trees and fault trees. Simply put, these techniques map out the various sequences of cascading events, including system failures, that could ultimately lead to an accident, allowing the probabilities of various consequences to be quantified.
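To make that logic concrete, consider a minimal sketch of the arithmetic in Python. The numbers below are hypothetical placeholders chosen purely for illustration, not values from the Reactor Safety Study or any real plant, and the variable and function names are introduced here only for the example:

```python
# Minimal illustration of the event-tree arithmetic used in probabilistic
# risk assessment. All numbers are hypothetical placeholders, not values
# from the 1975 Reactor Safety Study or any real plant.

# Assumed frequency of an initiating event, per reactor-year
# (e.g., a loss-of-coolant event).
INITIATING_EVENT_FREQ = 1e-3

# Assumed conditional probabilities that successive safety layers fail,
# given that everything upstream has already failed.
SAFETY_LAYER_FAILURE_PROBS = [
    1e-2,  # emergency cooling fails on demand
    1e-1,  # containment fails, given the cooling failure
]


def sequence_frequency(initiator_freq: float, layer_failure_probs: list[float]) -> float:
    """Frequency of the accident sequence in which every safety layer fails."""
    freq = initiator_freq
    for p in layer_failure_probs:
        freq *= p  # treats layer failures as independent
    return freq


if __name__ == "__main__":
    freq = sequence_frequency(INITIATING_EVENT_FREQ, SAFETY_LAYER_FAILURE_PROBS)
    # 1e-3 * 1e-2 * 1e-1 = 1e-6 per reactor-year, which could then be
    # compared against a quantitative benchmark like the one discussed below.
    print(f"Large-release frequency for this sequence: {freq:.1e} per reactor-year")
```

Real probabilistic risk assessments chain together thousands of such sequences and must account for dependencies among failures, but the underlying logic of multiplying conditional probabilities along each branch of the tree is the same.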

Prior to the quantification of risks, regulations were based primarily on prescriptive and deterministic requirements. For instance, regulators prescribed multiple redundant safety features to prevent certain foreseen accidents without explicitly considering the likelihood of any given accident sequence. After the 1979 Three Mile Island accident, and as recommended by the subsequent investigations, the Nuclear Regulatory Commission expanded its research into the broader application of probabilistic-risk-assessment techniques, an effort that culminated in a 1995 policy statement and subsequent plans to “risk-inform” the commission’s safety regulation.

Meanwhile, industry pushed for the more extensive use of performance-based regulation, which gives licensees greater flexibility in determining how to accomplish a defined safety goal. Rather than specifying which safety features must be included in the reactor design, a performance-based regulatory requirement would simply establish a quantifiable safety outcome. In its public communication, the Nuclear Regulatory Commission illustrates its performance-based approach with a skydiving example: the regulator would institute a “performance requirement” that “the parachute must open above an altitude of 5,000 feet” without specifying whether that outcome should be ensured with a rip-cord or an automatic activation device.

Guided by the qualitative safety goal that nuclear power plant operation should not contribute significantly to individual and societal risks, by 1986 the Nuclear Regulatory Commission had defined a measurable benchmark that “the overall mean frequency of a large release of radioactive materials to the environment from a reactor accident should be less than 1 in 1,000,000 per year of reactor operation.” That benchmark has since been refined into more operationalizable standards.

In recent years, as diverse and novel reactor concepts emerged, it became clear that many safety features prescribed for traditional reactors were no longer applicable. Regulators have therefore prioritized the development of technology-neutral regulations allowing greater flexibility in how reactor designs could satisfy safety performance benchmarks. In this context, the probabilistic-risk-assessment techniques and performance-based regulatory approach developed over the decades have proven critical for ensuring the adaptation of safety governance to technological advancement. 

Applying Lessons from Civil Nuclear Safety to Nuclear Command, Control, and Communications

As Gen. Cotton admitted: “[W]e need to direct research efforts to understand the risks of cascading effects of AI models, emergent and unexpected behaviors, and indirect integration of AI into nuclear decision-making processes.” Indeed, the rapid evolution of AI is outpacing research efforts, leaving significant gaps in our understanding of how AI-integrated functions supporting nuclear decision-making might inadvertently lead to escalation.

Ongoing multilateral discussions on responsible AI integration in the military domain have yet to define what constitutes “safe” AI integration in nuclear command-and-control and adjacent systems, particularly given the high-stakes consequences that even a single error could trigger. To complicate matters further, nuclear-armed states are likely to integrate AI in different ways, driven by their unique doctrines, capabilities, and threat perceptions. For instance, states that perceive themselves as being at a strategic disadvantage may be willing to accept higher risks associated with AI integration if it offers strategic advantages such as faster decision-making and strategic parity.

Establishing quantifiable risk thresholds for AI integration is therefore essential. Risk assessment frameworks can help policymakers distinguish between high-risk and acceptable AI applications. To ensure that the risks of inadvertent escalation do not exceed established thresholds, such frameworks would analyze how specific AI models interact with nuclear command-and-control and adjacent systems and identify cascading failure points as well as their potential consequences.

This is where civil nuclear safety regulation can provide useful lessons. The management of AI risks in nuclear command-and-control should integrate probabilistic-risk-assessment techniques and adopt performance-based rather than prescriptive benchmarks, where performance refers to the reliability of AI systems, their ability to generate accurate and well-aligned outputs, and the effectiveness of the system’s safety guardrails. Probabilistic-risk-assessment techniques are necessary because black-box systems are inherently resistant to deterministic fault analysis, and complex accident sequences require systematic risk quantification.

Additionally, technology-neutral safety governance demands a risk-informed, performance-based approach. While probabilistic risk assessment must take technology into account, bilateral and multilateral safety commitments must be applicable to a variety of technologies, given the divergent ways in which states are likely to integrate AI into their command-and-control systems. Moreover, the rapid advancement of AI systems will give rise to novel failure modes with which prescriptive guardrails cannot always keep pace. As such, rather than rigid prescriptions, which would in any case be exceedingly difficult to verify with an intangible technology like AI, it is far more practical for countries to agree on a set of broad safety goals. Countries could commit, for instance, to the overarching qualitative safety goal that AI systems integrated into nuclear command-and-control should not increase the risk of nuclear weapon use and on that basis develop measurable safety objectives — such as keeping the risk of an accidental nuclear launch under 1 in 10,000,000 per year. Probabilistic-risk-assessment techniques could then be used to evaluate whether a particular configuration of AI (or non-AI) systems will meet these objectives.

As an example, it is possible to plot an event tree to assess the probability that a hallucination of threat data, as an initiating event, leads to accidental escalation. One branch of the tree may represent the probability of redundant systems correcting the data automatically. Another may represent the probability of human operators double-checking the source data from the early warning system. Yet another, associated with the probability that redundancy and human oversight both fail, may represent the threat data being transmitted onward and strike recommendations being formulated on that basis. The risk of accidental escalation to nuclear war given this particular initiating event amounts to the probability of all guardrails failing — a risk that can be assessed quantitatively. If the risk across all initiating events exceeds the defined quantitative threshold, then the configuration of systems must be adjusted to either eliminate certain high-risk integrations or increase the effectiveness of guardrails.
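A minimal sketch of how such an event-tree calculation might be expressed in code appears below. The initiating events, guardrails, and probabilities are hypothetical illustrations invented for this example, not assessed values, and guardrail failures are treated as independent for simplicity:

```python
# Sketch of the event-tree logic described above. Every number is a
# hypothetical placeholder; real values would have to come from testing,
# operational data, and expert judgment.

ANNUAL_RISK_OBJECTIVE = 1e-7  # e.g., a 1-in-10,000,000-per-year objective

# Each initiating event maps to (expected occurrences per year, conditional
# probabilities that each successive guardrail fails to stop the sequence).
INITIATING_EVENTS = {
    "hallucinated threat data": (
        2.0,       # hallucinations per year reaching the data-fusion stage
        [
            1e-3,  # redundant systems fail to correct the data automatically
            1e-2,  # operators fail to double-check the early-warning source
            1e-3,  # downstream review accepts the resulting strike recommendation
        ],
    ),
    "corrupted sensor feed": (
        0.5,
        [1e-2, 1e-2, 1e-3],
    ),
}


def accidental_escalation_risk(initiators: dict) -> float:
    """Sum, across initiating events, of the annual frequency with which
    every guardrail fails (guardrail failures treated as independent)."""
    total = 0.0
    for frequency_per_year, guardrail_failure_probs in initiators.values():
        sequence_frequency = frequency_per_year
        for p in guardrail_failure_probs:
            sequence_frequency *= p
        total += sequence_frequency
    return total


if __name__ == "__main__":
    risk = accidental_escalation_risk(INITIATING_EVENTS)
    print(f"Estimated accidental-escalation risk: {risk:.1e} per year")
    if risk < ANNUAL_RISK_OBJECTIVE:
        print("Configuration meets the safety objective.")
    else:
        print("Configuration exceeds the objective: remove high-risk "
              "integrations or strengthen the guardrails.")
```

In practice, the hard part is estimating these conditional probabilities, especially for opaque AI components and for human performance under stress, which is precisely why the research agenda outlined in the recommendations below matters.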

When it comes to AI systems, risk assessment of this kind would consider both technological risks and integration risks. Technological risks have to do with a model’s reliability, transparency, and performance. Integration risks, on the other hand, focus on how and where AI is used — ranging from low-stakes tasks like improving communication efficiency to high-stakes functions such as formulating strike recommendations. The design and built-in redundancies of the system are also crucial factors within the assessment.

Prescriptive commitments — such as to the principle of human-in-the-loop or the exclusion of certain types of frontier AI systems — may seem categorical, but they are neither technology-neutral nor guaranteed to lower the risk of accidental nuclear use below a quantifiable threshold. Indeed, they create a false sense of security, fostering the illusion that nuclear weapon possessors can meet their risk-reduction obligations by following a set of prescriptions that do not evolve over time and are not guaranteed to keep accident risks below a defined order of magnitude.

To be sure, objective performance criteria cannot always be defined, which is why civilian nuclear safety regulators have retained some prescriptive requirements. Probabilistic-risk-assessment techniques also have their limitations, particularly in assessing risk contributions from human, organizational, and safety culture factors. Therefore, even as the U.S. Nuclear Regulatory Commission works to risk-inform its safety regulation, it has maintained its commitment to the principle of defense-in-depth, which refers to the practice of layering redundant safety systems that are highly unlikely to fail simultaneously. The same principle should be applied in the context of AI and nuclear command-and-control, but in a way that takes risk insights into account. Lessons from early civil nuclear safety regulation that relied exclusively on redundant safety systems showed that the defense-in-depth approach alone was suboptimal. 

Ultimately, the responsibility to prevent accidental or inadvertent escalation rests with nuclear weapon possessors, regardless of whether their command-and-control systems rely on floppy disks or frontier AI. The safety outcome is what matters, and the approach to AI-nuclear risk reduction must align with that fundamentally performance-based logic.

Recommendations 

Moving forward, the United States and China should build on their prescriptive commitment to human control and agree on a set of quantifiable safety objectives under the qualitative safety goal that the use of AI should not contribute to an increase in the risk of nuclear war, regardless of whether humans are in the proverbial loop. They should also take the lead in researching probabilistic-risk-assessment techniques that may be used to quantify the accident frequencies of AI-integrated nuclear command-and-control systems. That may include an effort to understand the failure modes of various AI systems and to develop appropriate AI safety performance evaluation frameworks. Perhaps most importantly, research should identify the limitations of the various techniques for evaluating AI risks when it comes to nuclear command-and-control applications. The diplomatic process involving the five permanent members of the U.N. Security Council and the Nuclear Nonproliferation Treaty review process can offer opportunities to bring other nuclear- and non-nuclear-weapon states to the table once Washington and Beijing have reached preliminary agreement. The “Responsible AI in the Military Domain” summits may serve to involve more diverse stakeholders in the risk management discussion.

In the end, countries may well discover that it is infeasible to confidently quantify the risks of AI-integrated nuclear command-and-control, or that even the lowest reasonably achievable probability of an accident — whether 1 in 1,000,000 or 1 in 10,000,000 — still presents an unacceptably high risk to humanity. If so, the effort to quantify AI risks in nuclear command-and-control systems and to limit those risks below a quantitative threshold will have been worthwhile nevertheless, as it will have revealed the inadequacy of mere prescriptive commitments such as human control. Unless each nuclear weapon state can configure its command-and-control system to ensure that the likelihood of an inadvertent nuclear launch remains below a quantitative threshold, commitments to keep humans in the loop will provide little more than an illusion of safety.

Alice Saltini is a non-resident expert on artificial intelligence with the James Martin Center for Nonproliferation Studies. Alice’s research centers on the intersection of AI and nuclear weapons. Recognized as a leading voice in this field, she advises governments on AI’s implications within the nuclear domain. She has published extensive research at the AI–nuclear intersection and presented it to various governments and international organizations. Alice has also developed a general-purpose risk assessment framework for analyzing these risks.

Yanliang Pan is a research associate at the James Martin Center for Nonproliferation Studies, where he conducts research and facilitates Track 2 engagement initiatives focused on AI and nuclear energy. His commentary on nuclear energy issues has appeared in the Bulletin of the Atomic Scientists and the Electricity Journal, as well as the websites of the Carnegie Endowment for International Peace, World Politics Review, and Georgetown University’s Institute for the Study of Diplomacy.

Image: United States Space Force via Wikimedia Commons



