How should we regulate frontier AI models?

 

The next generation of state-of-the-art foundation models could have capabilities that severely threaten public safety and global security. A new multi-author white paper, whose co-authors include SRI Director Gillian Hadfield and SRI Graduate Affiliate Noam Kolt, discusses the challenges of regulating frontier AI models. Below, three of the paper’s co-authors—Markus Anderljung, Jonas Schuett, and Robert Trager—summarize the paper’s findings.


AI models are already having large social impacts, both positive and negative. These impacts will only grow as models become more capable and more deeply integrated into society.

Governments have their work cut out for them in steering these impacts for the better. They face a number of challenges, including AI being used for critical decision-making without assurance that its judgments will be fair and accurate; AI being integrated into safety-critical domains, with accompanying accident risks; and AI being used to produce and spread disinformation.

In a recent white paper, we focus on one such challenge: the increasingly broad and significant capabilities of frontier AI models. We define “frontier AI models” as highly capable foundation models [1] that could have dangerous capabilities sufficient to severely threaten public safety and global security. Examples of capabilities that would meet this standard include designing chemical weapons, exploiting vulnerabilities in safety-critical software systems, synthesising persuasive disinformation at scale, and evading human control.

Read “Frontier AI Regulation: Managing Emerging Risks to Public Safety” on arXiv

We think the next generation of foundation models – in particular, those trained using substantially greater computational resources than any model trained to date – could have these kinds of dangerous capabilities. Although the probability that next-generation models will have these capabilities is uncertain, we think it is high enough to warrant targeted regulation. The appropriate regulatory regime may even include licensing requirements.

Effective frontier AI regulation would require that developers put substantial effort into understanding the risks their systems might pose, in particular by evaluating whether they have dangerous capabilities or are insufficiently controllable. These risk assessments would receive thorough external scrutiny and inform decisions about whether and how new models are deployed. After deployment, the extent to which the models are causing harm would need to be continually evaluated. Other requirements, such as high cybersecurity standards, would likely also be appropriate. Overall, regulatory requirements would need to evolve over time.

Based on current trends, creating frontier AI models is likely to cost upwards of hundreds of millions of dollars in compute and to require other scarce resources, such as relevant talent. The described regulatory approach would therefore likely target only the handful of well-resourced companies developing these models, while posing few or no burdens on other developers.

What makes regulating frontier AI challenging?

There are three core challenges for regulating frontier AI models:

  • The Unexpected Capabilities Problem: The capabilities of new AI models are not reliably predictable and are often difficult to fully understand without intensive testing. Researchers have repeatedly observed capabilities emerging or significantly improving suddenly in foundation models. They have also regularly induced or discovered new capabilities through techniques including fine-tuning, tool use, and prompt engineering. This means that dangerous capabilities could arise unpredictably and – absent requirements for intensive testing and evaluation both pre- and post-deployment – could remain undetected and unaddressed until it is too late to avoid severe harm.

  • The Deployment Safety Problem: AI systems can cause harm even if neither the user nor the developer intends them to, for several reasons. Firstly, it is difficult to precisely specify what we want deep learning-based AI models to do, and to ensure that they behave in line with those specifications. Reliably controlling AI models’ behavior, in other words, remains a largely unsolved technical problem. Secondly, attempts to “bake in” misuse prevention features at the model level, such that the model reliably refuses to obey harmful instructions, have proved circumventable through methods such as “jailbreaking.” Finally, distinguishing instances of harmful and beneficial use may depend heavily on context that is not visible to the developing company. Overall, this means that even if state-of-the-art deployment safeguards are adopted, robustly safe deployment is difficult to achieve and requires close attention and oversight.

  • The Proliferation Problem: Frontier AI models are more difficult to train than to use. Thus, a much wider array of actors have the resources to misuse frontier AI models than have the resources to create them. Non-proliferation of frontier AI models is therefore essential for safety, but difficult to achieve. As AI models become more useful in strategically important contexts and the costs of producing the most advanced models increase, bad actors may launch increasingly sophisticated attempts to steal them. Further, when models are open-sourced, accessing or introducing dangerous capabilities becomes much easier. While we believe that open-sourcing of non-frontier AI models is currently an important public good, open-sourcing frontier AI models should be approached more cautiously and with greater restraint.

What regulatory building blocks are needed for frontier AI regulation?

Self-regulation is unlikely to provide sufficient protection against the risks of frontier AI models: we think government intervention will be needed. The white paper explores the building blocks such regulation would need. These include:

  • Mechanisms to create and update standards for responsible frontier AI development and deployment. These should be developed via multi-stakeholder processes and could include standards relevant to foundation models overall – not just standards that exclusively pertain to frontier AI. These processes should facilitate rapid iteration to keep pace with the technology.

  • Mechanisms to give regulators visibility into frontier AI developments. These mechanisms could include disclosure regimes, monitoring processes, and whistleblower protections. The goal would be to equip regulators with the information they need to identify appropriate regulatory targets and design effective tools for governing frontier AI. The information provided would pertain to qualifying frontier AI development processes, models, and applications.

  • Mechanisms to ensure compliance with safety standards. Self-regulation efforts, such as voluntary certification, may go some way toward ensuring compliance with safety standards by frontier AI model developers. However, this seems likely to ultimately be insufficient without government intervention. Intervention may involve empowering a government authority to translate standards into legally binding rules, identify and sanction non-compliance with those rules, or perhaps establish and implement a licensing regime for the deployment and potentially the development of frontier AI models. Designing a well-balanced frontier AI regulatory regime is a difficult challenge. Regulators would need to be sensitive to the risks of overregulation and stymieing innovation on the one hand, and the risks of moving too slowly (relative to the pace of AI progress) on the other.

What could safety standards for frontier AI development look like?

The white paper also suggests some preliminary, minimum safety standards for frontier AI development and release: 

  • Conducting thorough risk assessments informed by evaluations of dangerous capabilities and controllability. This would reduce the risk that deployed models possess unknown dangerous capabilities or behave unpredictably and unreliably.

  • Engaging external experts to apply independent scrutiny to models. External scrutiny of the safety and risk profiles of models would both improve assessment rigour and foster accountability to the public.

  • Following shared guidelines for how frontier AI models should be deployed based on their assessed risk. The results from risk assessments should determine whether and how a model is deployed and what safeguards are put in place. Options could range from deploying the model without restriction to not deploying it at all until risks are sufficiently reduced. In many cases, an intermediate option – deployment with appropriate safeguards, such as restrictions on the ability of the model to respond to risky instructions – will be appropriate.

  • Monitoring and responding to new information on model capabilities. The assessed risk of deployed frontier AI models may change over time due to new information and new post-deployment enhancement techniques. If significant information on model capabilities is discovered post-deployment, risk assessments should be repeated and deployment safeguards should be updated.

Other standards would also likely be appropriate. For example, frontier AI developers could be required to uphold high cybersecurity standards to ward off attempts at theft. In addition, these standards should likely change substantially over time as we learn more about the risks from the most capable AI systems and the means of mitigating those risks.

Uncertainties and next steps

While we feel confident that there is a need for frontier AI regulation, we are unsure about many aspects of how an appropriate regulatory regime should be designed. Relevant open questions include:

  • How worried should we be about regulatory capture? What can be done to reduce the chance of regulatory capture? For example, how could regulator expertise be bolstered? How much could personnel policies help – such as cooling-off periods between working for industry and for a regulatory body?

  • What is the appropriate role of tort liability for harms caused by frontier models? How can it best complement regulation? Are there contexts in which it could serve as a substitute for regulation?

  • How can the regulatory regime be designed to deal with the evolving nature of the industry and the evolving risks? What can be done to ensure that ineffective or inappropriate standards are not locked in?

  • How, in practice, should a regime like this be implemented? For instance, in the US, is there a need for a new regulatory body? If the responsibility should be located within an existing department, which one would be the most natural home? In the EU, can the AI Act be adapted to deal appropriately with frontier AI and not just lower-risk foundation models? How does the proposed regulatory scheme fit into the UK’s Foundation Model Taskforce?

  • Which other safety standards, beyond the non-exhaustive list we suggest, should frontier AI developers follow?

  • Would a licensing regime be both warranted and feasible right now? If a licensing regime is not needed now but may be in the future, what exactly should policymakers do today to prepare and pave the way for future needs?

  • In general, how should we account for uncertainty about the level of risks the next generation of foundation models will pose? It is possible that the next generation of frontier models will be less risky than we fear. However, given the uncertainty and the need to prepare for future risks, we think taking preliminary measures now is the right approach. It is not yet clear what the most useful and cost-effective preliminary measures will be.

We are planning to spend a significant amount of time and effort exploring these questions. We strongly encourage others to do the same. Figuring out the answers to these questions will be extremely difficult, but deserves all of our best efforts.

[1] “Foundation models” are models trained on broad data that can be adapted to a wide range of downstream tasks.

Authorship statement

The white paper was written with co-authors from a number of institutions, including authors employed by industry actors that are actively developing state-of-the-art foundation models (Google DeepMind, OpenAI, and Microsoft). Although authors based in labs can often contribute special expertise, their involvement also naturally raises concerns that the content of the white paper will be biased toward the interests of companies. This suspicion is healthy. We hope that readers will be motivated to closely engage with the paper’s arguments, take little for granted, and publicly raise disagreements and alternative ideas.

This post originally appeared on the Centre for the Governance of AI’s (GovAI) website. GovAI research blog posts represent the views of their authors, rather than the views of the organisation. The authors of this blog post can be contacted at markus.anderljung@governance.ai, jonas.schuett@governance.ai, and robert.trager@governance.ai.



About the authors

Markus Anderljung is head of policy and research fellow at the Centre for the Governance of AI (GovAI).

Jonas Schuett is a law PhD candidate at Goethe University Frankfurt and a research fellow at GovAI.

Robert Trager is the GovAI international governance lead and a professor of political science at the University of California, Los Angeles.

