The terminology of AI regulation: Ensuring “safety” and building “trust”
Are advanced artificial intelligence (AI) systems a threat to our sense of safety and security? Can we trust AI systems to perform increasingly critical roles in society?
Policymakers, technologists, and members of the larger AI ecosystem are currently grappling with these questions. This article is the second of two in which we align key terminologies toward a precise and shared understanding across diverse contexts, a crucial step toward effective policymaking. The first article explored AI “risk” and “harm”; this article explores the meanings of “safety” and “trust” in the AI context and considers the significance of these terms for building effective AI regulation.
Safety standards: Technical, psychosocial, individual
Safety is a key consideration almost everywhere: Whether it’s seatbelts in cars or parental controls on children’s digital devices, safety features are ubiquitous. In our context here, we define safety as the reduction or eradication of harm, following Heidy Khlaaf’s 2023 publication on risk assessment and assurance of AI. Technical standards, such as those promulgated by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), are one medium through which society promotes and ensures safety in various areas. Historically, such safety standards have been built and developed through a series of testing processes, technical committees, and risk management and oversight mechanisms. For example, aircraft are subject to a comprehensive set of regulations mandated to ensure an aircraft’s airworthiness (i.e., proper product design, manufacturing, and maintenance), the competence of the flight crew, and adherence to general operating and flight rules. In other words, the safety of air travel has been extensively tested and standardized to meet and uphold industry and societal expectations of safety.
However, we also want to distinguish between technical standards of safety and individual expectations and experiences of safety. For example, airports introduced full-body scanners in the 2000s to provide an even more extensive level of safety and security. Concerns quickly arose, however, around this intrusive means of searching passengers, which compromised their personal privacy at a very intimate level. These concerns are compounded by even more intrusive security methods that are disproportionately employed against members of equity-seeking groups. These examples demonstrate how individual experiences of safety can differ from the overarching technical goal of keeping commercial flying “safe.” In this sense, it’s important to recognize that technical and psychosocial safety are distinctly different concepts. In building notions of safety for AI systems, then, we must remain conscious of both sides of safety: technical standards and social and individual experiences.
Individual experiences of safety can differ from the overarching technical goal of keeping something “safe.” It’s important to recognize that technical and psychosocial safety are distinctly different concepts.
What is AI safety?
“AI safety” refers broadly to the field of research dedicated to ensuring that AI systems behave safely and reliably, with minimal risk of unintended consequences or harm to humanity. However, the term “AI safety” itself has not been clearly defined in a consistent manner across literature and regulation. With the advent of generative AI in the public domain, AI safety sits at the fulcrum of recent political governance initiatives that have embraced the term as an umbrella under which to mitigate AI's harms. Some notable examples of this include the November 2023 UK AI Safety summit and the recent announcement of an AI Safety Institute funded by the Canadian government.
AI safety has emerged as the first truly interdisciplinary framework spanning industry and government to take shape since the launch of major generative AI models into the world. It includes issues of machine ethics and alignment—the latter refers to ensuring that AI systems are designed in a way that encodes human values—but it also encompasses the related technical aspects of safety and security. The October 2023 executive order issued by U.S. President Biden notably adopts the term “AI safety” to guide the development and implementation of standards protecting against a wide range of AI harms, from intellectual property violations to threats of a chemical, biological, radiological, and nuclear nature.
In this sense, AI safety is a key element of standards-setting and oversight that bridges the gap between technical and policy efforts to govern AI. Specifically, AI safety finds regulatory meaning in national and international standards for the development of AI models and AI system testing. SRI researchers have collaborated with the Standards Council of Canada regarding their work in this area. These tools often include demonstrable, quantifiable metrics, which perhaps points to the origin of AI safety in technical disciplines.
Trust: A function of relationships, needs, values, and stakes
In contrast to safety, the word “trust” speaks to a framework more conditioned by social relations. Although there is no universal definition of trust, it can broadly be understood, as John D. Lee and Katrina A. See wrote in 2004, as “the attitude that an agent will help achieve an individual’s goals in a situation characterized by uncertainty and vulnerability.” Trust is a subjective concept; people evaluate trust differently based on the relationships, needs, values, and stakes that are involved. However, there are some generally agreed upon principles that constitute our shared understanding of trust. For instance, the reliability of an individual or agent in delivering on its assigned goals can influence trust, as can the associated accountability mechanisms. So too can the explainability or transparency of an individual’s or agent’s performance of those goals.
To put it simply: while the word “trust” may have somewhat different meanings for each person or scenario, it consistently exists in connection with some recurring factors: reliability, accountability, and transparency.
Perhaps counterintuitively, trust facilitates risk-taking because demonstrated reliability in performance and behaviour is a precondition of trust. For example, customers trust restaurants to prepare food safely and properly, while restaurants trust customers to pay for their meals. If a customer gets food poisoning, their trust in the restaurant’s safety and quality diminishes. In everyday instances, we can point to what it takes to trust something or someone, and concurrently, what it means for that trust to be lost.
Perhaps counterintuitively, trust facilitates risk-taking because demonstrated reliability in performance and behaviour is a precondition of trust.
Trust is also highly contextual. Even to the extent that trust is defined by general principles, the specific content of trust varies based on the situation. For example, the trust we place in an AI system making executive decisions in a medical case would be influenced by a different set of considerations than the trust we place in an AI music selection bot. Notably, trust in the context of medical care would be highly influenced by legal and regulatory requirements imposed on the practice of medicine, like liability for injury, in a way that would never apply in the realm of music selection. To this end, law and regulation can serve as important contexts that complement trust in a relationship. The highly contextual nature of AI trust, as opposed to AI safety, calls for considered engagement with AI regulation, as well as knowledge mobilization across government, industry, and the public, to build systems of trust that will support effective AI adoption. Because these technologies do not rest within national borders, the need for international discussion of AI standards and their applications is clear.
Can we trust AI?
In the context of AI, trust plays a significantly different role than it does with other technologies because of the ability of AI tools to perform or simulate autonomous interaction. Consider generative AI tools that create content from a prompt, like ChatGPT or Midjourney. They are built to identify patterns in training data and use that information to generate new samples that resemble the original data. Specifically, the predictive modelling that is core to generative AI produces a diversity of outputs—text, image, video, code, etc.—that are customized in response to user inputs and prompts. In a procedural sense, we might say the same about internet search—a protocol and convention to which we have long been habituated. The big shift, however, is what Alan Turing originally called the “imitation game.” Instead of the ranked findings of a Google search or the explicitly peer-reviewed entries of Wikipedia, generative AI chatbots serve up a compellingly articulate natural-language analysis and aggregation of the topic at hand, one that resembles a human response more closely than any previous publicly available technology did.
In this sense, our interactions with generative AI systems are relational: we provide a prompt, and they aggregate and prioritize information into a summary of the most probable answer. Additionally, generative AI systems offer an apparently authoritative view in seamless syntax or photoreal imagery, which further supports a relational user experience. Because our interactions with AI systems are similar to our interactions with other humans—at least more so than with any prior technology—the process by which we come to assume the accuracy of these systems’ outputs can perhaps be described as trust, and not simply technical or statistical reliability. In other words, the phenomenon of AI “mirroring” human-like behaviour and the human tendency to anthropomorphize objects in the world both lend themselves to trusting AI. Building AI trust, then, describes a process by which we, as humans, interact with the technology to establish a relationship that is not predicated on risk mitigation or “safety.”
Our interactions with AI systems are similar to our interactions with other humans, so the process by which we assume the accuracy of these systems’ outputs can perhaps be described as trust, and not simply technical or statistical reliability.
How can policymakers approach safety and trust?
A 2024 report published by the Conference Board of Canada in partnership with MaRS found that generative AI could significantly increase productivity and boost the country’s overall GDP. However, in order to unlock AI’s transformative potential across our economies and societies, issues of safety and trust must be addressed. Meanwhile, current data points to mixed public sentiment about AI across multiple tools and use cases in countries around the world.
The historical policy approaches to improving technological safety are well established: develop standards, build oversight bodies and infrastructure, and conduct ongoing and transparent testing, quality assurance, and other measures. Nonetheless, we also see the limits of policy and protocols when we look at the track record of engineering disasters. AI systems raise serious safety concerns because they are a new and unpredictable technology out in the world. However, even if AI safety mechanisms can be reliably established, safety alone won’t be enough to justify the integration of AI across many aspects of society. In order to build trust, governmental and industry entities must understand how diverse sets of users—geographically, demographically, and otherwise—understand and communicate the terms of trust. So how can policymakers address trust?
Policymakers should integrate this thinking on trust into emerging AI governance. Given that trust is often subjective and difficult to measure, it is imperative that researchers from various fields collaborate to build a common vocabulary of trust in machine learning and AI systems, specifically formulating the principles that will define trustworthy human-AI relations. Here at SRI, Research Lead Beth Coleman is currently facilitating an interdisciplinary working group of researchers to explore the role of trust in human-machine learning interaction. Integrating a diversity of views and learnings around trust will help frame strong and reputable tools for building AI trust.
Experts in the AI ecosystem describe a dilemma regarding AI regulation: on the one hand, there’s a need for policy action to prevent harms, mitigate risks, and promote safety in AI. On the other hand, societies seek to avoid stifling innovation and adoption. Trust-building regulation can help resolve this dilemma by mitigating the negative impacts of AI through the integration of human values into the assessment and adoption of AI systems. Improving AI safety is one facet of encouraging trust-building in this technology. By recognizing that our interactions with AI systems are relational, trust-building not only addresses the harms and risks of AI systems but also has the potential to meaningfully integrate human values and principles directly into those systems. AI may be poised to transform our society, but it will only do so if safety and trust are prioritized from the start.
About the authors
David Baldridge is a policy researcher at the Schwartz Reisman Institute for Technology and Society. A recent graduate of the JD program at the University of Toronto’s Faculty of Law, he has previously worked for the Canadian Civil Liberties Association and the David Asper Centre for Constitutional Rights. His interests include the constitutional dimensions of surveillance and AI regulation, as well as the political economy of privacy and information governance.
Beth Coleman is a research lead at the Schwartz Reisman Institute for Technology and Society and an associate professor at the Institute of Communication, Culture, Information and Technology and the Faculty of Information at the University of Toronto. She is also a senior visiting researcher with Google Brain and Responsible AI as well as a 2021 Google Artists + Machine Intelligence (AMI) awardee. Working in the disciplines of science and technology studies, generative aesthetics, and Black poesis, Coleman’s research focuses on smart technology and machine learning, urban data, civic engagement, and generative arts. She is the author of Hello Avatar and a founding member of the Trusted Data Sharing group, and her research affiliations have included the Berkman Klein Center for Internet & Society, Harvard University; Microsoft Research New England; Data & Society Institute, New York; and the European Commission Digital Futures. She served as the Founding Director of the U of T Black Research Network, recently released Reality Was Whatever Happened: Octavia Butler AI and Other Possible Worlds (K Verlag, Berlin), and is currently overseeing SRI’s working group on trust in human-ML interactions.
Alicia Demanuele is a policy researcher at the Schwartz Reisman Institute for Technology and Society. Following her BA in political science and criminology at the University of Toronto, she completed a Master of Public Policy in Digital Society at McMaster University. Demanuele brings experience from the Enterprise Machine Intelligence and Learning Initiative, Innovate Cities, and the Centre for Digital Rights where her work spanned topics like digital agriculture, data governance, privacy, interoperability, and regulatory capture. Her current research interests revolve around AI-powered mis/disinformation, internet governance, consumer protection and competition policy.