Harming virtuously? Value alignment for harmful AI


The field of AI safety emphasizes that systems be aligned with human values, often stating that AI should “do no harm.” But lethal autonomous systems built into firearms and drones are already harming people. How can we address the reality of purposely harmful AI systems? SRI Graduate Fellow Michael Zhang writes about a panel of experts exploring this topic.


As artificial intelligence (AI) systems are increasingly deployed in high-stakes domains like national security and warfare, the challenge of aligning these technologies with human values takes on new urgency and complexity. But in contexts where harm is an intended outcome, what does “value alignment” really mean?

L to R: Moderator Jamie Duncan, panelists Nisarg Shah, Branka Marijan, Leah West. Photo: Johnny Guatto.

This provocative question was at the heart of a recent panel discussion featuring experts Branka Marijan, Nisarg Shah, and Leah West, moderated by SRI graduate fellow Jamie Duncan. The session, “Harming virtuously: Value alignment for harmful AI,” was part of “Interdisciplinary Dialogues on AI,” a full-day workshop organized by the Institute’s 2023–24 cohort of graduate fellows for SRI’s conference Absolutely Interdisciplinary.

The workshop explored innovative solutions to complex problems at the intersection of technology and society, with other sessions examining the use of AI in healthcare, online polarization and content moderation, and sustainable development amidst climate change.


Beyond “killer robots”: The nuanced reality of military AI

Contrary to popular imagination, the scope of AI in warfare extends far beyond “killer robots.” As Leah West, an associate professor of international affairs at Carleton University with military experience, pointed out, “AI is being used in military systems to prosecute targets in all kinds of ways.” These applications include target generation, decision support systems for military commanders, logistics and supply chain management, and reconnaissance.

Many AI applications are not explicitly designed to cause harm, but can have unintended negative impacts or be deliberately misused for malicious purposes. For instance, self-driving cars are designed according to principles of safety, but can still encounter situations where harm is unavoidable. Similarly, large language models (LLMs) can be misused by bad actors to cause harm, even if that is not their intended purpose. 

This dual-use nature of AI technologies complicates efforts to define and mitigate potential harms. 

Nisarg Shah, an expert in algorithmic fairness who is an associate professor in U of T’s Department of Computer Science and a research lead at the Schwartz Reisman Institute, warned of unique dangers posed by autonomous systems making decisions. While these systems have the potential to outperform humans in certain settings, Shah emphasized several concerns regarding the deployment of these technologies. 

“AI is being used in military systems to prosecute targets in all kinds of ways,” says Leah West, an associate professor of international affairs at Carleton University with military experience. Photo: Johnny Guatto.

First, the speed and scale at which these systems make decisions can far exceed human capabilities, potentially leading to rapid escalation of conflicts. Second, ensuring the robustness of autonomous weapons is challenging, as today’s AI systems can fail in surprising and unpredictable ways. Third, autonomous systems present security vulnerabilities that bad actors could exploit to do harm at scale.

The risks posed by autonomous systems demand careful consideration before widespread deployment.


The policy challenge: From abstract values to concrete rules

The panelists emphasized that in addition to establishing values and rules for AI systems, we need laws and concrete commitments to govern their development and use. However, crafting effective international policies is complicated by the uneven nature of international relations and the differing priorities of states. As a result, international agreements often use generic language to accommodate diverse perspectives, leaving engineers to make judgment calls that would benefit from broader societal discussion. 

Branka Marijan, a senior researcher at Project Ploughshares who studies the military and security implications of emerging technologies, noted that many of the key challenges are not technical but social, rooted in the contexts in which these systems are used, and therefore require input from broader civil society. Marijan observed that the current landscape suffers from a “diffusion of responsibility”: it is unclear whether coders, engineers, or end-users should be held accountable, and the result is often that no one is held responsible. West highlighted a crucial point on responsibility, observing that “there’s nothing in international humanitarian law that says you have to act with integrity, honesty, and responsibility.”

This gap underscores the need to develop new frameworks that can address the unique ethical challenges posed by AI in warfare.

SRI Research Lead Nisarg Shah (centre) emphasized the need for “interdisciplinary collaboration between researchers—not just from computer science, but also from philosophy, social sciences, and psychology.”

Beyond identifying foundational values, the panel discussed how embedding those values into AI systems is an active and evolving area of research. They pointed to “Constitutional AI” as one technical approach, in which a written constitution specifies the principles and constraints governing an AI system’s behaviour. However, translating abstract principles into concrete technical implementations and verifying a model’s outputs remain significant challenges. Many open ethical questions also require input from a wide range of stakeholders, including what values autonomous systems should be designed to uphold and how we can ensure accountability and adherence to principles of human dignity and international humanitarian law.
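To make the constitutional approach a little more concrete, here is a minimal sketch of the critique-and-revise loop that such methods typically rely on. It is illustrative only: the `generate` function and the sample principles are hypothetical placeholders, not a real model API or an actual constitution.

```python
# Minimal sketch of a constitutional-AI-style critique-and-revise loop.
# The generate() function is a hypothetical stand-in for a language model
# call; it is not a real API.

CONSTITUTION = [
    "Do not produce content that facilitates physical harm.",
    "Respect human dignity and the principles of international humanitarian law.",
]


def generate(prompt: str) -> str:
    """Placeholder for a language model call (assumed, not a real library)."""
    return f"[model output for: {prompt[:60]}...]"


def constitutional_revise(user_prompt: str) -> str:
    """Draft a response, then critique and revise it against each principle."""
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique this response against the principle: '{principle}'\n"
            f"Response: {draft}"
        )
        draft = generate(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\n"
            f"Original response: {draft}"
        )
    return draft


if __name__ == "__main__":
    print(constitutional_revise("Draft guidance for operators of an autonomous drone."))
```

Even in this toy form, the gap the panelists identified is visible: the constitution exists only as plain-language principles, and everything hinges on how faithfully a model can interpret and enforce them.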

Ultimately, addressing the challenges posed by purposefully harmful AI will require a multi-stakeholder approach that involves not only AI researchers and practitioners, but also policymakers, civil society organizations, and the broader public. As Shah emphasized, there is a need for “interdisciplinary collaboration between researchers—not just from computer science, but also from philosophy, social sciences, and psychology.”

By fostering collaboration across disciplines and expanding public education on these important topics, we can work towards developing AI systems that are safe, ethical, and aligned with human values, even in high-stakes domains like national security and international conflict.



About the author

Michael Zhang is a PhD candidate in the Department of Computer Science at the University of Toronto, where he is supervised by Jimmy Ba. His research focuses on understanding and improving algorithms for neural network optimization. Zhang completed his undergraduate and master’s degrees at the University of California, Berkeley, where his research focused on robotics applications. He is the recipient of an NSERC Canada Doctoral Fellowship and a Schwartz Reisman Institute Graduate Fellowship, and is a graduate affiliate at the Vector Institute.

