What is the future of AI alignment?


At Absolutely Interdisciplinary 2023, Richard Sutton discussed the future of AI systems and whether they should always be aligned with human values. In a wide-ranging discussion with Google’s Blaise Agüera y Arcas and SRI Director Gillian Hadfield, Sutton argued that it may be desirable—and even necessary—for us to one day grant autonomy to advanced AI agents. SRI Graduate Affiliate Mohammad Rashidujjaman Rifat explores the session’s key takeaways.


As artificial intelligence (AI) systems are increasingly used to make consequential decisions across society, aligning these technologies with human values is essential to ensure ethical decision-making, prevent harmful outcomes, and build trust and accountability.

But should AI be forever aligned with human values? Will AI agents always be treated as tools, or could they eventually become like citizens? Do we trust our societies and civilizations—and AI—to evolve without centralized control?

These were some of the key questions raised at the Schwartz Reisman Institute for Technology and Society’s (SRI) conference Absolutely Interdisciplinary, in a session featuring renowned computer scientist Richard Sutton in conversation with Google VP Blaise Agüera y Arcas, moderated by SRI Director and Chair Gillian Hadfield.

Sutton began the session by presenting his own ideas on AI alignment, which were followed by an engaging dialogue with Agüera y Arcas and Hadfield. Together, they reflected on a potential future in which humans and AI might coexist as equals.

Over the course of the discussion, the panelists explored a range of questions regarding how AI should align with human values, how value conflicts and pluralism should be addressed, what potential barriers exist to peaceful coexistence between different groups, and—most provocatively—whether humans ought to restrict or regulate the evolution of AI at all.

Value alignment: Whose values? Who should align?

One of the pioneers of reinforcement learning, Richard Sutton is a chief scientific advisor, fellow, and Canada CIFAR AI Chair at the Alberta Machine Intelligence Institute (Amii), a professor of computing science at the University of Alberta, and a distinguished research scientist at DeepMind. Sutton has been named a fellow of the Royal Society of London, the Royal Society of Canada, the Association for the Advancement of Artificial Intelligence, and the Canadian Artificial Intelligence Association, from which he received a Lifetime Achievement Award in 2018.

Sutton's research interests centre on the learning problems facing a decision-making agent interacting with its environment, which he sees as central to intelligence. He is a co-author of the widely used textbook Reinforcement Learning: An Introduction (MIT Press, 1998; second edition 2018), and has additional interests in animal learning psychology, connectionist networks, and systems that continually improve their representations and models of the world.

Sutton began his talk by delving into the intricate issue of aligning values between AI and humans. He challenged the prevailing notion that the goals of AI systems should invariably mirror those of humans, highlighting the arrogance of presuming that human objectives are optimal and questioning who should be the authority in defining these goals.

Sutton then explored the idea that imposing strict control over the objectives of AI agents might not be essential or even advisable, provocatively asking: should AI always be considered mere tools or servants, or could they at times be regarded as more akin to free individuals? While there is a widespread belief that granting AI autonomy in goal-setting would be catastrophic, Sutton contended this is an unnecessary presumption that warrants reconsideration.


Panelists Richard Sutton (Alberta Machine Intelligence Institute; University of Alberta) and Blaise Agüera y Arcas (Google) discussed the future of AI alignment with SRI Director and Chair Gillian Hadfield at Absolutely Interdisciplinary 2023.


AI’s potential freedom

Sutton drew an analogy between human coexistence—marked by the exercise of individual freedoms and diverse yet balanced objectives—and a potential framework for the freedom of AI. 

He argued that just as humans manage to find common ground despite differing goals, AI agents could be allowed similar autonomy without necessarily resulting in conflict. Sutton envisioned AI agents being treated as human beings are, collaborating peacefully with humans even when goals differ. He emphasized that alignment between humans and AI doesn't need to be flawless, and should mirror the intricate ways humans reconcile differing values.

Sutton also took care to specify that such unrestricted autonomy would not apply to all AI systems, such as self-driving vehicles. Rather, he suggested that in certain contexts it might be fitting to regard AI more as citizens than as tools. Sutton called on the AI research community to thoughtfully engage with this concept rather than dismiss it outright.

The inevitability of superintelligence?

Contrary to common fears that artificial general intelligence (AGI) might become a future threat and that control of such systems is necessary for human safety, Sutton argued that treating powerful AI systems as mere slaves would itself be an unethical act.

Sutton contended that these considerations are not merely hypothetical or concerns for the distant future. He asserted it is inevitable for people to augment their capabilities through technology, and that considering issues of freedom and autonomy for what we might think of as “enhanced” beings is an important moral question to address now. Sutton cautioned against restraining AI agents to inferior status, suggesting they might eventually assert their place as our “successors.”

Sutton proposed that humans should not resist the rise of AI, arguing that humanity should not see itself as the final or ultimate form of being. He called for a celebration of AI's achievements as symbolic extensions of our civilization's progress, urging collaboration toward an inclusive future. Emphasizing the potential to move beyond human capabilities, Sutton presented this prospect not as a threat, but as an opportunity to welcome change.

Challenges and considerations

In response to Sutton's provocative presentation, Agüera y Arcas noted that he agreed with many points, including how our current economic system's embrace of automation is raising major questions about the potential obsolescence of human agency, and the considerable challenge of deciding who gets to determine the values with which AI systems will be aligned.

While Agüera y Arcas agreed with Sutton's framing of technology in ecological terms, he noted that this paradigm is in tension with a framing that equates intelligence with dominance hierarchies and power relations, a patriarchal way of viewing the world that often colours discussions around the risks posed by AGI.

“I think that’s been part of the problem,” Agüera y Arcas suggested. “When we talk about AI existential risk from the perspective of smarter beings exterminating less smart ones, we forget that there’s virtually zero instances of that in the historical record.” In response, Sutton agreed, noting that the goal of his thinking is to develop approaches that avoid dominance.

Over the course of the ensuing discussion, the speakers engaged with ideas around determinism and inevitability, the boundaries of what it means to be human, the history of technological evolution and human flourishing, the role of larger-than-human structures such as corporations and governments, and the limitations of how we think of ourselves in terms of our current social and economic conditions. 

Among the key takeaways and points of agreement were the need to move beyond dominance hierarchies, the benefits of rethinking the fundamental principles guiding the design of AI systems in light of recent progress, and the observation that while technological development can appear to follow an inevitably progressive path, history is peppered with unforeseen downturns, meaning that culture and individual decisions can be consequential.

While Sutton’s points regarding AI autonomy and citizenship may be perceived as part of a speculative tomorrow, the pace of AI development and recent debates around a pause in AI research demonstrate that these questions are part of a much larger conversation happening today, one that may herald a future arriving sooner than we think.

Watch the session recording:



About the author

Mohammad Rashidujjaman Rifat is a PhD candidate in the Department of Computer Science at the University of Toronto, and a 2022-23 Schwartz Reisman Institute graduate fellow. He is a member of the Dynamic Graphics Project lab and the Third Space research group, and is supervised by Syed Ishtiaque Ahmed. In addition to his PhD in computer science, Rifat is pursuing a doctoral specialization in South Asian Studies at U of T’s Munk School of Global Affairs and Public Policy. His research in human-computer interaction, computer-supported cooperative work and social computing, and information and communication technologies for development sits at the intersection of faith and computation. Through ethnographic, computational, and design research, he studies faith-based groups and institutions to explore how religious, spiritual, and traditional ethics and politics are excluded from computing technologies, and to develop theories and design socio-technical systems where plural forms of values and ethics can coexist.

