How the evaluative nature of the mind might help in designing moral AI
How can human morality be incorporated into the design of artificial agents? Scholars and practitioners interested in the design of artificial intelligence (AI) have long struggled with this issue, sometimes referred to as the alignment problem.
Julia Haas, a senior research scientist in the Ethics Research Team at DeepMind, proposes that one of the reasons for this struggle might be certain assumptions about the nature of the human mind. As Haas argues, we often assume the human mind has two fundamental functions: one epistemic, and one phenomenological. Epistemic functions are involved in reasoning and computational processes, while phenomenological functions are involved in emotional and affective processes. However, recent advancements in reinforcement learning and the philosophy of cognitive science suggest that this dichotomy fails to consider a third aspect of the mind: its fundamentally evaluative nature.
Evaluating rewards and values
What does it mean to claim that the mind has an evaluative nature? The claim foregrounds the mind's fundamental involvement in the process of attributing rewards and values. Research in cognitive science suggests that the ability to attribute value drives many of our decisions and cognitive abilities, such as visual fixation. This evidence supports the claim that the mind is evaluative in nature and selects what to attend to based on its evaluations. What the mind attends to can include concrete representations, like food or fire, but also abstract ones, like equality or justice. If we assume that human morality (e.g., the ability to differentiate between just and unjust) is supported by value attribution, then we might be able to transpose this capacity to artificial agents, enabling them to learn about human morality.
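As a rough illustration of this picture (a sketch of our own, not a model from the research Haas cites), one can think of attention as value-weighted selection: if the mind attributes a value to each candidate representation, concrete or abstract, then what gets attended to can be modelled as sampling in proportion to those values. The items and numbers below are purely illustrative.

```python
import math
import random

# Hypothetical values attributed to candidate objects of attention;
# the items and numbers are illustrative only.
values = {
    "food": 2.0,        # concrete representation
    "fire": 1.5,        # concrete representation
    "justice": 2.5,     # abstract representation
    "equality": 1.0,    # abstract representation
}

def attend(values, temperature=1.0):
    """Softmax selection: attend to one item with probability
    proportional to exp(value / temperature)."""
    items = list(values)
    weights = [math.exp(values[item] / temperature) for item in items]
    return random.choices(items, weights=weights, k=1)[0]

# Over many draws, higher-valued representations are attended to more often.
print(attend(values))
```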
Advancements in machine learning suggest that artificial agents can already be designed to operate in terms of rewards and values through reinforcement learning, an area of machine learning concerned with how intelligent agents should act in an environment in order to maximize a cumulative reward. A well-designed reward function can, for example, allow an agent to "know" when to sacrifice an immediate reward in order to maximize its total reward over time. In other words, machines may already be better than humans at practicing delayed gratification.
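To make the idea concrete, here is a minimal sketch (our illustration, not a system discussed in the seminar): a tabular Q-learning agent in a toy corridor must choose between grabbing a small reward immediately and walking on toward a larger delayed reward. Whether it learns to wait depends on its discount factor, the parameter that governs how much future reward is worth relative to immediate reward.

```python
import random

# Toy corridor: states 0..4. At any state the agent can GRAB a small reward (+1)
# and end the episode, or MOVE one step closer to a large delayed reward (+10)
# waiting at the far end of the corridor.
N_STATES = 5
GRAB, MOVE = 0, 1

def step(state, action):
    """Return (next_state, reward, done)."""
    if action == GRAB:
        return state, 1.0, True               # small immediate reward, episode ends
    if state + 1 == N_STATES - 1:
        return state + 1, 10.0, True          # reached the large delayed reward
    return state + 1, 0.0, False              # keep walking, no reward yet

def train(gamma, episodes=5000, alpha=0.1):
    """Tabular Q-learning with a uniformly random behaviour policy
    (Q-learning is off-policy, so the greedy values are still learned)."""
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = random.choice([GRAB, MOVE])
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(Q[nxt]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = nxt
    return Q

for gamma in (0.3, 0.99):
    Q = train(gamma)
    choice = "grab the small reward now" if Q[0][GRAB] > Q[0][MOVE] else "wait for the larger reward"
    print(f"discount factor {gamma}: the learned policy prefers to {choice}")
```

With a low discount factor the agent takes the reward in front of it; with a discount factor near 1, the same learning rule produces an agent that forgoes the immediate reward. This is the sense in which a reward function can encode delayed gratification.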
Reinforcement learning and mind design
In a recent presentation at SRI’s Seminar Series, Haas offered additional evidence that the design of AI systems is an iterative process that requires expertise from different disciplines. In this case, advancements in machine learning inform theoretical developments in the philosophy of cognitive science and neuroscience. Specifically, advancements in reinforcement learning have contributed to the notions of rewards and values in mind design. Haas argues that once we gain new knowledge about the elements required to design minds, this knowledge “can go back to computer scientists and offer novel insights into the design of artificial agents as well.”
What we learn through conceiving of the human mind as evaluative is that reinforcement learning has the potential to underwrite much more complex decision-making capacities, and potentially to extend them to moral questions. In this scenario, departing from a dichotomous (epistemic vs. phenomenological) conception of the mind and considering its evaluative nature offers an opportunity to incorporate human phenomenological features into artificial agents.
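One way to picture what this could look like, loosely in the spirit of the “moral gridworlds” proposal cited below rather than Haas’s actual model, is a reward function that combines an ordinary task reward with an evaluative term attributing negative value to morally salient features of a state, such as harm caused to another agent. All names and numbers here are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical state features; the fields and weights are illustrative only.
@dataclass
class State:
    task_progress: float   # e.g., progress toward the agent's goal
    harm_caused: float     # e.g., damage done to another agent's resources

def task_reward(state: State) -> float:
    return state.task_progress

def moral_value(state: State, weight: float = 5.0) -> float:
    # Evaluative term: harm is attributed negative value, scaled by `weight`.
    return -weight * state.harm_caused

def combined_reward(state: State) -> float:
    # An agent maximizing this signal must trade progress on its task
    # against the moral cost of how that progress is achieved.
    return task_reward(state) + moral_value(state)

print(combined_reward(State(task_progress=1.0, harm_caused=0.0)))   # 1.0
print(combined_reward(State(task_progress=1.0, harm_caused=0.4)))   # -1.0
```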
If the evaluative nature of the mind is supported by empirical evidence, then we can revisit long-standing assumptions about whether artificial agents can have moral cognition. Haas’s seminar highlights the possibility for those involved in the design of artificial intelligence to pursue this direction. Whether the design of artificial intelligence should incorporate elements of morality is a different question.
Want to learn more?
Delve deeper into Julia Haas’s research through her 2020 paper “Moral gridworlds: A theoretical proposal for modeling artificial moral cognition,” in Minds and Machines, or her 2021 paper, “Reinforcement learning: A brief guide for philosophers of mind.”
Watch a recorded video panel featuring Haas, “The problem of cognitive ontology, implications for scientific knowledge,” from the Center for Philosophy of Science.
Learn more about SRI Director Gillian Hadfield’s recent research on the alignment problem.
About the author
Davide Gentile is a graduate fellow at the Schwartz Reisman Institute and a PhD candidate in the Department of Mechanical and Industrial Engineering at the University of Toronto. His research focuses on human interactions with artificial intelligence in sensitive and safety-critical industrial domains.