How the evaluative nature of the mind might help in designing moral AI

 

In a recent SRI Seminar, Julia Haas explored a new conception of the human mind as fundamentally evaluative in nature. According to Haas, a senior research scientist in the Ethics Research Team at DeepMind, this insight could assist in designing artificial agents that can engage with moral questions.


How can human morality be incorporated into the design of artificial agents? Scholars and practitioners interested in the design of artificial intelligence (AI) have long struggled with this issue, sometimes referred to as the alignment problem.

Julia Haas, a senior research scientist in the Ethics Research Team at DeepMind, proposes that one of the reasons for this struggle might be certain assumptions about the nature of the human mind. As Haas argues, we often assume the human mind has two fundamental functions: one epistemic, and one phenomenological. Epistemic functions are involved in reasoning and computational processes, while phenomenological functions are involved in emotional and affective processes. However, recent advancements in reinforcement learning and the philosophy of cognitive science suggest that this dichotomy fails to consider a third aspect of the mind: its fundamentally evaluative nature.

Julia Haas

Evaluating rewards and values

What does it mean to claim that the mind has an evaluative nature? This insight foregrounds the mind’s fundamental involvement in attributing rewards and values. Research in cognitive science suggests that the ability to attribute value drives many of our decisions and cognitive abilities, such as visual fixation. This evidence supports the claim that the mind is evaluative in nature, and selects what to attend to based on its evaluations. What the mind attends to can include concrete representations, like food or fire, but also abstract ones, like equality or justice. If we assume that human morality (e.g., the ability to differentiate between just and unjust) is supported by value attribution, then we might be able to transfer this capacity to artificial agents, giving them a way to learn about human morality.
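To make the idea of value-guided selection concrete, here is a minimal, purely illustrative sketch in Python. It is not drawn from Haas’s talk: the candidate representations and their value scores are hypothetical, chosen only to echo the examples above, and “attention” simply goes to whichever item carries the highest attributed value.

```python
# A toy illustration of value-guided selection. The items and value scores
# are hypothetical, chosen only to mirror the examples in the text.

candidates = {
    "food": 0.8,      # concrete representation
    "fire": 0.9,      # concrete representation
    "equality": 0.6,  # abstract representation
    "justice": 0.7,   # abstract representation
}

def attend(value_estimates):
    """Pick what to attend to by comparing attributed values."""
    return max(value_estimates, key=value_estimates.get)

print(attend(candidates))  # -> "fire": attention follows the highest attributed value
```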

Advancements in machine learning suggest that artificial agents can already be designed to operate in terms of rewards and values through reinforcement learning, an area of machine learning concerned with how agents learn to act in an environment so as to maximize cumulative reward. This implies, for example, that a well-designed reward function can allow intelligent agents to “know” when to sacrifice immediate rewards in order to maximize the total reward. In other words, machines may already be better than humans at practicing delayed gratification.
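As a rough illustration (a minimal sketch, not DeepMind’s setup; the toy environment, states, and reward numbers are assumptions made here for clarity), consider an agent that can take a small reward immediately or wait one step for a larger one. With a high enough discount factor, the value computation favours waiting:

```python
# A minimal sketch of "delayed gratification" in a value-based agent.
# The toy environment and reward numbers are hypothetical, purely illustrative.
#
# From the start state the agent can TAKE a small reward now (1.0, episode ends)
# or WAIT one step and then collect a larger reward (3.0).

# state -> action -> (next_state, reward, done)
MDP = {
    "start": {"take": ("end", 1.0, True), "wait": ("waiting", 0.0, False)},
    "waiting": {"collect": ("end", 3.0, True)},
}

def state_value(state, gamma):
    """Optimal discounted value of a state, computed by simple recursion."""
    if state == "end":
        return 0.0
    return max(
        reward + (0.0 if done else gamma * state_value(nxt, gamma))
        for nxt, reward, done in MDP[state].values()
    )

def best_first_action(gamma):
    """Compare the options available from the start state under discount gamma."""
    values = {
        action: reward + (0.0 if done else gamma * state_value(nxt, gamma))
        for action, (nxt, reward, done) in MDP["start"].items()
    }
    return max(values, key=values.get), values

for gamma in (0.3, 0.9):
    choice, values = best_first_action(gamma)
    print(f"gamma={gamma}: {values} -> choose '{choice}'")

# With heavy discounting (gamma=0.3) the immediate reward wins; with gamma=0.9
# the agent "knows" to sacrifice it for the larger total reward.
```

Nothing beyond the reward function and a discount factor is needed to produce this trade-off, which is the sense in which a reward-maximizing agent can exhibit delayed gratification.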

Reinforcement learning and mind design

In a recent presentation at SRI’s Seminar Series, Haas offered additional evidence that the design of AI systems is an iterative process requiring expertise from different disciplines. In this case, advancements in machine learning inform theoretical developments in the philosophy of cognitive science and neuroscience. Specifically, advancements in reinforcement learning have shaped the notions of reward and value in mind design. Haas argues that once we gain new knowledge about the elements required to design minds, this knowledge “can go back to computer scientists and offer novel insights into the design of artificial agents as well.”

What we learn through conceiving of the human mind as evaluative is that reinforcement learning has the potential to underwrite these much more complex decision-making capacities, and potentially to extend them to moral questions. On this view, moving beyond a dichotomous (epistemic vs. phenomenological) conception of the mind and taking its evaluative nature into account offers an opportunity to incorporate human phenomenological features into artificial agents.

If the evaluative nature of the mind is supported by empirical evidence, then we can revisit the long-standing question of whether artificial agents can have moral cognition. Haas’s seminar highlights this as a possible direction for those involved in the design of artificial intelligence. Whether the design of artificial intelligence should incorporate elements of morality is a different question.


Davide Gentile

About the author

Davide Gentile is a graduate fellow at the Schwartz Reisman Institute and a PhD candidate in the Department of Mechanical and Industrial Engineering at the University of Toronto. His research focuses on human interactions with artificial intelligence in sensitive, safety-critical industrial domains.

