Unraveling the reward hypothesis: A peek into human and machine decision-making processes

 

In a session at the Schwartz Reisman Institute’s annual academic conference, computer scientist Richard Sutton (left) and philosopher of mind Julia Haas (right) discussed the legacy of Sutton’s reward hypothesis and the interplay between rewards, decision-making, and moral cognition with SRI Director Gillian Hadfield (centre). SRI Graduate Fellow Bilal Taha writes about the session’s key takeaways.


In an enlightening session during the Schwartz Reisman Institute’s annual academic conference, Absolutely Interdisciplinary, computer scientist Richard Sutton and philosopher of mind Julia Haas guided attendees through a deep dive into the world of the reward hypothesis.

The reward hypothesis, postulated by Sutton in 2004, states that “all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward).” In the context of reinforcement learning (a type of machine learning), it means that an artificial agent chooses its actions in order to maximize the rewards it obtains from the environment. Many complex behaviours have emerged from artificial agents trained in this way. But how well does the reward hypothesis capture human behaviour? Can human goals and purposes be adequately explained as the maximization of reward? And can, or should, the reward hypothesis guide our thinking about normative decision-making for individuals and groups?

Haas, a senior research scientist in the ethics research team at DeepMind, and Sutton, professor of computer science at the University of Alberta and chief scientific advisor at the Alberta Machine Intelligence Institute (Amii), discussed these questions. They presented ideas about how the reward hypothesis may play a pivotal role in understanding and modeling moral cognition, decision-making, emotions, and normative systems, expanding its implications for both humans and machines.

 Watch a recording of the session on the reward hypothesis at Absolutely Interdisciplinary 2023.

Understanding the reward hypothesis: More than just pleasure or pain

An interesting aspect of the session’s discussion revolved around distinguishing between reward and pleasure or pain. We often conflate reward with the sensation of feeling good or bad, but while pleasure and pain often coincide with rewards, they are not the same thing. Pleasure and pain are biological signals of reward and value, and sometimes these signals diverge from the actual reward. This is where the reward hypothesis comes into play: it allows us to create more accurate models of such cases. For example, an athlete might endure physical pain during training to reap the reward of achieving their goals later on. Here, the actual reward (achieving goals) diverges from the immediate physical sensations (pain).
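As a rough illustration of the "cumulative sum of a received scalar signal" in the hypothesis, the sketch below compares two courses of action by their summed reward. The plan names and the numbers are hypothetical and chosen only for illustration; the hypothesis itself speaks of the expected value of this sum, whereas the sketch uses fixed, deterministic sequences to keep things simple.

```python
def cumulative_reward(rewards):
    """Sum a sequence of scalar rewards, one value per time step."""
    return sum(rewards)


# Hypothetical per-step reward signals (illustrative numbers only).
plans = {
    # Unpleasant training sessions, then the goal is achieved.
    "train": [-2.0, -2.0, -2.0, 10.0],
    # Pleasant in the moment, but no goal is reached at the end.
    "rest": [1.0, 1.0, 1.0, 0.0],
}

# Choosing by the first step alone tracks the immediate sensation.
greedy_plan = max(plans, key=lambda name: plans[name][0])

# Choosing by the cumulative sum tracks the quantity the hypothesis names.
best_plan = max(plans, key=lambda name: cumulative_reward(plans[name]))

print("Best first step:", greedy_plan)
print("Best cumulative reward:", best_plan)
```

With these assumed numbers, the first-step comparison favours resting, while the cumulative comparison favours training, which mirrors the athlete example.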

Reward hypothesis in action: Moral cognition and normative systems

Sutton and Haas explored the reward hypothesis in the context of moral cognition. The idea is that our sense of morality, our understanding of right and wrong, is also governed by the rewards we associate with these judgments. For example, we might take a certain action—say, picking up after ourselves rather than leaving garbage in public—not just because it is morally right, but also because we would be rewarded by positive social validation and acceptance from others around us.

Because the social rewards attached to an action can differ from one community to another, moral judgments that we might think are absolute can vary greatly between cultures and societies, indicating the complex interplay of reward systems at work. Sutton and Haas then shed light on the role of reward systems in shaping normative systems, that is, societal 'rules' or standards. These systems continually adapt and evolve based on collective values and rewards. Consider how societal views on issues such as environmental conservation or gender equality have shifted over time. These changes can be seen as a reflection of an evolving reward system.

 

Richard Sutton speaks at Absolutely Interdisciplinary 2023. Photo by Jamie Napier.

 

Rational intelligence: Beyond the pure reward function

Sutton and Haas then introduced a thought-provoking perspective on the role of so-called “rational intelligence.” Instead of merely following reward functions, rational intelligence could conceivably guide an agent’s goal-setting. This means agents could be capable of setting goals that might not immediately maximize reward but would serve a more significant, long-term purpose. In humans, this mechanism allows us to align our immediate actions with our long-term objectives.

Motivations: The fuel for our actions

According to Sutton and Haas, our desires and preferences are often tied to sensations such as pleasure and pain, which drive our actions. But these motivations are involuntary: we can't simply decide not to be motivated by something. For example, a person who likes pizza and derives pleasure from eating it cannot simply decide to dislike pizza. Rewards serve as external factors that shape our motivations, offering a compelling explanation of why we behave the way we do. In short, our motivations strongly influence our actions, and the reward hypothesis provides insight into this relationship.

Points of agreement and disagreement

Throughout their discussion, Sutton and Haas agreed on many points, such as the fundamental role of the reward hypothesis in moral cognition and decision-making. However, their views diverged on certain topics, including the precise relationship between reward and pleasure or pain and the recognition of artificial intelligence agents as individuals. These differences reflect the perspectives that different academic disciplines and areas of inquiry bring to the reward hypothesis.

Overall, Sutton and Haas's discussion of the reward hypothesis painted a comprehensive picture of the interplay between rewards, decision-making, and moral cognition. This line of thought can encourage us to reflect on our internal systems of reward and value, offering us a richer understanding of the actions, decisions, and motivations of both humans and artificial agents.

Bilal Taha

About the author

Bilal Taha is a PhD candidate in the Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto, a graduate fellow at the Schwartz Reisman Institute, an affiliate with the Vector Institute, and a member of the Temerty Centre for AI Research and Education in Medicine. His research explores the intersection of signal and image processing and machine learning for biometric and affective computing applications. Currently, he is working on medical biometric models, with the goal of developing algorithms that leverage biometric modalities to ensure security and fairness for users.

