AI provides new insights into social learning

 
Colourful, different-shaped blocks are stacked into a configuration to build a staircase.

In a session at Absolutely Interdisciplinary 2022 on “Natural and Artificial Social Learning,” SRI Associate Director Sheila McIlraith, SRI Faculty Affiliate and U of T Professor of Philosophy Jennifer Nagel, and Senior Research Scientist Natasha Jaques of Google Brain explored how social learning can benefit a wide range of agents, including humans and AI systems, and how insights from philosophy and computer science can illuminate each other.


Social learning, a remarkable phenomenon observed in animals and, most profoundly, in humans, is the process of gaining knowledge by observing, interacting with, or imitating others. While social learning is widespread across the animal kingdom, humans apply it with exceptional power, because they can flexibly and deliberately share what they know.

At the Schwartz Reisman Institute for Technology and Society’s annual academic conference, Absolutely Interdisciplinary, a session entitled “Natural and Artificial Social Learning,” moderated by SRI Associate Director Sheila McIlraith, explored how research in artificial intelligence (AI) and reinforcement learning (RL) can deepen our understanding of social learning and the complex mechanisms that underlie communication and cooperation, as well as how social learning principles can improve the design of AI agents. The session’s panelists, Jennifer Nagel (University of Toronto) and Natasha Jaques (Google Brain), discussed how these insights are currently being deployed to build better artificial agents across a wide range of applications, from simulated models to real-world contexts such as self-driving vehicles, and how AI can serve as a testbed for exploring how social learning operates in the natural world.

 

Clockwise from top left: SRI Associate Director Sheila McIlraith, Natasha Jaques, and Jennifer Nagel discuss social learning at Absolutely Interdisciplinary 2022.

 

Natural social learning and its mysteries in humans

How is social learning different from individual learning? Jennifer Nagel, a professor in the Department of Philosophy at the University of Toronto, noted that the two kinds of learning can often be applied to similar problems, and with similar results. For example, some species, such as meerkats, work as teams to alert each other to approaching predators, whereas the yellow mongoose—a very similar animal—forages in solitude, relying only on its individual senses. These two ways of detecting dangers seem to work equally well in the Kalahari Desert, where these animals live: relying on others doesn’t necessarily outperform going it alone. And yet, for some problems, it seems that the switch from individual to social learning makes an enormous difference.

Humans seem to have unlocked a class of problems where social learning makes an enormous difference. As Michael Tomasello writes in Becoming Human, “virtually all of humans’ most remarkable achievements—from steam engines to higher mathematics—are based on the unique ways in which individuals are able to coordinate with one another cooperatively, both in the moment and over cultural-historical time.”

Jennifer Nagel

The emergence of humanity’s unique social learning strategies is hard to trace. We find clear evidence of anatomical development in the fossil record, Nagel observed, but the prehistoric development of new ways of thinking is harder to map. Some clues are available in comparisons between ourselves and our closest animal relatives. Perhaps one of the most intriguing aspects of social learning is the positive correlation between the size of primate social groups and their cognitive capacity. What is the causal dynamic between these features? Does a richer social environment increase cognitive capacity, or is it the other way around? While the effects of living in a rich social environment and having a larger brain size are closely intertwined in animals, Nagel believes that AI models could help to distinguish and clarify the relationship between the two. For logistical and ethical reasons, we can’t experiment with increasing or decreasing social group size and cognitive capacity in humans or other animals, but we can easily vary these factors independently in multi-agent reinforcement learning models.

In addition to AI helping us understand social learning from an evolutionary perspective, it might also help us to better understand the distinct mechanisms that humans use to learn from others. For example, humans have the capacity to apply selective social learning: we choose who to learn from. Children have been shown to learn new vocabulary more readily from knowledgeable agents than from those who have made mistakes in naming familiar items. What drives us to engage in social learning so strategically, and how do we represent knowledge itself? Nagel suggested that we might be “too steeped in our own social learning and natural language” to decipher the mechanisms that drive our social learning capabilities from within. Multi-agent reinforcement learning, however, could give us independent insight into how these mechanisms work, and the conditions under which they can be expected to outperform solitary learning.

Social reinforcement learning

How can research in reinforcement learning and social learning benefit each other? Natasha Jaques, a senior research scientist at Google Brain, described how her research investigates social motivations through the lens of multi-agent reinforcement learning to develop AI agents with social learning capabilities.

A typical multi-agent reinforcement learning problem can be described as a system in which agents interact with the environment and with each other to accomplish certain tasks. Imagine a two-dimensional grid where agents must find apples to earn rewards. Agents can only move up, down, left, or right, and their goal is to maximize their expected future reward until the end of the episode. At each step, an agent chooses the action it expects to bring it closer to that goal. Unlike other areas of machine learning that build models to make a single decision (like detecting a cat in an image), RL agents make a sequence of decisions over the course of a whole episode as they learn to navigate the environment. This setting resembles real-world social scenarios in which humans and animals live and interact with each other and their environment over time.
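To make the setup concrete, here is a minimal sketch of such an apple-gathering gridworld in Python. It is an illustrative toy, not the environment used in Jaques’s experiments; the class and method names are invented for this example, and a trained RL policy would replace the hand-picked actions shown at the end.

```python
import random

# A minimal, hypothetical gridworld sketch: agents move in four directions
# and receive a reward of 1 for each apple they reach. This illustrates the
# setting described above, not the actual research codebase.

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

class AppleGridWorld:
    def __init__(self, width=8, height=8, n_apples=5, n_agents=2, seed=0):
        self.rng = random.Random(seed)
        self.width, self.height = width, height
        self.apples = {self._random_cell() for _ in range(n_apples)}
        self.agents = {i: self._random_cell() for i in range(n_agents)}

    def _random_cell(self):
        return (self.rng.randrange(self.width), self.rng.randrange(self.height))

    def step(self, actions):
        """Apply one action per agent; return per-agent rewards."""
        rewards = {}
        for agent_id, action in actions.items():
            dx, dy = ACTIONS[action]
            x, y = self.agents[agent_id]
            x = min(max(x + dx, 0), self.width - 1)
            y = min(max(y + dy, 0), self.height - 1)
            self.agents[agent_id] = (x, y)
            # Reward 1 if the agent lands on an apple, 0 otherwise.
            if (x, y) in self.apples:
                self.apples.remove((x, y))
                rewards[agent_id] = 1.0
            else:
                rewards[agent_id] = 0.0
        return rewards

# One step of an episode: actions are hand-picked here, but an RL agent
# would choose whichever action its learned policy prefers.
env = AppleGridWorld()
print(env.step({0: "right", 1: "up"}))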

Natasha Jaques

In this example, agents can either act independently, looking for apples while disregarding information from other agents, or they can learn from each other and coordinate. Jaques demonstrated that embedding an extra incentive into the model, one that rewards agents for exerting a causal influence on the actions of other agents, encourages social learning: agents learn to cooperate by communicating valuable information, such as the presence of apples in certain locations. However, whether agents influence each other positively through beneficial actions or hinder each other through negative ones depends on the environment. An agent could block another agent’s path and force it to change direction, for example, but doing so might only distract that agent from its goal.
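The social incentive Jaques described is a causal-influence bonus: an agent is rewarded when its chosen action measurably shifts what another agent does, compared with a counterfactual in which it had acted differently. The sketch below illustrates that idea under simplifying assumptions: it assumes direct access to a model of the other agent’s conditional policy (policy_b), whereas the actual research learns such models, and the function names and numbers here are hypothetical.

```python
import numpy as np

# Rough sketch of a causal-influence bonus: agent A's social reward is the KL
# divergence between B's action distribution given A's actual action and B's
# counterfactual distribution averaged over the actions A could have taken.

def influence_bonus(policy_b, state, actions_a, chosen_a, prior_a):
    """policy_b(state, a) -> B's action distribution, given that A took a."""
    conditional = policy_b(state, chosen_a)
    # Counterfactual marginal: average B's policy over A's possible actions,
    # weighted by how likely A was to take each of them.
    marginal = sum(prior_a[a] * policy_b(state, a) for a in actions_a)
    return float(np.sum(conditional * np.log(conditional / marginal)))

# Toy example with a hand-written model of agent B's reaction to A.
def toy_policy_b(state, a):
    return np.array([0.9, 0.1]) if a == "signal" else np.array([0.5, 0.5])

bonus = influence_bonus(
    toy_policy_b, state=None,
    actions_a=["signal", "stay"], chosen_a="signal",
    prior_a={"signal": 0.5, "stay": 0.5},
)
print(bonus)  # positive: A's choice measurably changed B's behaviour
```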

Jaques noted that this type of limited communication resembles the way bees engage in their own forms of social learning. In a more sophisticated setting, agents learn to generate truthful signals for communication, which is closer to natural language: agents can now communicate their next step to others, and are rewarded based on the influence of their message on other agents’ actions. If an agent does not tell the truth, others learn this and stop being influenced by it. As in a natural setting, agents are free to ignore a message if they decide it does not contain valuable information. Results in this multi-agent setting show that better listeners, meaning agents that are more influenced by others, tend to earn higher rewards. Conversely, to become influential, agents must learn to tell the truth, leading the entire system toward a state of coordination and communication in which agents benefit both from signalling useful information and from being influenced by the signals of others.
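One simple way to picture this “free to ignore” dynamic is a listener that tracks how reliable each sender’s past messages have been and discounts the unreliable ones. The snippet below is only an intuition pump, not the learned communication architecture from this research; the class, its update rule, and the numbers are assumptions made for illustration.

```python
from collections import defaultdict

# Simplified illustration: a listener keeps a running reliability score per
# sender and pays less attention to senders whose messages proved untruthful.

class SelectiveListener:
    def __init__(self, learning_rate=0.2):
        self.reliability = defaultdict(lambda: 0.5)  # start neutral
        self.lr = learning_rate

    def weight(self, sender):
        """How much influence this sender's messages currently receive."""
        return self.reliability[sender]

    def update(self, sender, message_was_truthful):
        """Nudge the sender's reliability toward 1 if truthful, 0 if not."""
        target = 1.0 if message_was_truthful else 0.0
        self.reliability[sender] += self.lr * (target - self.reliability[sender])

listener = SelectiveListener()
for outcome in [True, False, False]:
    listener.update("agent_b", outcome)
print(round(listener.weight("agent_b"), 2))  # drops below 0.5 after two lies
```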

Selective social learning and preference values

One important difference between a simulated multi-agent system and a real-world social setting is that not everyone is a good influencer. Describing the example of self-driving cars, Jaques noted that AI systems “should be learning that not every car on the road is a good driver, so you shouldn't be indiscriminately copying actions of every other agent.” This is a case where selective social learning is more beneficial than mere imitation.

Selective social learning can be implemented in a multi-agent system by having agents predict their next observations, learning what to expect from expert versus non-expert agents. This differs from simple imitation: agents no longer learn from everyone, but only from those who take the best actions. This more sophisticated framework can be brought even closer to real-world social learning by incorporating preference values.
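At its simplest, selective imitation means ranking potential teachers and copying only those whose behaviour demonstrably pays off. The toy function below captures just that filtering step; it is a deliberately simplified stand-in for the learned mechanisms described here, and the agent names and returns are made up.

```python
# A toy illustration (not the actual method) of selective imitation: an agent
# only copies behaviour from peers whose observed returns beat its own.

def choose_teachers(my_return, peer_returns):
    """Return the peers worth imitating, ranked by observed return."""
    better = {peer: r for peer, r in peer_returns.items() if r > my_return}
    return sorted(better, key=better.get, reverse=True)

# The agent ignores the erratic driver (low return) and imitates the expert.
print(choose_teachers(my_return=3.0,
                      peer_returns={"expert_car": 9.0, "erratic_car": 1.0}))
```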

In their PsiPhi-learning framework, Jaques and her colleagues combine social learning with preference values. Each agent in this framework can adapt its policy to a new set of preferences. Agents gauge how much payoff they would receive if they acted like other agents, evaluated according to their own preferences: if I (agent A) acted like agent B, would I collect more red apples, which I prefer over green apples? If agents successfully learn about others’ actions, they can selectively imitate those who increase their reward. A higher reward then allows agents to interact more efficiently with the environment, update their preferences, and learn more about others. This virtuous cycle, the result of selective social learning, elevates the entire system to a more optimal and coordinated state.
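The “if I acted like agent B” calculation can be pictured with successor features, the idea underlying PsiPhi-learning: summarize a policy by the feature counts (here, apples of each colour) it is expected to collect, then weight those counts by an agent’s own preferences. The sketch below is a hand-worked illustration of that evaluation, with invented numbers and variable names, not an implementation of the PsiPhi-learning algorithm itself.

```python
import numpy as np

# Illustrative successor-feature evaluation: psi_b holds agent B's expected
# future counts of each apple colour under B's policy, and w_a encodes agent
# A's own preferences (red apples worth more to A than green).

features = ["red_apple", "green_apple"]

psi_b = np.array([4.0, 1.0])   # acting like B: roughly 4 red and 1 green apple
psi_a = np.array([1.0, 3.0])   # A's current policy: roughly 1 red and 3 green

w_a = np.array([2.0, 0.5])     # A's preferences: red worth 2, green worth 0.5

value_if_i_act_like_b = psi_b @ w_a       # 8.5
value_of_my_current_policy = psi_a @ w_a  # 3.5

# Under its own preferences, A would do better by imitating B, so B is a
# worthwhile agent to learn from selectively.
print(value_if_i_act_like_b > value_of_my_current_policy)  # True
```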

Jaques’ research shows that building social learning into an artificial multi-agent system can be a powerful mechanism for the emergence of coordination and cooperation among agents. In the context of a multi-agent environment, as Nagel emphasized, it is always beneficial to have more knowledge about your environment and the other agents in it. The more incentive agents have to engage in social learning, the more cooperative behaviour emerges. Implementing diverse types of social learning and uncovering how they produce high rewards is a fascinating direction for which both Nagel’s and Jaques’ research shows promise, and one that demonstrates how epistemology and computer science can illuminate each other with insights that transcend the limits of either discipline.

Watch the full session:


Aida Ramezani

About the author

Aida Ramezani is a PhD student in computer science at the University of Toronto, supervised by Professor Yang Xu. Ramezani is interested in developing computational methods for unraveling the associations between moral values and natural language. In her research, she uses natural language processing to explore how changes in morality can be inferred from changes in language usage, and how language models can be aligned with changes in moral values.

