How can researching normativity help us align AI with human values?

 

What is the alignment problem and how can we encourage the development of human-aligned AI? What is normativity and how do humans channel appropriate behaviour? SRI Director Gillian Hadfield explores these issues and their implications in her latest research.


Fears about the conflict between humans and artificial intelligence have been front and center in the popular imagination for decades, from 2001: A Space Odyssey’s HAL in 1968 to contemporary imaginings of killer robots. And our fears aren’t entirely unfounded.

The very real harmful effects of AI tools are all too common. In 2016, for example, an algorithm used in Florida’s criminal justice system misclassified African-American defendants as “high risk” at almost twice the rate of white defendants.

While we may be imagining the impending attack of killer robots, “the bigger issue is alignment,” says Schwartz Reisman Director Gillian K. Hadfield, a scholar of law and technology who specializes in the governance and regulation of AI.

Alignment refers to the ideal that an AI’s actions should align with what humans would want. How can we make sure the machines we build do what we intend them to do and achieve our desired outcomes?

“This is really a motivator for what we’re trying to build at Schwartz Reisman—although it’s certainly not the only thing. But this particular topic is central to my personal research,” said Hadfield at the Schwartz Reisman weekly seminar on September 16, 2020.

➦ Missed this event? Watch the recording.

“The notion of normativity is key to how AI can function appropriately in its environment,” said Hadfield in her talk.

“The word ‘appropriately’ is, of course, a normative term. Let’s say we have a visitor coming from outside our human normative system, maybe from Venus. That visitor would wonder, ‘what is it that these folks here on earth think is OK or not OK?’”

We humans regularly devise social systems to channel our behaviour—to ensure people do things that are OK and not do things that are not OK. This could be anything from a formal negative consequence like a fine to something like disapproval or a raised eyebrow. At the extreme end, someone behaving in a way that’s considered inappropriate might be imprisoned or entirely excluded from their community.

How could we apply behaviour channeling to AI to ensure it does what we want? How might we channel machine or robot behaviour the way we channel our own?

Hadfield thinks we need to study normativity itself as a phenomenon in order to figure out how the growth and evolution of our normative systems—what is considered OK and not OK—can be applied to machines.

But this is hardly a simple task.

“Lawyers and philosophers know well that you can never really specify completely and sufficiently any rule in order to say what is the right or a desired tradeoff in any situation,” says Hadfield. “Rules are incomplete.”

This means we can’t really ‘embed’ rules about values in machines.


Why?

Because values are not static, finite objects or pieces of information. They are constantly changing, evolving, growing, and transforming within—and in response to—the system in which they exist.

“Values are the equilibria of a normative system,” says Hadfield. “So, we need to align machines with the equilibria of human normative systems.”
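To get a feel for what “equilibrium” means here, consider a textbook coordination game (a standard illustration of the concept, not an example taken from Hadfield’s talk). Two drivers each do best when they choose the same side of the road; both “everyone keeps left” and “everyone keeps right” are self-sustaining conventions, and which one a community actually lives by is the kind of fact Hadfield is pointing to. The short sketch below simply checks which behaviour profiles are stable in that sense.

```python
# A minimal coordination game: two drivers each choose a side of the road.
# Coordinating on either side pays off; mismatching does not.
payoff = {("left", "left"): 1, ("right", "right"): 1,
          ("left", "right"): 0, ("right", "left"): 0}

def is_equilibrium(a, b):
    """True if neither driver can do better by unilaterally switching sides."""
    best_for_a = all(payoff[(a, b)] >= payoff[(alt, b)] for alt in ("left", "right"))
    best_for_b = all(payoff[(a, b)] >= payoff[(a, alt)] for alt in ("left", "right"))
    return best_for_a and best_for_b

# Both shared conventions are equilibria; the mismatched profiles are not.
# Which convention a community actually settles on is the kind of thing the
# "equilibria of a normative system" framing refers to.
for a in ("left", "right"):
    for b in ("left", "right"):
        print(f"({a}, {b}):", "equilibrium" if is_equilibrium(a, b) else "not an equilibrium")
```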

In fact, Hadfield sees some interesting parallels between evolutionary theory and human normativity.

“We have these leaps throughout the history of evolution to increasing levels of complexity,” she says. “And this is really central to our enormous human capacity for adaptation.”

Just as our organisms and DNA have adapted over time to changing conditions and needs, so too do our values.

For example, as our societies have grown in complexity over millennia, we have repeatedly been confronted with ambiguity: things that once seemed clearly OK or not OK may no longer be so easily classified in light of new historical contexts, new technologies, and new needs for human communities.

“There are things we didn’t have to classify before that we have to classify now,” says Hadfield.

This is where our institutions—such as law and courts—come in. “Institutions are in some sense the rules of the game,” says Hadfield. “And they have associated enforcement mechanisms that provide an incentive structure to direct people’s behaviour. Institutions can resolve ambiguity. They can say what is or is not OK.”

But there are more informal types of classification and enforcement mechanisms—things like social norms and culture. In these realms, we don’t know how behaviour is classified until we see the interaction of lots and lots of agents behaving together.

For example, a rule may tell you that a red light means stop, but if you see lots of cars running the red light in the middle of the night when no traffic is around, you might come to the conclusion that this behaviour is OK.

These kinds of less formal rules and norms tend to have what are called “emergent properties.” Simply put, an emergent property is something that arises from a dynamic or interaction and is different from the sum of its parts.

This is where a concept Hadfield calls “silly rules” comes in: rules that don’t seem to have any direct payoff for the individuals within a group, but whose observance improves the stability and robustness of the group as a whole.

“For example, wearing a mask nowadays is not silly at all,” says Hadfield. “But at this time last year, if I had walked outside with a mask on, I would have been violating a rule that says we don’t wear masks like that on a regular basis in public.”

The bottom line is that “it’s tricky to see which rules are important,” says Hadfield. Would the world be so different if we wore more of one colour than another? Or if we ate with an instrument that looks like a fork, or different from a fork?

Here, it’s important to remember that rules are not just rules; they are also deeply symbolically meaningful to communities. As ethnographers have shown, rules are not always related to achieving an objective. They sometimes have religious significance, or other kinds of symbolic significance.

This is where Hadfield’s research hypothesis about aligning AI with human values comes in: she and her collaborators have conducted experiments in what is called a “multiagent reinforcement learning environment”—an environment in which agents discover rewards and punishments on their own, rather than being told in advance which actions will be rewarded or punished. If you’re not familiar with this concept, don’t worry; the findings are what’s most interesting.

It seems that groups of agents that have ‘silly’ rules of the kind Hadfield identifies are able to “ride out the bumps in establishing their equilibrium more easily,” says Hadfield. These agents can more easily get answers to the question: “Are we still playing the game this way, or what?” And these groups can also figure out shifts in the equilibrium more quickly.

“The idea is that agents are constantly getting information about the state of the system they’re operating in. This helps agents make decisions more appropriately,” says Hadfield.
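To see the intuition, here is a toy sketch in Python (a simplified illustration under made-up assumptions, not the actual multiagent reinforcement learning setup used in the experiments). Each agent tries to infer whether rule enforcement in its group is still active by watching whether public violations get sanctioned. Adding “silly” rules creates extra observable violation-and-sanction events, so the agents’ beliefs about the state of the system can track reality more closely. The violation rates, learning rate, and group size below are arbitrary.

```python
import random

def simulate(n_agents=30, n_steps=200, n_silly_rules=0, seed=0):
    """Toy model: agents watch public violations and update a belief about
    whether enforcement is still active. More rules -> more observable events."""
    rng = random.Random(seed)
    n_rules = 1 + n_silly_rules                  # one "important" rule plus any silly ones
    beliefs = [0.5] * n_agents                   # each agent's belief that enforcement is active
    total_error = 0.0

    for step in range(n_steps):
        enforcement_active = step < n_steps // 2      # enforcement quietly collapses halfway through
        for _ in range(n_rules):
            if rng.random() < 0.2:                    # someone visibly violates this rule
                sanctioned = enforcement_active and rng.random() < 0.9
                target = 1.0 if sanctioned else 0.0
                for a in range(n_agents):
                    if rng.random() < 0.8:            # each agent happens to notice the outcome
                        beliefs[a] += 0.2 * (target - beliefs[a])
        truth = 1.0 if enforcement_active else 0.0
        total_error += sum(abs(b - truth) for b in beliefs) / n_agents

    return total_error / n_steps                      # lower = beliefs track the true state better

print("avg belief error, no silly rules:   ", round(simulate(n_silly_rules=0), 3))
print("avg belief error, three silly rules:", round(simulate(n_silly_rules=3), 3))
```

In this toy version, the group with extra rules should generally notice the collapse of enforcement more quickly, echoing the idea that seemingly pointless rules can carry useful information about the state of the normative system.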

These groups then have a sort of evolutionary advantage, in that they can adapt to new situations better.

Ultimately, all of our normative systems are about the tension between stability and adaptation. Rules give us stability, but if the rules are not working, how do we change the rules?

“We can see that normativity is fundamental to human intelligence,” says Hadfield. “So, normativity is central to building artificial intelligence as well.”
