What do we want AI to optimize for?
In the summer of 2015, Silviu Pitis had just graduated from Harvard Law School and was working at a New York firm when he found himself overwhelmed by repetitive administrative tasks. With a desire to make his work more efficient, Pitis enrolled in a course on machine learning and neural networks to explore whether artificial intelligence (AI) could automate parts of his job. The exploration opened his eyes to a whole new realm of computing capabilities.
“Before I saw what neural networks could do, I always felt that human intuition was missing from computer programming,” Pitis reflects.
“The reason why AI had previously seemed impossible is because humans aren’t rule-based—we think in approximate terms. But if you combine systems of approximate inference with logic-based systems, they can do powerful things.”
That realization sparked Pitis’s shift away from law and towards a career in computer science. After earning a master’s degree from the Georgia Institute of Technology, he began a PhD at the University of Toronto under the supervision of Jimmy Ba to continue exploring deep learning. Now in his final year, Pitis has been supported by a scholarship from Canada’s Natural Sciences and Engineering Research Council (NSERC), a Vector Institute research grant, and a Schwartz Reisman Institute fellowship. His papers have received awards at several AI conference workshops, and he has collaborated on research with colleagues at Microsoft, Google, NVIDIA, and Stanford University.
Pitis’s research draws on decision theory to study how the principles of reward design for reinforcement learning (RL) agents are formulated. Decision theory, a framework used in economics and some areas of philosophy, examines how and why agents evaluate options and make choices.
Pitis’s work has also explored new approaches to causal modeling, the practice of building models of cause-and-effect relationships, to improve training data and help RL systems learn more effectively over time. His current work focuses on understanding how large language models (LLMs) make decisions by examining their implicit assumptions.
Most recently, Pitis received a prestigious OpenAI Superalignment Fast Grant, valued at US$150,000, to support his research on aligning AI systems with human values. His project will explore a pressing question: how do incomplete specifications of choice alternatives and implicit assumptions made by human supervisors affect the decisions made by an AI system?
As Pitis found in a recent paper to be presented at the 2024 Conference on Neural Information Processing Systems (NeurIPS) this December, models can perform significantly better when context is integrated into their preference evaluations. This nuanced approach could play a vital role in making AI systems more reliable, especially as they are deployed in complex, real-world applications. OpenAI’s latest o1 model provides a concrete example of how leveraging context can yield improved results.
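To see why context matters in preference evaluation, consider a toy sketch, not drawn from Pitis’s paper, in which annotators compare two candidate responses but their judgments depend on an unstated context. All labels and probabilities below are hypothetical.

```python
# Toy sketch: how an unstated context can change which of two responses
# "wins" a preference evaluation. All labels and probabilities are hypothetical.

# Probability that annotators prefer response A over response B,
# depending on who the response is written for.
pairwise_prefs = {
    "expert reader": 0.20,      # experts favour the detailed response B
    "general audience": 0.85,   # lay readers favour the concise response A
}
context_frequency = {"expert reader": 0.5, "general audience": 0.5}

# Context-free evaluation: average over contexts, which blurs the signal.
marginal = sum(context_frequency[c] * p for c, p in pairwise_prefs.items())
print(f"P(A preferred over B), ignoring context: {marginal:.2f}")  # ~0.53

# Context-aware evaluation: a clear winner emerges in each context.
for context, p in pairwise_prefs.items():
    winner = "A" if p > 0.5 else "B"
    print(f"For a {context}: response {winner} preferred (P = {max(p, 1 - p):.2f})")
```

Averaged across contexts, the comparison looks like a coin flip; conditioned on context, each audience has a clear preference, which is the kind of structure a context-aware evaluation can exploit.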
“Before I saw what neural networks could do, I always felt that human intuition was missing from computer programming. The reason why AI had previously seemed impossible is because humans aren’t rule-based—we think in approximate terms.”
Questions of aligning AI systems with human values are a core aspect of what drew Pitis to the field of computer science. While such challenges have gained significant public attention over the past year, as open letters signed by leading experts in the field such as Geoffrey Hinton (see below) received widespread discussion in the media, the topic has motivated Pitis since the start of his PhD, at a time when AI safety concerns were far less prevalent.
Recent open letters signed by leading experts on the risks posed by advanced AI systems:
“Managing extreme AI risks amid rapid progress,” Science, 2024.
“Statement on AI Risk,” Center for AI Safety, 2023.
“My primary research question has always been the same,” says Pitis. “What objective function do we want AI to optimize for? If we aggregate values from society, what weights do we use, and whose values?”
While the question might seem unanswerable, it is nonetheless essential to explore if we are to build AI systems that benefit everyone. As researchers have demonstrated—including Schwartz Reisman Institute Faculty Affiliates Richard Zemel and Toniann Pitassi, whose seminal work helped establish the field of fairness in machine learning—it is not possible to design an algorithm that satisfies every definition of fairness at once. As a result, choices must be made to determine which values AI systems will uphold.
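As a toy numerical illustration of one such incompatibility result (Chouldechova, 2017), rather than of Zemel and Pitassi’s work specifically, the sketch below shows that when two groups have different base rates, a classifier that is equally calibrated for both and misses true positives at the same rate must produce different false positive rates. The numbers are invented.

```python
# Toy illustration of one well-known incompatibility between fairness criteria
# (Chouldechova, 2017): if two groups have different base rates, a classifier
# with the same calibration (PPV) and the same miss rate (FNR) for both groups
# must differ in false positive rate. All numbers below are invented.

def implied_fpr(prevalence: float, ppv: float, fnr: float) -> float:
    """False positive rate implied by a group's base rate, PPV, and FNR."""
    return (prevalence / (1 - prevalence)) * ((1 - ppv) / ppv) * (1 - fnr)

ppv, fnr = 0.8, 0.2  # held equal across both groups
for group, base_rate in [("group 1", 0.3), ("group 2", 0.5)]:
    print(f"{group}: base rate {base_rate:.0%} -> "
          f"false positive rate {implied_fpr(base_rate, ppv, fnr):.1%}")
# The implied false positive rates differ (about 8.6% vs 20%), so equalizing
# them as well would force a sacrifice in calibration or error-rate parity.
```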
As AI systems continue to develop and proliferate across society, the challenge of determining which values to optimize for is becoming increasingly pivotal. In many cases, developers and organizations seek outcomes that improve on the historical biases embedded in existing social structures and past decision-making mechanisms.
“We’re not trying to align agents with humans—we’re trying to align agents with what humans are trying to align to,” explains Pitis. “There’s an ideal in the back of our mind of humanity, and we don’t really know what it is, but we see the human almost as a proxy to it.”
“What objective function do we want AI to optimize for? If we aggregate values from society, what weights do we use, and whose values?”
Towards this objective, Pitis draws on social choice theory, which studies how desirable properties like rationality and fairness can conflict and produce unresolvable dilemmas in individual and collective decision-making, to better understand how AI can balance diverse, and even opposing, values.
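A classic example of such a dilemma is the Condorcet paradox: individually rational rankings can aggregate into a cyclic, and therefore unresolvable, majority preference. The short sketch below, with hypothetical voters and options, illustrates the cycle.

```python
# Toy sketch of the Condorcet paradox: three voters with individually rational
# rankings produce a cyclic majority preference. Voters and options are hypothetical.
from itertools import combinations

rankings = [            # each voter's ranking, best to worst
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x: str, y: str) -> bool:
    """True if a majority of voters rank option x above option y."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in rankings)
    return votes_for_x > len(rankings) / 2

for x, y in combinations(["A", "B", "C"], 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"A majority prefers {winner} over {loser}")
# Prints: A over B, B over C, and C over A. The cycle has no stable collective
# winner, so any aggregation rule must give up some desirable property.
```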
As part of his new project, Pitis will explore how AI systems make decisions when they lack necessary information and the available choices are incompletely or ambiguously specified. In large-scale automated systems and language models, gaps in the data and in the values underlying decisions make it hard to assess whether an AI system’s choices are aligned and ethical. An additional area of interest is implicit assumptions, both those made by AI models and those made by their human supervisors. By incorporating additional forms of context into AI systems and exploring models that can better handle complex, real-world decision-making, Pitis aims to improve the alignment and safety of AI systems, particularly as they take on more complex and influential roles in society.
“As we get to stronger agents and more complex decision-making tasks, we should demand that the objectives of the model’s internal systems be consistent,” says Pitis.
The road to solving AI alignment is uncertain, but Pitis’s research marks an ambitious venture in that direction. His focus on refining AI decision-making and ensuring that systems remain consistent with human values will be crucial as AI systems become more powerful and integrated into everyday life.