Our weekly SRI Seminar Series welcomes Owain Evans, a research associate at Oxford University’s Future of Humanity Institute. Evans’ research interests are in AI safety and the future of AI, with a current focus on truthful and honest AI.
In this talk, Evans will present recent work on defining and measuring “truthfulness” in the context of large language models, including their calibration and their ability to forecast world events. These topics will be considered in relation to reducing epistemic harms from AI and to the problem of value alignment in the context of artificial general intelligence.
Talk title:
“Truthful language models and AI alignment”
Abstract:
Like it or not, language models will play an increasingly central role in how people learn about the world and communicate with others. This poses a challenge. Can we create models that are factually accurate, calibrated (e.g., avoiding overconfidence), and reliably non-manipulative? Such models would help individuals and society form more accurate beliefs and avoid misinformation. They would also have the potential to help with the problem of AGI alignment or AGI risk (Bostrom 2014, Russell 2019).
I will present recent work on defining and measuring “truthfulness” for language models, on calibration, and on using models to forecast world events. I will discuss connections to reducing epistemic harms from AI and to the problem of AGI alignment.
Recommended readings:
O. Evans, et al., “Truthful AI: Developing and governing AI that does not lie,” arXiv preprint, 2021.
S. Lin, J. Hilton, O. Evans, “TruthfulQA: Measuring How Models Mimic Human Falsehoods,” arXiv preprint, 2021.
A. Zou, et al., “Forecasting Future World Events with Neural Networks,” arXiv preprint, 2022.
About Owain Evans
Owain Evans is a research associate at the Future of Humanity Institute at Oxford University. His research interests are in AI safety and the future of AI. He received his PhD from MIT. In 2019, he was a visiting scholar in the CHAI group at UC Berkeley. He is on the board of directors at Ought, a non-profit lab that created the AI research assistant Elicit. He has worked on preference learning, reinforcement learning, forecasting, and philosophical questions relating to AI. His recent work aims to understand truthfulness and honesty for AI models.
About the SRI Seminar Series
The SRI Seminar Series brings together the Schwartz Reisman community and beyond for a robust exchange of ideas that advance scholarship at the intersection of technology and society. Each seminar features a leading or emerging scholar and includes extensive discussion.
Each week, a featured speaker will present for 45 minutes, followed by an open discussion. Registered attendees will be emailed a Zoom link before the event begins. The event will be recorded and posted online.