Our weekly SRI Seminar Series welcomes Rohan Alexander for a special in-person talk that will also be broadcast online. Alexander is an assistant professor at the University of Toronto, jointly appointed in the Faculty of Information and the Department of Statistical Sciences. He is also the assistant director of CANSSI Ontario, a senior fellow at Massey College, a faculty affiliate at the Schwartz Reisman Institute for Technology and Society, and a co-lead of the Data Sciences Institute’s Thematic Program in Reproducibility.
Alexander’s research investigates how to develop workflows that improve the trustworthiness of data science. His recently published book, Telling Stories With Data (Routledge, 2023), argues that a trustworthiness revolution is needed in data science and proposes a view of what this could look like.
Talk title:
“Improving reproducibility in quantitative social sciences: A simulation-based workflow enhanced with large language models”
Abstract:
Despite improvements such as replication packages and pre-registration, reproducibility, a foundation of scientific knowledge, remains a challenge in quantitative social sciences such as economics and political science. One fundamental issue is the interaction between two factors: a dependence on code, typically not written by software engineers, to clean, prepare, and model data; and the data themselves, which are especially complicated and may not be shareable. In this talk I will begin by discussing some of the issues that undermine reproducibility in economics and political science, drawing on case studies to highlight common problem areas such as data availability, model complexity, code errors, and methodological transparency. I will then discuss my attempt to establish an improved workflow for quantitative social sciences, which leverages simulation-based approaches from statistics and test-driven code practices from software engineering. Using realistic simulated datasets enables more robust testing and validation of scientific conclusions in quantitative social sciences. Finally, I will introduce how I am integrating large language models into this workflow to address cultural issues that might otherwise slow the adoption of better practices. The result is more credible quantitative social science research, which will better enable us to learn something new about the world.
Venue:
Rotman School of Management, University of Toronto, Room LL1030.
Entrance: 95 St. George Street, Toronto, ON M5S 3E6
The seminar will be broadcast live via Zoom (register for the link).
About Rohan Alexander
Rohan Alexander is an assistant professor at the University of Toronto, jointly appointed in the Faculty of Information and the Department of Statistical Sciences. He is also the assistant director of CANSSI Ontario, a senior fellow at Massey College, a faculty affiliate at the Schwartz Reisman Institute for Technology and Society, and a co-lead of the Data Sciences Institute’s Thematic Program in Reproducibility.
Alexander’s research investigates how to develop workflows that improve the trustworthiness of data science. He is particularly interested in the role of testing in data science. Alexander’s recently published book, Telling Stories With Data (Routledge, 2023), argues that a trustworthiness revolution is needed in data science, and proposes a view of what this could look like. His teaching helps students from a wide range of backgrounds learn how to use data to tell convincing stories.
Alexander is an associate editor of the Journal of Statistics and Data Science Education and a co-organizer of the Toronto Data Workshop and Toronto Workshop on Reproducibility. He holds a PhD in economics from the Australian National University, where his research focused on economic history.
About the SRI Seminar Series
The SRI Seminar Series brings together the Schwartz Reisman community and beyond for a robust exchange of ideas that advance scholarship at the intersection of technology and society. Seminars are presented by leading or emerging scholars and feature extensive discussion.
Each week, a featured speaker will present for 45 minutes, followed by an open discussion. Registered attendees will be emailed a Zoom link before the event begins. The event will be recorded and posted online.