Explanation and justification in partial view AI models

 

Calls for artificial intelligence to be “explainable” have been mounting for several years, leading computer scientists to provide accounts of what factors produce or influence the decisions of complex machine learning systems. But are such accounts the only kinds of explanations we need? At Absolutely Interdisciplinary 2022, Finale Doshi-Velez (Harvard University) and Boris Babic (University of Toronto) explored the differences between explanation and justification, and how insights from other legal and regulatory domains can help us build trustworthy systems.


Modern machine learning (ML) systems are often large and complex, sometimes containing billions of parameters, which can make it difficult to understand why they do what they do. This lack of transparency can raise concerns, especially when systems built on complex forms of ML contribute to important decisions in fields like medicine or justice, such as what treatment a patient ought to receive or whether someone awaiting a criminal trial ought to be granted bail.

In response, calls to make artificial intelligence (AI) systems more explainable, and in some jurisdictions a "right to explanation," are starting to appear in legislation and regulatory proposals governing AI. Organizations such as UNESCO, the OECD, the Council of Europe, and the UN General Assembly have adopted principles concerning explainable AI, while the EU's General Data Protection Regulation and proposed Artificial Intelligence Act, along with Canada's newly proposed Bill C-27, create rights to explanations for AI systems. These regulatory efforts have in turn prompted computer scientists to develop methods that provide accounts of the factors that produce and influence the decisions of an AI system. Such explanations offer insight into how an AI system responds to different inputs from a mathematical or statistical point of view.

But are explanations like these the only accounts we need from these systems? It's not clear that everyone always needs, or can benefit from, an understanding of the mathematical parameters of a complex ML model. While the AI developers building a system might benefit from this technical information, it might not help a doctor or patient using the system. Instead, what doctors and patients often want to know is that the model has been adequately tested and behaves reliably. In other words, they want to know that the system's decisions are justified.

At the Schwartz Reisman Institute for Technology and Society's conference Absolutely Interdisciplinary, panelists Finale Doshi-Velez (Harvard University) and Boris Babic (University of Toronto) discussed the implications of the call for explainable AI, the differences between explanation and justification when it comes to AI systems, and how insights from other legal and regulatory domains can help us build trustworthy and accountable AI.

 

Clockwise from top left: moderator Philippe-André Rodriguez (Global Affairs Canada, McMaster University) discusses explainability in AI models with panelists Finale Doshi-Velez (Harvard University) and Boris Babic (University of Toronto) at Absolutely Interdisciplinary 2022.

 

What is explainable AI?

To understand how to legislate for explainability, we must first look more closely at what "explainable AI" is. As Doshi-Velez explained, AI models fall into two main categories: "interpretable" (or "white box") models, in which the entire model's decision-making structure is visible, and "partial view" (or "black box") models, which use techniques such as deep learning with hidden layers that are too complex to be presented as a whole. With an interpretable model, provided we have precise definitions, we can know what processes were followed with regard to safety, accuracy, optimization, and transparency. With a partial view model, we might not be able to obtain sufficient answers to these questions.
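To make the distinction concrete, here is a minimal sketch, assuming scikit-learn and the Iris dataset (both chosen purely for illustration, not drawn from the session), contrasting a white box model whose full decision structure can be printed with a partial view model whose parameters are too numerous to present as a whole.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# White box: every rule the model applies can be printed and read directly.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(data.feature_names)))

# Partial view: the fitted weights exist, but their sheer number makes the
# decision-making structure impractical to present as a whole.
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000).fit(X, y)
n_params = sum(w.size for w in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)
print(f"MLP parameters: {n_params}")
```

For the decision tree, the printed rules are the model. For the neural network, all we can easily report is an aggregate property such as its parameter count.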

Finale Doshi-Velez

Why are partial view models used, given these challenges? Doshi-Velez observed that there are three main reasons. First, in certain domains, models with greater complexity achieve greater accuracy: while a medical diagnosis might be reached from the presence of a few discrete criteria, processing an image or audio input often relies on complex relations in the data that are not as easily summarized. Second, in complex systems where many models interact, it becomes increasingly challenging to roll things back and see the full picture. An algorithm used for pre-trial risk assessments that takes in 100 variables and outputs a recommendation is easier to disentangle than a model that relies on multiple large components working together within a social media ecosystem. Finally, for convenience's sake, it is easier for developers to train a black box model that is not inherently interpretable and provide a partial view of the system afterwards.

AI developers are getting better at creating inherently interpretable models, and in these cases it is almost always clear which processes were followed. With partial view models, however, this is much harder to establish, even when we have a specific process we are trying to verify. Partial view models have an extremely limited observable area: the model is immense, and only a small snapshot of it is taken. If the snapshot is taken at a certain place, the model might appear to be flat, yet its greater structure might be a concatenation of flat parts that in fact make up a mountain. This means it is hard to guarantee what processes were followed when relying on partial views, especially in adversarial settings.

The rationale given in support of explainable AI systems is that explainability will increase understandability, trust, and transparency, as well as reduce bias and discrimination. Explainable AI is also said to support democratic processes, align with our society's conception of procedural justice, and advance substantive justice. But it remains to be seen how explanations actually deliver these benefits.

Explainability vs. interpretability

Suppose we have a black box model that uses data to estimate a function that makes predictions based on an input. Under an interpretability paradigm, we would replace this black box model with a white box model, which would be fed the same data and estimate a different function to make the same kind of predictions; we would never use the black box model again. Under an explainability paradigm, we would instead identify a white box model that tracks the predictions of the black box model as closely as possible, and use the white box to explain the black box. The two work in tandem: predictions are made by the black box, and explanations of those predictions by the white box.
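A minimal sketch of the two paradigms, assuming scikit-learn and a synthetic dataset (the random forest, decision tree, and data are illustrative stand-ins, not models discussed in the session):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Black box: makes the actual predictions.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Interpretability paradigm: fit a white box to the data itself and discard
# the black box entirely.
interpretable = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Explainability paradigm: fit a white box to the black box's *outputs* so it
# tracks them as closely as possible; predictions still come from the black box.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box.predict(X))
print("surrogate fidelity:", (surrogate.predict(X) == black_box.predict(X)).mean())
```

The fidelity score measures how closely the white box tracks the black box, which is exactly what the explainability paradigm tries to maximize.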

While it might seem easier to just use inherently interpretable models, doing so would often limit us to much simpler techniques, and in some scenarios there may be a trade-off between accuracy and interpretability. An example of the explainability paradigm is LIME (Local Interpretable Model-agnostic Explanations), a technique that approximates a black box model with local, interpretable models that explain individual predictions.
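As a hedged sketch of what this looks like in practice, assuming the open-source lime package and the same kind of synthetic setup as above (the dataset, model, and argument values are illustrative):

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative black box: a random forest trained on synthetic data.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# LIME builds a local, interpretable model around one instance at a time.
explainer = LimeTabularExplainer(
    X,
    feature_names=[f"x{i}" for i in range(X.shape[1])],
    class_names=["0", "1"],
    mode="classification",
)
explanation = explainer.explain_instance(X[0], black_box.predict_proba, num_features=5)
print(explanation.as_list())  # (feature, local weight) pairs for this one prediction
```

Note that the explanation applies only to the single prediction being explained; the black box continues to make all of the actual predictions.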

Boris Babic

The case against explainable AI

To understand a model means to capture the "why" between a system's input and its output. In his presentation, Babic contended that explainable AI models are incapable of contributing to our understanding of black box systems, as they only provide post-hoc rationales for the system's predictions, and those rationales are not unique. To illustrate, Babic used the example of a judge denying parole to a prisoner, where the reason later given by the court clerk is that the decision was based on the fact that the prisoner was wearing an orange shirt. Models built in an explainability paradigm, he argued, work the same way: they provide after-the-fact rationales that are not much of an explanation, but simply an attempt to fit a rationale to the particular data that was obtained.

The LIME algorithm follows a similar procedure. The black box model learns a function with a certain classification boundary, and LIME fits a linear model after the fact that approximates that boundary in some neighbourhood. Babic argues this is not very helpful from a legal, moral, or policy perspective, because it is a post-hoc rationale. It is also unstable: new input data may change the classification boundary, and the LIME algorithm will then produce a different approximation. Explainability models are themselves partial view, yet they are marketed as improving our understanding of black box models. The implication of this claim is that they will open black box models up to users and help them understand what is happening from the inside. That is not what they deliver, which undermines the value of such explanations as a correct guide to subsequent behaviour.
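To see why the approximation is post hoc and can be unstable, here is a from-scratch sketch of the local-surrogate idea (not the lime package itself; the data, perturbation scale, and kernel width are assumptions chosen for illustration). Fitting the same kind of local linear model with two different perturbation seeds yields two different sets of coefficients for the very same prediction:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
instance = X[0]

def local_surrogate(seed, n_samples=500, kernel_width=1.0):
    rng = np.random.default_rng(seed)
    # Sample perturbations in a neighbourhood of the instance being explained.
    perturbed = instance + rng.normal(scale=0.5, size=(n_samples, X.shape[1]))
    # Post-hoc targets: the black box's predicted probabilities, not the data.
    targets = black_box.predict_proba(perturbed)[:, 1]
    # Weight perturbations by how close they are to the instance.
    distances = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Fit a weighted linear model that approximates the boundary locally.
    return Ridge(alpha=1.0).fit(perturbed, targets, sample_weight=weights).coef_

print(local_surrogate(seed=0))
print(local_surrogate(seed=1))  # different perturbations, different "explanation"
```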

What should we do instead?

When it comes to garnering trust, Doshi-Velez and Babic both agreed that too much value is currently being placed on mechanistic explanations. When we are prescribed medications by our doctors, we take them without needing to understand the biomedical pathways by which the drugs work; Babic cited this as an instance where users completely trust a black box system. The reason users of prescription drugs trust them is that other regulatory processes are in place that provide sufficient procedural assurance, creating a context of use in which we are not fixated on obtaining an explanation of the system. This may be a better analogue for garnering trust in the algorithmic context.

Watch the full session:


Rawan Abulibdeh

About the author

Rawan Abulibdeh is a PhD student at the University of Toronto in the Department of Electrical and Computer Engineering, and a 2021–22 Schwartz Reisman Graduate Fellow. Her research focuses on machine learning, deep learning, and algorithmic bias and fairness. She previously completed her Master's in computer science at the University of Guelph, where she worked on AI and security, and interned on the data analytics team at Al Jazeera Media Network.

