[Video] AI Explained: What are Integrated Gradients?

Published

April 24, 2020

Last Edited

April 18, 2024

Ankur Taly

Fiddler AI

We started a video series of quick, short explainers on Explainability and Explainable AI. The second video in the series covers Integrated Gradients: the method itself and its applications. Learn more in the ~10 min video below. The first video in the series covers Shapley values; watch that here.

What are Integrated Gradients?

In this video, we discuss another attribution method, Integrated Gradients, which can be used to explain predictions made by deep neural networks (or, for that matter, any differentiable model). It can be implemented in a few lines of code and is much faster to compute than Shapley values. The method is a popular tool for explaining image classification models in healthcare.


In the video we discuss a few things:

A simpler attribution method based on examining gradients at the input. This is one of the first attribution methods proposed for differentiable models, and dates back to at least 2010.
How this applies to deep neural networks, and why we often find bizarre-looking attributions. For instance, for an image model, pixels that seem irrelevant often get highlighted, and we look at why that happens.
How more relevant attributions can be obtained by examining gradients across multiple counterfactual inputs that interpolate between the input at hand and a certain baseline. This motivates the design of Integrated Gradients.
What baselines are: the baseline is meant to be an information-less input, commonly an all-zero input. For an image, it could be the all-black image.
The justification behind the Integrated Gradients method.
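
To make the points above concrete, here is a minimal sketch of Integrated Gradients in plain Python. It is an illustration under stated assumptions, not the implementation used in the video: the toy logistic model, its weights `W` and bias `B`, the helper names, and the choice of a midpoint Riemann sum with 50 steps are all made up for this example. The idea is to average the model's gradients at points interpolated between the baseline and the input, then scale each averaged gradient by how far that feature moved from the baseline.

```python
import math

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    attribution_i = (x_i - baseline_i) * average over alpha in (0, 1) of
                    dF/dx_i evaluated at baseline + alpha * (x - baseline)
    """
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        # Counterfactual input on the straight line from baseline to x.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

# Hypothetical toy model: logistic regression with fixed weights.
W = [2.0, -1.0, 0.0]
B = 0.5

def model(x):
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x):
    # Analytic gradient of the sigmoid output with respect to each input.
    p = model(x)
    return [p * (1.0 - p) * w for w in W]

x = [1.0, 0.5, 3.0]
baseline = [0.0, 0.0, 0.0]  # all-zero, "information-less" baseline

attributions = integrated_gradients(model_grad, x, baseline)
# The attributions should sum (approximately) to the change in model output.
output_change = model(x) - model(baseline)
```

In this sketch the third feature has zero weight and therefore receives zero attribution, and the attributions add up to `model(x) - model(baseline)` up to the error of the numerical integration. With a real deep network, `grad_fn` would typically come from the framework's automatic differentiation rather than a hand-written gradient.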

We conclude with an important caveat: unlike Shapley values, which place no restriction on the model function and only require black-box access, vanilla Integrated Gradients requires differentiability and access to gradients. Consequently, the method cannot be applied directly to non-differentiable tree ensemble models (e.g., random forests, boosted trees). Interested readers can check out recent work on generalizing Integrated Gradients to non-differentiable models.
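
The caveat can be seen on a toy example. The sketch below is only illustrative: the one-split `stump` function stands in for a tree model, and the finite-difference gradient is a hypothetical substitute for the gradients a tree model does not provide. Because the function is piecewise constant, its gradient is zero wherever it is defined, so the averaged gradients carry no information about the jump in the output.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

def stump(x):
    # Hypothetical decision stump: a single split on the first feature.
    return 1.0 if x[0] > 0.5 else 0.0

def finite_difference_grad(f, x, eps=1e-6):
    # Central differences; stands in for the gradient a tree does not have.
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

x = [1.0, 0.0]
baseline = [0.0, 0.0]

attributions = integrated_gradients(
    lambda point: finite_difference_grad(stump, point), x, baseline
)
output_change = stump(x) - stump(baseline)
```

Here the prediction changes by 1.0 between the baseline and the input, yet every sampled gradient is zero, so both attributions come out as 0.0 and no longer account for the change in output. If a sample point happened to fall within `eps` of the split, the finite difference would instead produce a very large, equally uninformative value.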