Project Summary
Explainable AI (XAI) techniques are revolutionizing scientific discovery and clinical application by helping
biomedical researchers interpret complex, black-box machine learning (ML) models. Given input features (e.g.,
dermatological image pixels) and an ML model-generated prediction (e.g., a diagnosis), the most widely used
form of XAI computes feature attributions, which quantify the importance of each feature to the prediction,
even in complex models such as deep neural networks. Biomedical research has successfully
applied deep learning using medical images (e.g., chest X-ray images) as input features; feature attributions
identify the parts of an image that are important to the model, such as genuine, clinically relevant findings (e.g.,
clear lung fields) or confounding artifacts (e.g., medical devices and laterality markers).
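As a minimal sketch of how such pixel-level attributions are obtained in practice, the open-source shap library
provides gradient-based SHAP estimates for deep models; the classifier and tensors below are placeholders for
illustration, not the models or data studied in this proposal:

    import shap
    import torch
    import torchvision

    # Placeholder classifier and data; a real analysis would use a trained
    # chest X-ray model and actual images.
    model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
    background = torch.randn(20, 3, 224, 224)  # background batch for the explainer
    images = torch.randn(3, 3, 224, 224)       # images to explain

    # GradientExplainer approximates SHAP values via expected gradients;
    # the result gives per-pixel attributions toward each output class.
    explainer = shap.GradientExplainer(model, background)
    attributions = explainer.shap_values(images)

In practice, these per-pixel attributions are overlaid on the input image as a heatmap to highlight the regions that
drive the model's prediction.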
However, current XAI techniques provide attributions only for pre-specified input features (here, individual
pixels); this key limitation precludes a clear understanding of the reasoning process of ML models and limits
actionable responses by medical providers. First, even the most interpretable model types, such as linear
models, can defy understanding if they use uninterpretable features. Second, computing theoretically principled
feature attributions involves exponential computational complexity (see the Shapley value formula after this
paragraph); this challenge is exacerbated when using
modern deep models, such as transformers. Finally, the adoption of XAI techniques by multiple stakeholders,
such as regulators, developers, scientists, and physicians, requires real-world demonstration of their utility.
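For concreteness, the Shapley value, the theoretically principled attribution referenced above, averages feature
i's marginal contribution over every subset S of the remaining features, where N denotes the full feature set and
v(S) the model's prediction given only the features in S:

    \phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]

Because the sum ranges over 2^{|N|-1} subsets, exact computation is intractable for image-scale inputs,
motivating the approximation and evaluation techniques proposed in Aim 2.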
In this proposal, we introduce the following techniques and principles to fundamentally advance XAI.
Aim 1. Generate medically informed explanations. To bridge the gap between pixel-level features and medically
meaningful concepts, we propose a self-supervised approach that retrieves semantically meaningful concepts
from medical images, systematically edits concepts of interest in those images, and examines the resulting
changes in model output. We will also develop a new attribution method to make self-supervised learning interpretable.
Aim 2. Develop XAI principles and techniques to compute and evaluate feature attributions. We propose
theoretically grounded techniques to: rigorously compute SHAP values for transformers; handle multimodality and
feature correlations; and evaluate feature attribution methods to help investigators discern the most effective
techniques for their applications.
Aim 3. Enable real-world application of XAI techniques to benefit multiple stakeholders. Using the improved
model explanations, we will develop an actionable XAI framework to audit third-party AI devices, improve
clinical AI devices, and derive scientific insights.
Successful completion of this project will yield new, theoretically grounded and principled XAI techniques that
provide medically informed explanations, compute accurate feature attributions, and detect and resolve model pitfalls.