Project Summary/Abstract
This project develops artificial intelligence (AI)/machine learning (ML) methods that provide real-time
monitoring and updating of clinical decision support (CDS) systems in order to reduce downstream health
disparities. As AI/ML systems mature and their use in clinical decision support continues to grow, so does their
potential to impact health outcomes on a population-wide scale. A particular concern is the reliance of AI/ML
development on retrospective patient data, which often 1) under-characterizes historically marginalized groups
and 2) contains systematic treatment and outcomes disparities that a naïve model would be rewarded for
reproducing. Against this backdrop, a growing sub-field of AI/ML known as “fair ML” or “ethical AI” has
emerged, dedicated to embedding ethical notions of fairness into mathematical constraints used to mitigate
undesirable behavior by machines. Although promising, current approaches struggle to confront the following
challenges to ML-based CDS: 1) degradation in performance with shifts in clinical practice and care dynamics;
2) satisfying a notion of fairness without sacrificing overall model performance; and 3) an inability to adapt to, or
integrate with, existing clinical workflows. This project focuses on developing AI-based post-processing
methods that fill these gaps. We focus methods development on resource- or time-constrained settings where
AI/ML tools are used to prioritize the order and the type of care patients receive. The primary application of
interest is an operationalized ML system that provides early predictions of inpatient admissions in the
emergency department (ED) to improve patient flow and expedite ED visits. In addition, we evaluate the
generalization of our findings to multiple clinical endpoints, across sites, and with respect to time. The central
hypothesis of this work is that fair ML methods can improve the equity of clinical decision support models by
monitoring and updating their performance over time. In Aims 1 and 2, we develop methods for training fair ML
models on retrospective data and assess their ability to influence ED attending physician decisions using a
silent prospective study design. In Aim 3, we extend our preliminary work on post-processing methods for
satisfying multicalibration, which is a measure of equity central to risk prediction. We adapt existing methods to
the real-time setting so that they can learn and adapt to streams of health data. We measure the ability of
post-processed ML models to perform fairly and accurately across models, tasks, clinical sites, and time. Aim 4
focuses on implementing a real-time AI auditing and calibration system for predicting patient disposition in the
emergency department at Boston Children's Hospital, where a successful
CDS system is currently in place. This body of work aligns with the National Library of Medicine’s vision of
“sustainable computational infrastructure” and has the goal of reducing health disparities that may arise from
CDS via its influence on care decisions.