Project Summary / Abstract
Sepsis, septic shock, acute kidney injury (AKI), acute respiratory distress syndrome (ARDS), and
respiratory failure are among the leading causes of hospital mortality and morbidity, and of increased duration
and cost of hospitalization. Successful prevention and management of these conditions rely on the ability
of clinicians to estimate the risk, and ideally, to anticipate and prevent these events. Acute care settings
and in particular intensive care units (ICUs) provide an environment where an immense amount of data
is acquired, and it is expected that with the advent of wearables and biometric patches even more data
will be available in such settings. At present, however, very little of this data is used effectively to
prognosticate, and the existing predictive analytics risk scores suffer from lack of generalizability across
institutions and performance degradation within the same institution over time.
The PIs on this proposal recently demonstrated that a Deep Learning-based algorithm can reliably
predict new sepsis cases in emergency departments, general hospital wards, and ICUs as much
as 4-6 hours in advance, with an area under the receiver operating characteristic curve (AUROC) of 0.85-0.90. Furthermore, through a 2-year
pilot study funded by the Biomedical Advanced Research and Development Authority (BARDA), we recently
joined forces in a multicenter academic consortium to retrospectively validate this algorithm at each site.
Our collaboration has resulted in a multi-center longitudinal EHR dataset of critically ill patients and has
generated several important questions and findings related to design of portable and generalizable
predictive analytics algorithms that are robust to problems arising from gaps, errors, and biases in
electronic health records (EHRs) due to workflow-related factors (e.g., staffing levels) and heterogeneity
of patient populations and measurement devices.
We propose to significantly expand our prior work through 1) design of new deep learning architectures that are
robust to data missingness and biases introduced through variability in the process of care, 2)
development of new learning methodologies to improve generalizability of the proposed models under
data/population drifts (i.e., distributional changes), 3) enhanced metadata design to assist in quantifying
'conditions for use' of such algorithms via algorithmic controls, and 4) HL7 and FHIR-based prospective
implementation and testing of these methodologies to provide real-world clinical evidence for the
effectiveness of the proposed approaches. Ultimately, these novel methodologies and tools will enhance
our ability to use EHR and other types of continuously measured longitudinal data to predict adverse
events, assess patients' response to therapy, and optimize and personalize care at the bedside.