Enhanced Metadata Design, Architecture, and Learning (MeDAL) for Development of Generalizable Deep Learning-based Predictive Analytics from Electronic Health Records

Project Summary / Abstract

Sepsis, septic shock, acute kidney injury (AKI), acute respiratory distress syndrome (ARDS), and respiratory failure are among the top causes of hospital mortality and morbidity, and of increased duration and cost of hospitalization. Successful prevention and management of these conditions rely on the ability of clinicians to estimate risk and, ideally, to anticipate and prevent these events. Acute care settings, and intensive care units (ICUs) in particular, provide an environment where an immense amount of data is acquired, and it is expected that with the advent of wearables and biometric patches even more data will be available in such settings. At present, however, very little of these data is used effectively for prognostication, and existing predictive risk scores suffer from a lack of generalizability across institutions and from performance degradation within the same institution over time. The PIs on this proposal recently demonstrated that a deep learning-based algorithm can reliably predict new sepsis cases in emergency departments, general hospital wards, and ICUs as much as 4-6 hours in advance, with an area under the receiver operating characteristic (ROC) curve of 0.85-0.90. Furthermore, through a 2-year pilot study funded by the Biomedical Advanced Research and Development Authority (BARDA), we recently joined forces in a multicenter academic consortium to retrospectively validate this algorithm at each site. Our collaboration has resulted in a multicenter longitudinal electronic health record (EHR) dataset of critically ill patients and has generated several important questions and findings related to the design of portable and generalizable predictive analytics algorithms that are robust to problems arising from gaps, errors, and biases in EHRs due to workflow-related factors (e.g.
staffing levels), and heterogeneity of patient populations and measurement devices. We propose to significantly expand our prior work through 1) design of new deep learning architectures that are robust to data missingness and biases introduced by variability in the process of care, 2) development of new learning methodologies to improve generalizability of the proposed models under data/population drifts (i.e., distributional changes), 3) enhanced metadata design to assist in quantifying "conditions for use" of such algorithms via algorithmic controls, and 4) HL7- and FHIR-based prospective implementation and testing of these methodologies to provide real-world clinical evidence for the effectiveness of the proposed approaches. Ultimately, these novel methodologies and tools will enhance our ability to use EHR and other types of continuously measured longitudinal data to predict adverse events, assess patients' response to therapy, and optimize and personalize care at the bedside.