In the US, ~24 million people live with chronic obstructive pulmonary disease (COPD), half of them undiagnosed, and ~150,000 die of COPD
annually. COPD causes over 700,000 US hospitalizations and costs nearly $50 billion per year. The
human and financial burdens of COPD could likely be reduced if disease progression and other
adverse events could be anticipated, enabling caregivers to focus finite resources on at-risk patients.
We propose to create a decision-support tool that integrates biomedical informatics with advanced
machine learning (ML) and deep learning (DL) algorithms to predict acute and chronic healthcare
encounters (hospital admissions, readmissions, and ED encounters) and major disease progression
events (home oxygen therapy) for outpatients with COPD. Such a tool would confer immediate clinical
benefits and accelerate research on COPD disease progression and treatment. Predictive modeling is
widely used to identify high-risk patients for care management in COPD and other disorders, with a
strong emphasis on readmission risk. However, extant techniques are not sufficiently accurate and do
not identify the specific nature of likely future medical events, estimate time-to-event, or specifically
forecast medical encounters and disease progression events for individuals with COPD. Recent
research in disease progression modeling supports the application of DL and other ML methods to
electronic health records (EHRs) to predict aspects of health history. EHRs contain both readily
accessible structured data (e.g., lab results in well-defined fields) and unstructured texts such as
physicians’ notes. Unstructured texts contain a great deal of clinical information, but this information is
laborious to access, impeding its routine use in research and the clinic. This has motivated attempts to
use natural language processing (NLP) methods to automate annotation. We will apply NLP to identify
symptoms, treatments, procedures, diagnoses, social risk factors, and functional status from clinical
notes, expanding the data available from EHRs far beyond the usual coded variables. In addition, and
distinctively, we will carry out a stepped-wedge clinical implementation of the proposed predictive tool
and evaluate its performance, a first for ML and DL prediction of COPD health events. Therefore, we
propose four Specific Aims: AIM 1: Transform EHR data streams to provision patient-level feature sets
for ML and DL consumption. AIM 2: Develop a set of ML and DL models to predict the time-to-event
for home oxygen therapy initiation and healthcare encounters among patients with COPD. AIM 3:
Develop and implement a prospective performance surveillance and calibration maintenance system to
maintain the final Aim 2 model for each outcome. AIM 4: Evaluate adoption and usability of the
DeepCOPD toolkit in near-real-time clinical use in two healthcare systems. The application is
responsive to the NHLBI IDEA2Health (NOT-HL-19-712).