COVID-19 disease course analysis using multi-site large-scale EHR data - Project Summary/Abstract
Since the first case was reported in December 2019, coronavirus disease 2019 (COVID-19) has caused a
pandemic in 188 countries/regions and has precipitated an unprecedented health, economic and social crisis. In
order to cope with the volatile dynamics and severity of the pandemic, it is imperative that we characterize the
various clinical courses of COVID-19 infection, and determine whether and how demographic, clinical and other
variables influence them. Knowledge of the disease's transmission, symptomatology, clinical course, treatment
and outcomes is rapidly evolving based on many sources. An important source for advancing this knowledge
is data from electronic health records (EHR) and health information exchanges (HIE) because they can pro-
vide a real-time, unvarnished view of the disease. Using large-scale, well-integrated and rich EHR data enables
comprehensive profiling and quantification of the COVID-19 disease course that can directly inform clinical prac-
tice. The long-term goal of our research is to develop Artificial Intelligence (AI) tools to facilitate access to and
analysis of clinical data. The goal of this application is to develop effective algorithms and tools to mine clinical
data to categorize disease courses of COVID-19, and determine the effect of clinical and other variables asso-
ciated with them. We will develop our algorithms using data from a large and comprehensive health information
exchange, the Indiana Network for Patient Care (INPC), which has about 40,000 COVID-19 patients and fairly
complete EHR data about them. We will evaluate the algorithms against other data sets, including EHR data
from the OSU Wexner Medical Center and the National COVID Cohort Collaborative (N3C). The specific aims
of this project are to (1) develop COVID-19 disease course groupings, (2) relate comorbidities and other clinical
variables to the COVID-19 disease course, and (3) validate the developed algorithms on N3C data. This proposal
is significant because the methods developed in this project have the potential to substantially increase our
capability for computational analysis of large and rich patient data during the pandemic and beyond; the knowl-
edge derived from our comprehensive profiling of COVID-19 courses over large, inclusive patient populations
supported by rich EHR data can positively impact clinical practice; and the tools developed in this project will be
released to the public as a free COVID-19 research resource. It is innovative because our approach integrates
novel methods, such as patient clustering using clinical variables and disease progression trajectories and patient
trajectory comparison, with established univariate and predictive analysis; our primary approach will leverage
the country's oldest and one of its largest HIEs to derive detailed and comprehensive knowledge about a
large patient population; and the strong preliminary data generated by this project can help improve COVID-19
patient phenotyping, disease characterization and diagnosis.