PROJECT ABSTRACT
The primary goal of this project is to leverage large, harmonized data resources comprising a broad range
of patients with heart failure (HF) by using machine learning (ML) to develop and test complex models to predict
clinical outcomes and identify HF phenotypes that may be clinically important based on pathophysiology,
prognosis, and treatment response. We will accomplish this through secondary analysis of 25 clinical trials, 6
large epidemiologic studies, and electronic health record (EHR) data totaling ~130,000 patients with HF. Of these,
>40,270 are derived from 21 BioLINCC datasets, 43,536 from industry-sponsored studies, and 45,763 from the
EHR. By using studies that vary in population, design, timeframe, and data source, we expect
that our phenotypes will be a) more reflective of the spectrum of patients encountered in real-world clinical
practice and b) able to be identified more consistently with routinely collected clinical data. Improved
characterization of outcomes according to HF phenotype may in turn facilitate personalization of HF
management both in terms of therapies and treatment goals. We hypothesize that predictive and phenotyping
models generated using these resources will outperform existing models across a range of data sources and
clinical populations. The primary overlapping Aims of this proposal are:
1. Use data from 74,308 patients in 25 completed clinical trials to characterize survival and
treatment response according to simple characteristics, predictive models, and complex
phenotypes. We will apply both supervised and unsupervised ML methods to this dataset in one of the
largest individual patient data meta-analyses of HF clinical trial data to date. We will then compare the
predictive value of these models to established models derived using conventional regression and
survival analysis.
2. Validate models from Aim 1, explore novel phenotypes, and describe associated clinical
characteristics prior to HF diagnosis in 9,734 patients with incident HF from observational cohorts.
Using data from 6 large studies such as the Framingham Heart Study, we will validate established models
and models from Aim 1. We will also identify major phenotypes not well represented in clinical trials and
attempt to identify clinical risk factors that precede development of specific HF phenotypes.
3. Validate phenotype characteristics, associations, and outcomes in 45,763 patients with HF using
retrospective electronic health record (EHR) data from the University of Colorado's clinical data
warehouse. We will test all predictive and patient phenotype models derived in Aims 1 and 2 using these
harmonized real-world data and again identify phenotypes not well represented in the other datasets.
Because of known health disparities in clinical practice, we will describe care patterns according to patient
phenotype that may impact outcomes.