PROJECT SUMMARY
The application of machine learning (ML) to randomized clinical trials (RCTs) represents a novel avenue for
developing tools to enhance precision cardiovascular care. ML-based predictive approaches can learn response
profiles based on the clinical characteristics of patients included in RCTs, thereby allowing personalized
inference. However, despite promising algorithms developed by applying these novel methods to
high-quality experimental data from RCTs, there is no clear pathway for their real-world evaluation and
implementation. To bridge this gap, we aim to develop an implementation-aligned strategy for care
personalization models. We will achieve this through three study aims. In Aim 1, we will empirically evaluate
various ML approaches for RCTs using adequately powered simulated heterogeneous treatment effects,
specifically incorporating covariate distributions observed in real-world patients from two distinct and diverse
health systems. Using participant-level data from five diverse NIH-funded RCT datasets, we will evaluate models
based on their performance in detecting simulated, graded, positive-control heterogeneous treatment effects in
these RCTs as well as in “digital twins” of these RCTs, computationally designed to replicate the populations
with these conditions seen in practice in electronic health records (EHRs). Such an approach is needed to evaluate
model generalizability to the different populations expected in EHRs. In Aim 2, we will enhance the interoperability between
RCTs and EHRs, which is required both for translating RCT-derived models to EHRs and for selecting candidate
predictors based on their EHR availability. We will accomplish this by mapping covariates from RCTs to a
common data model, using a novel sentence transformer to map the descriptions of these covariates to those in
the common data model. We will then characterize the real-world distributions of these RCT covariates at 13
hospitals across two health system EHRs mapped to the same common data model. In Aim 3, we will address the
informative missingness of covariates in real-world data, another key challenge limiting the pragmatic
evaluation of algorithms developed from RCTs. For this, we will prospectively evaluate novel approaches that
adapt models for variable missingness, both random and informative, during the model development process. In
this study, we will assess whether “missingness-adapted algorithms” accurately reproduce the personalized effect
estimates for patients, compared with a complete-covariate algorithm whose covariates are ascertained
prospectively through direct patient contact. Collectively, the proposal will develop an end-to-end strategy for
evaluating models developed from RCTs to improve their selection for real-world, pragmatic evaluation and
implementation in EHRs. The methods will be rigorously tested in multiple RCTs. Moreover, through open-source
data sharing, our datasets and results will be available as benchmarks for the rigorous development of future
methods for detecting personalized effects from RCTs. The proposal will serve as an
essential framework for evaluating and translating precision care tools developed from RCTs.
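For intuition only, the positive-control evaluation in Aim 1 can be sketched in a few lines of Python. This toy example is not the proposal's actual pipeline: the variable names, effect sizes, and the simple two-arm regression ("T-learner") are all illustrative assumptions. It injects a graded, covariate-dependent treatment effect into a simulated RCT and checks whether the model recovers that effect gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positive-control simulation: a randomized treatment T,
# one covariate x, and an injected graded treatment effect tau(x).
n = 5000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)                 # 1:1 randomization
tau = 1.0 + 0.5 * x                            # graded, covariate-dependent effect
y = 2.0 + 0.3 * x + t * tau + rng.normal(scale=1.0, size=n)

# T-learner sketch: fit a separate outcome model per arm, then take the
# difference of predictions as the estimated individualized effect (CATE).
def fit_linear(xa, ya):
    A = np.column_stack([np.ones_like(xa), xa])
    coef, *_ = np.linalg.lstsq(A, ya, rcond=None)
    return coef

b1 = fit_linear(x[t == 1], y[t == 1])
b0 = fit_linear(x[t == 0], y[t == 0])
cate_hat = (b1[0] - b0[0]) + (b1[1] - b0[1]) * x

# Evaluate: does the estimated effect track the injected positive control?
corr = np.corrcoef(cate_hat, tau)[0, 1]
print(round(corr, 3))
```

In the proposed aims, unlike this sketch, the injected effects would be calibrated for adequate power and the covariate distributions drawn from real-world EHR populations rather than a standard normal.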
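The covariate-mapping step in Aim 2 can likewise be illustrated schematically. The proposal uses a sentence transformer to embed covariate descriptions; the sketch below substitutes a trivial bag-of-words cosine similarity for the learned embedding, and the common data model (CDM) concept names are invented examples, purely to show the matching logic of assigning each RCT covariate to its nearest CDM concept:

```python
import math
from collections import Counter

# Stand-in for a sentence-transformer embedding: a bag-of-words vector.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

# Hypothetical CDM vocabulary (concept description -> domain).
cdm_concepts = {
    "systolic blood pressure": "measurement",
    "serum creatinine level": "measurement",
    "current tobacco smoker": "observation",
}

# Map an RCT covariate description to its most similar CDM concept.
def map_covariate(description):
    scores = {c: cosine(embed(description), embed(c)) for c in cdm_concepts}
    return max(scores, key=scores.get)

print(map_covariate("baseline systolic blood pressure (mmHg)"))
```

With a real sentence transformer, the embeddings would capture semantic rather than lexical similarity, so descriptions with no shared words (e.g., abbreviations) could still match the correct concept.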