The confluence of new data-driven machine learning (ML) approaches, increased computational power, and
access to the wealth of electronic health records (EHRs) and other emerging types of data (e.g., omics, imaging,
mHealth) is accelerating the development of biomedical predictive models. Such models range from traditional
statistical approaches (e.g., regression) to more advanced deep learning techniques (e.g., convolutional
neural networks, CNNs), and they span different tasks (e.g., biomarker/pathway discovery, diagnosis, prognosis).
Two issues have become evident: 1) because there are no comprehensive standards to support the dissemination of
these models, they are difficult to interpret and implement, which undermines scientific reproducibility; and
2) as new models are put forth, methods are necessary to assess differences in performance and to provide insight
into external validity (i.e., transportability). Tools are needed that move beyond sharing data and model
“executables” to capture the (meta)data necessary to fully reproduce a model and its evaluation.
The objective of this R01 is to develop an informatics standard that captures the information required for the
scientific reproducibility of statistical and ML-based biomedical predictive models; building on this foundation, we
will develop new computational approaches to compare model performance. We begin by extending the current
Predictive Model Markup Language (PMML) standard to fully characterize biomedical datasets and harmonize
variable definitions; to elucidate the algorithms involved in model creation (e.g., data preprocessing, parameter
estimation); and to document the validation methodology. Importantly, models in this PMML format will become
findable, accessible, interoperable, and reusable (i.e., following the FAIR principles). We then propose novel
methods to compare and contrast predictive models and to assess their transportability across datasets. While
metrics exist for comparing models (e.g., the c-statistic and calibration measures), the case-level information
required to calculate them is often unavailable. We therefore introduce an approach that simulates cases from a
model's reported dataset statistics, enabling these calculations. Different levels of transportability are then
assigned to the metrics, indicating the extent to which a selected model is applicable to a given population or
cohort (i.e., helping answer the question, can I use this published model with my own data?). We tie these efforts together in our proposed
framework, the PREdictive Model Index & Exchange REpository (PREMIERE). We will build an online portal
and model-sharing repository around PREMIERE and will foster a community of users to guide its development
through workshops, model-thons, and other activities. To demonstrate this work, we will seed PREMIERE with
predictive models from a targeted domain (risk assessment in imaging-based lung cancer screening). To evaluate
these developments, we will engage a range of stakeholders: model developers and users to inform the
completeness of our standard, and biostatisticians and clinical experts to guide the assessment of model
transportability.
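The idea of simulating cases from reported dataset statistics so that case-level metrics can be computed may be easier to follow with a minimal Python sketch. It assumes, purely for illustration, that a publication reports a mean and standard deviation for each predictor, that predictors are independent and normally distributed, that the published model is a logistic regression with known coefficients, and that an observed event rate is reported. `ReportedStats`, `simulate_cases`, `evaluate_on_reported_stats`, the predictor names, and all numbers are hypothetical and do not describe PREMIERE's actual method. The c-statistic is computed as pairwise concordance between events and non-events, and calibration is summarized with a simple calibration-in-the-large difference (observed event rate minus mean predicted risk).

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ReportedStats:
    """Hypothetical summary statistics a publication might report for one predictor."""
    mean: float
    sd: float


def simulate_cases(stats, n, seed=0):
    """Draw n synthetic cases, assuming independent, normally distributed predictors.

    Independence and normality are simplifying assumptions; reported correlations or
    category frequencies would require a richer generator.
    """
    rng = random.Random(seed)
    return [{name: rng.gauss(s.mean, s.sd) for name, s in stats.items()} for _ in range(n)]


def predict_risk(case, intercept, coefs):
    """Apply a published logistic regression model (coefficients assumed to be reported)."""
    lp = intercept + sum(coefs[name] * case[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-lp))


def c_statistic(risks, outcomes):
    """Concordance: share of event/non-event pairs ranked correctly; ties count 1/2."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def evaluate_on_reported_stats(stats, intercept, coefs, observed_event_rate, n=2000, seed=0):
    """Simulate a cohort from summary statistics and compute illustrative metrics."""
    cases = simulate_cases(stats, n, seed)
    risks = [predict_risk(c, intercept, coefs) for c in cases]
    # Outcomes are drawn from the model's own risks, so the c-statistic reflects the
    # discrimination expected under this case mix if the model were correct.
    rng = random.Random(seed + 1)
    outcomes = [1 if rng.random() < r else 0 for r in risks]
    mean_risk = sum(risks) / len(risks)
    return {
        "c_statistic": c_statistic(risks, outcomes),
        "mean_predicted_risk": mean_risk,
        # Calibration-in-the-large, difference form: observed minus mean predicted risk.
        "calibration_in_the_large": observed_event_rate - mean_risk,
    }


if __name__ == "__main__":
    # Hypothetical reported statistics and coefficients, for illustration only.
    reported = {"age": ReportedStats(65.0, 6.0), "score": ReportedStats(0.0, 1.0)}
    result = evaluate_on_reported_stats(
        reported, intercept=-6.0, coefs={"age": 0.05, "score": 1.0}, observed_event_rate=0.08
    )
    print(result)
```

Because outcomes are drawn from the model's own predicted risks, the resulting c-statistic estimates the discrimination expected under the simulated case mix rather than observed performance; comparing it across cohorts simulated from different reported statistics is one plausible way such a simulation could feed into transportability judgments.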