PROJECT SUMMARY
Machine learning-based pathogenicity predictors, trained on variants known to be implicated or not implicated in
disease, predict whether a variant is likely to be pathogenic or benign. They have become a key component of
genetic variant discovery and clinical genetic testing. However, their use in individual-level genetic and genomic
disease risk prediction has been limited because their outputs are incompatible with commonly reported measures
of risk and with the terminology used in clinical decision-making. Variant pathogenicity predictors are a key
interpretation tool for rare variants, for which cohorts are seldom large enough to achieve the statistical power
needed for direct association testing. As genetic and genomic risk prediction models are increasingly deployed in
the clinic, it is essential that rare variants be readily incorporated into these models and that the risk they
confer be correctly accounted for. To address these gaps, the proposed study will develop methods for the
systematic calibration of variant pathogenicity predictors and their integration into genetic and genomic disease
risk prediction, and will test the hypothesis that these methods lead to more accurate and clinically
interpretable predictions of disease risk.
This work will be carried out through three specific aims: (1) we will adapt and calibrate existing predictors for
gene-specific prediction of pathogenic variants, (2) we will develop variant pathogenicity predictor-based
exomic disease risk scores, and (3) we will integrate pathogenicity predictors into genome-wide polygenic risk
score (PRS) development. The principal investigator (PI) will bring deep expertise in variant pathogenicity
predictor development and model calibration to this project and will build a team with complementary expertise
in statistical genetics and polygenic risk score development to carry out the work. Additionally, the PI has
formulated a plan for his scientific and professional development and will assemble an external advisory
committee that contributes further complementary expertise (e.g., in population genetics and clinical genetics).
This project will leverage multiple large genome-phenotype data sets available publicly (UK Biobank, dbGaP)
and through the world-class infrastructure at the Icahn School of Medicine at Mount Sinai (BioMe).
This work is expected to have a positive impact on multiple fronts. First, it will address the current lack of
systematic frameworks for integrating and calibrating variant pathogenicity predictors that generalize to risk
prediction across different types of variants, genes, and diseases. Second, open-source software to calibrate
the output of variant pathogenicity predictors in specific contexts (e.g., particular genes or diseases) will be
developed and shared with the broader community. Finally, computationally derived estimates of the prevalence of
pathogenic variants, along with the resulting risk models, will be made available through Mount Sinai and NIH
resources.