Systematic integration of variant interpretation tools into genetic and genomic risk prediction
PROJECT SUMMARY
Machine learning-based pathogenicity predictors, trained on known disease-implicated and non-disease-implicated variants, predict whether a variant is pathogenic or benign. They have become a key component of genetic variant discovery and clinical genetic testing. However, their use in individual-level genetic and genomic disease risk prediction has been limited by their incompatibility with commonly reported measures of risk and with the terminology of clinical decision-making. Variant pathogenicity predictors generally serve as a key interpretation tool for rare variants when cohorts are too small to achieve adequate statistical power. As the clinical deployment of genetic and genomic risk prediction models becomes more widespread, it is essential that rare variants be readily incorporated into these models and that the risk conferred by such variants be correctly accounted for. To address these gaps, the proposed study will develop methods for the systematic calibration of variant pathogenicity predictors and their integration into genetic and genomic disease risk prediction, and will test the hypothesis that these methods lead to more accurate and clinically interpretable predictions of disease risk. This work will be carried out through three specific aims: (1) we will adapt and calibrate existing predictors for gene-specific prediction of pathogenic variants, (2) we will develop variant pathogenicity predictor-based exomic disease risk scores, and (3) we will integrate pathogenicity predictors into genome-wide polygenic risk score (PRS) development. The principal investigator (PI) will bring deep expertise in variant pathogenicity predictor development and model calibration to this project, and will build a team with complementary expertise in statistical genetics and PRS development to carry out the work.
Additionally, the PI has formulated a plan for his scientific and professional development, and will assemble an external advisory committee with further complementary expertise (e.g., in population genetics and clinical genetics). This project will leverage multiple large genome-phenotype data sets that are available publicly (UK Biobank, dbGaP) and through the world-class infrastructure at the Icahn School of Medicine at Mount Sinai (BioMe). This work is expected to have a positive impact on multiple fronts. First, it will fill a gap: there are currently no systematic integration and calibration frameworks for variant pathogenicity predictors that can be generalized for risk prediction across different types of variants, genes, and/or diseases. Second, open-source software to calibrate the output of variant pathogenicity predictors in specific contexts (e.g., genes and diseases) will be developed and shared with the broader community. Finally, computationally derived estimates of the prevalence of pathogenic variants, as well as risk models, will be made available through Mount Sinai and NIH resources.