Extending the utility and performance of variant effect predictors with protein language models

Project Summary

Variant effect prediction (VEP), the process of determining the impact of amino acid alterations in a protein sequence, remains a fundamental challenge in both clinical and research settings. Despite the extensive application of existing VEP methods, their overall impact is limited: most variants are still labeled “variants of uncertain significance”. This research project aims to overcome these limitations by harnessing protein language models (PLMs), which have already shown widespread success in other fields, and by integrating the complementary sources of information used by current methods, in order to improve the understanding and prediction of how genetic variants affect protein function and complex traits. The specific aims are: 1) to enhance the core functionality of VEP models by providing robust estimates of score uncertainty and by experimentally validating whole-haplotype effect scores, including predictions of epistatic interactions; 2) to improve VEP model performance by integrating PLMs with external information, such as 3D structural and homology data, and by fine-tuning them on functional assays and clinical databases; and 3) to improve the discovery and clinical interpretation of functional protein-altering variants by making optimal use of computational annotations and by analyzing whole-haplotype data in the context of gene-trait associations and clinical settings. This project builds on our strong preliminary data on PLM-based variant effect prediction, which has demonstrated best-in-class performance across multiple metrics. By leveraging PLMs and a variety of external data, this project aspires to advance the field of variant effect prediction, enabling a deeper understanding of genetic alterations and improving the diagnostic and prognostic value of medical exome sequencing.