PROJECT SUMMARY/ABSTRACT
Monogenic variants were once thought to be fully penetrant (all carriers develop disease) and to have high
expressivity (carriers have a severe phenotype). However, wider access to sequencing has
shown that many monogenic variant carriers are asymptomatic or have milder-than-expected phenotypes. The
factors that affect incomplete penetrance and variable expressivity are unknown. Our preliminary data
show that both common genetic background variants and the specific monogenic variant carried affect
penetrance and expressivity within the 200,000-exome release of the UK Biobank. Our results show that carrier
polygenic risk scores are predictive of carrier phenotype, and that common genetic variants may interact
with monogenic genes to affect phenotype. Our preliminary results also show that protein language model
scores can classify variants of uncertain significance into loss-of-function (LOF), gain-of-function
(GOF), and benign categories. We propose to study how common genetic background and the specific
monogenic variant carried affect penetrance and expressivity in a more diverse patient population by
collaborating with biobanks across the nation, including the Colorado Center for Personalized Medicine at the
University of Colorado under the guidance of Dr. Chris Gignoux and the BioMe biobank at Mt. Sinai under the
guidance of Dr. Eimear Kenny. In Aim 1, we will investigate how common genetic background affects
penetrance and expressivity. We will apply polygenic risk scores to predict carrier phenotypes and thereby
assess how common genetic variants affect phenotype beyond the causal monogenic variant itself
(Aim 1.1). We will also run RHE-mc to test whether gene-by-gene interactions between common variants and the
monogenic gene affect phenotype (Aim 1.2). Because genetics research has focused primarily on
studying patients of European ancestry, we aim to make our results accessible to patients of all genetic
ancestries by comparing the results of Aims 1.1 and 1.2 across all patients in these diverse biobanks (Aim 1.3).
In Aim 2, we will study how different monogenic variants differ in penetrance and expressivity by applying
ESM1b protein language model scores. We will first use these ESM1b scores to classify missense variants
of uncertain significance in carriers within these biobanks as LOF, GOF, or benign (Aim 2.1). We will
then prioritize which monogenic missense variants have the greatest impact on phenotype for carriers of
multiple missense variants in these biobanks (Aim 2.2). Our findings will not only improve understanding
of the factors that influence penetrance and expressivity but also have the potential to be applied
translationally to identify which monogenic variant carriers will need more aggressive treatment for their phenotype.