PROJECT SUMMARY:
Congenital heart disease is the most common congenital anomaly and affects approximately 1% of infants.
Hypoplastic left heart syndrome (HLHS), a severe form of congenital heart disease in which the left ventricle is
underdeveloped, has a 10-year mortality of 40%. Only 6% of HLHS patients have a genetic cause identified on
exome sequencing, limiting the ability of patients to receive a diagnosis and potentially benefit from targeted
treatments. There are two theoretical mechanisms for HLHS: a cardiomyocyte origin, where a defect in
cardiac muscle cells causes underdevelopment of the ventricle, or an endothelial origin, where valve
abnormalities attenuate flow through the left ventricle. Two known HLHS genes, RBFOX2 and NOTCH1, are
primarily expressed in cardiomyocytes and cardiac endothelial cells, respectively, and provide an opportunity to
study these mechanisms. Discovery of additional pathogenic HLHS variants could increase the proportion of
diagnosed patients and improve our molecular understanding of cardiac development. Currently, most
pathogenic variants identified by exome sequencing are loss-of-function variants that reduce gene expression. To test my
hypothesis that missense and noncoding variants also contribute to HLHS by altering gene expression or
activity, I propose to use machine learning on HLHS patient genome sequencing, three-dimensional protein
structure, and enhancer assay data to identify new genetic contributors to HLHS. By completing these aims, I
will advance my training in functional assays and machine learning to be best prepared for a career as an
independent physician scientist.
My scientific goal is to identify new variants and loci that contribute to HLHS. First, in Aim 1 I will use machine
learning to predict the pathogenicity of missense variants in RBFOX2 from HLHS patients. Accuracy of these
predictions will be determined by genome editing of induced pluripotent stem cells to introduce the RBFOX2
missense variants, followed by assessment of RBFOX2 expression and function during cardiomyocyte
differentiation. In Aim 2, NOTCH1 missense variants will be similarly assessed for pathogenicity during cardiac
endothelial cell differentiation. Finally, in Aim 3 I will use massively parallel reporter assays to identify active
cis-regulatory regions near RBFOX2- and NOTCH1-pathway genes, and then determine if rare variants in
HLHS patients within these regions cause gene dysregulation. I will use linear models and machine learning to
determine which cardiac genomic annotations best predict enhancer activity, and use those annotations to
identify additional candidate HLHS loci. Together, this proposal will employ machine learning on biological data
in a way that uses my background in developmental biology and develops new skills in computational and
functional genomics. These results will contribute towards the long-term objective of understanding the
molecular basis of heart development and human disease to improve diagnosis, better define risks, and inspire
novel treatments for patients.