PROJECT SUMMARY
Congenital Heart Disease (CHD) is the most common birth defect, yet the genetics of this disease are poorly
understood. The genomic mechanisms of this disease include distinct rare copy number variants (CNVs) and protein-
coding single nucleotide variants (SNVs). CHDs without other congenital anomalies, or isolated CHD, comprise 75%
of all CHDs. Genome sequencing (GS) studies of isolated CHD have focused primarily on protein-coding regions,
identifying disease-causing variants in only ~10-20% of subjects. This substantial knowledge gap suggests that other
etiologies, such as variation in the non-coding genome, may play a role. The non-coding genome is vast, constituting
98% of the genome, and encompasses multiple feature types, including non-coding RNAs. There is growing
evidence for the role of long non-coding RNAs (lncRNAs) in disease, including developmental disorders of the heart.
As such, the long-term goal of this study is to elucidate the role of lncRNAs in cardiac malformations. The
overarching objective of the proposed investigation is to develop computational methods to predict the function of
lncRNAs involved in heart development and the pathogenic impact of variants that affect these molecules and
lead to heart maldevelopment. We will use GS data from the Gabriella Miller Kids First (GMKF) cohort to associate
variation in lncRNAs with CHD. We will then use single-cell RNA-sequencing (scRNA-Seq) data to identify lncRNAs
expressed in relevant cell types during crucial stages of human cardiogenesis. Our central hypothesis is that variants
in lncRNAs are a probable cause of unsolved CHD cases and that, by using scRNA-Seq data, we can prioritize
candidates for future functional validation. We propose the following specific aims to address this challenge. In Aim
1, we will develop a machine learning (ML) tool to annotate lncRNA variants in our CHD cohort. There is a lack of
tools to interpret the biological implications of CNVs and SNVs affecting lncRNAs. In our preliminary work, an ML
approach effectively annotated clinically validated CNVs associated with isolated CHD. We will extend our methods to
consider CNVs and SNVs affecting lncRNAs as well as those affecting protein-coding genes. In Aim 2, we will apply
network analysis to scRNA-Seq data to elucidate the role of lncRNAs in heart development. We will associate causal
lncRNA-protein relationships with general heart development using inference from gene regulatory networks (GRNs).
GRNs will be built from single-cell transcriptomics data to contribute to the discovery of lncRNAs involved in heart
development. This work is innovative as we will be the first to construct an ML tool for cardiac-specific lncRNA
variant annotation and clarify the role that lncRNAs may play in the development of CHD. Completing this project
will advance the NHLBI’s mission of creating computational techniques for understanding the mechanisms
underlying the regulation of normal heart formation and the NICHD’s objective of comprehending the genetic basis of
heart defects. In addition, the research is significant because it may lead to the discovery of novel genetic etiologies of
CHD and the identification of novel therapeutic targets.