Integrating 4DN data to identify functional rare variants in UDN and GTEx
ABSTRACT
Rare variants are abundant in the human population and contribute to a variety of genetic diseases. However, distinguishing impactful rare genetic variants from the multitude of inconsequential ones remains a challenge. Transcriptomics has emerged as a complementary assay for identifying the effects of rare non-coding genetic variants, and approaches based on gene expression outliers have helped prioritize rare variants involved in rare diseases. We have previously developed a machine learning model, Watershed, to prioritize candidate causal rare variants by integrating outlier gene expression levels with relevant genomic annotations such as conservation and regulatory element annotations. This approach expands on genome-only variant effect prediction tools by considering an individual’s transcriptome alongside their genome in the variant prioritization strategy. Previous approaches have focused on identifying rare variants in regions proximal to the expression outlier gene, typically within 10 kb, yet gene expression can also be regulated in part by 3D genomic conformation. Given increasing evidence that rare variants impact DNA architecture and that 3D nuclear organization can influence the pathogenesis of rare disease, we hypothesize that integrating genome topology will enable the identification of functionally active rare variants that are more distal to the expression outlier gene. Our study aims to leverage Common Fund datasets to extend Watershed, using 3D nucleome information from the 4D Nucleome (4DN) Program to prioritize causal rare variants in transcriptome data from the Undiagnosed Diseases Network (UDN) and the Genotype-Tissue Expression (GTEx) Project. Our proposed project will integrate cell-type-specific and expression-matched annotations from dilution Hi-C experiments in 4DN with RNA-Seq data from GTEx and UDN.
We will assess rare variant enrichment in highly connected loci near gene expression outliers and develop Watershed-3D, which will use 4DN annotations to prioritize impactful rare variants in healthy individuals and in those with rare diseases. We will evaluate its performance using N2 pairs, an approach we have previously applied to both GTEx and UDN, in which two or more individuals share the same rare variant and the outlier score predicted for one individual is evaluated against the observed outlier status of the other individual(s) sharing the variant. Our outcome will be whether 3D annotations improve rare variant detection compared with previous models that do not include genome topology annotations, and whether they enable prioritization of impactful, long-range rare regulatory variants in UDN samples. We will provide the genomics community with annotations for rare variants in GTEx and UDN and an expanded Watershed-3D model in a reproducible, cloud-ready workflow. Combined, this proposal represents a significant opportunity to integrate our emerging understanding of nuclear topology and organization with genetic information to more systematically prioritize functional rare variant effects and their roles in health and disease.
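
The N2-pair evaluation described above can be illustrated with a minimal sketch. It is not the Watershed implementation: the record layout, the function names `n2_pairs` and `auroc`, and the use of AUROC as the summary metric are all assumptions made for illustration. The idea shown is only the pairing step, where the score predicted for one carrier of a shared rare variant is checked against the outlier status observed in another carrier.

```python
from collections import defaultdict
from itertools import permutations


def n2_pairs(records):
    """Build held-out (predicted_score, observed_outlier) pairs.

    `records` is a hypothetical layout: an iterable of
    (variant_id, individual_id, predicted_score, observed_outlier).
    For every variant carried by two or more individuals, the score
    predicted for one carrier is paired with the outlier status
    observed in each other carrier.
    """
    by_variant = defaultdict(list)
    for variant_id, individual_id, score, outlier in records:
        by_variant[variant_id].append((individual_id, score, outlier))

    pairs = []
    for carriers in by_variant.values():
        if len(carriers) < 2:
            continue  # variants seen in a single individual cannot be evaluated
        # Ordered pairs: A's prediction vs. B's observation, and the reverse.
        for (_, score_a, _), (_, _, outlier_b) in permutations(carriers, 2):
            pairs.append((score_a, outlier_b))
    return pairs


def auroc(pairs):
    """Area under the ROC curve by pairwise comparison (ties count 0.5).

    AUROC is an illustrative choice of metric, not necessarily the one
    used by the authors.
    """
    pos = [score for score, outlier in pairs if outlier]
    neg = [score for score, outlier in pairs if not outlier]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one non-outlier pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under these assumptions, comparing a model with 3D annotations against one without would amount to calling `auroc(n2_pairs(...))` on each model's predicted scores for the same set of shared variants.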