Decoding disease-critical genomic architecture using multimodal single-cell omics data - Project Summary/Abstract
Understanding the functional architecture of human diseases and traits at cellular resolution is critical for informing follow-up functional characterization experiments and for nominating genes and pathways as candidate drug targets. Large-scale omics data encompassing multiple modalities (RNA-seq, ATAC-seq, ChIP-seq), a broad range of tissues and cell types, and diverse biological contexts, such as disease stages, developmental trajectories, and gene perturbations, offer significant new resources for understanding the genetic architecture of complex diseases. In this proposal, we plan to develop statistical and machine learning approaches that bridge the gaps between human genetics and single-cell omics data to decode the regulatory activity underlying disease variants, link variants to genes accurately in the relevant cell types of action, and identify disease-critical co-operative programs of genes and genomic elements activated in specific biological contexts. The proposed aims are targeted at uncovering new insights into complex disease etiology by advancing our understanding of gene regulation and exploring the synergies and contrasts in epigenomic activity and downstream cellular processes or biological pathways. A key goal of this application is to produce a set of computational tools and workflows that can identify and rank disease-critical variants, genes, and pathways, along with a detailed understanding of their putative cell type and biological context of action. These outputs can inform downstream disease-focused intervention strategies such as drug perturbation and single-guide or combinatorial CRISPR screening experiments.
In the first aim of this proposal, we will leverage single-cell RNA+ATAC multiome data to learn improved strategies for linking enhancers to genes in a given cell type, explore the co-operative effects of multiple enhancers on gene regulation, and identify sets of enhancers and linked genes that together constitute distinct disease-critical functional units. In the second aim, we will use quantitative trait locus (xQTL) data spanning a broad range of molecular phenotypes to map the cis- and trans-regulatory architecture of GWAS variants, and better pinpoint causal variants for these phenotypes through integration with base-pair-resolution variant function assays and models, such as sequence-based deep learning models. In the third aim, we will leverage multimodal omics data observed across multiple biological contexts to identify programs of genes and elements activated under specific contexts and assess their impact on complex disease GWAS signals. We will also demonstrate how disease-related benchmarking of gene programs activated upon enhancer and gene perturbations can inform a cost-efficient design for a downstream perturbation experiment with multi-omics readouts. All variant-level functional annotations, variant-gene links, and gene programs, together with quantitative and qualitative assessments of their impact on human diseases, and all relevant computational software and pipelines, will be shared publicly with the scientific community.