Decoding the Noncoding Regulatory Genome with Super-resolution via Single-cell Multiomics Integration
PROJECT SUMMARY
In eukaryotes, transcriptional regulation is essential to maintaining cell identity, responding to intra- and extra-cellular signals, and coordinating gene activities, whereas its dysregulation can cause a broad range of disorders. Previous methods mainly relied on genomic signals averaged over thousands of cells to investigate gene regulation and thus failed to reveal the regulatory heterogeneity across diverse cell states. Recent advances in multimodal single-cell technologies provide new opportunities to decipher the cell-type-specific regulatory code at the finest resolution possible. However, computational modeling of these data is still in its infancy because of their high dimensionality, missingness, vulnerability to confounding factors, and complex feature interactions. In this project, we aim to develop a suite of computational models to construct gene-centric, personal regulomes via single-cell multiome integration and to link multi-scale dysregulations to disease. Distinct from previous efforts that report a set of one-dimensional (1D) functional cis-regulatory elements (CREs) derived from a single genome and apply it to all samples, we aim to construct personal, compact, gene-centric, and cell-type-specific transcriptional regulomes from sc-multiome data. Specifically, we will first develop a scalable multimodal deep generative model to integrate single-cell data with single-, multi-, and hybrid modalities. Unlike existing methods, we will include an invariant representation learning scheme to derive latent cell representations uncorrelated with confounding factors (e.g., age, gender, read depth, and batch effects) for bias-free transcriptome and epigenome reconstruction (Aim 1).
Next, we will go beyond 1D genome annotation by deciphering the multi-scale gene regulatory code (Aim 2), including i) functional CREs at base-pair resolution; ii) CRE target genes for functional interpretation; and iii) transcription factor regulatory networks. Lastly, we will develop interpretable deep learning models to link multi-scale dysregulations to disease with mechanistic explanations (Aim 3). This proposal builds on a close, long-term collaboration between Dr. Jing Zhang, an expert in computational biology and machine learning at the University of California, Irvine, and Dr. Feng Yue, an expert in regulatory genomics and 3D genome organization at Northwestern University. Upon completion, our proposed methods will substantially deepen our understanding of transcriptional regulation at single-cell resolution and quantitatively relate multi-scale risk factors to genetic disorders. In addition, our aims will yield open-source software that provides the scientific community with essential tools for single-cell multi-omics data processing and integration.