PROJECT SUMMARY/ABSTRACT
Millions of cis-regulatory elements (CREs) have been identified in mammalian genomes, and they harbor a
large portion of the GWAS variants associated with complex human diseases and traits. Identifying the regulatory
target genes of CREs and GWAS variants remains challenging, as the majority of genes are not regulated solely
by CREs in close one-dimensional (1D) vicinity. Instead, CREs can form DNA loops and regulate the
expression of genes hundreds of kilobases (Kb) away. Thus, a deep understanding of chromatin spatial
organization can shed light on gene regulation and disease mechanisms. During the last decade, chromatin
conformation capture (3C)-derived technologies (e.g., in situ Hi-C, capture Hi-C, ChIA-PET, PLAC-seq and
HiChIP) have been widely used to provide a genome-wide view of chromatin spatial organization. However,
these technologies are usually applied to bulk tissue or purified cell lines, and cannot reveal the cell-type-specific
chromatin interactome within complex tissues. Fortunately, harnessing the power of single cell technologies,
single cell Hi-C (scHi-C) and scHi-C-derived multi-modal assays, including single cell Methyl-HiC and
single-nucleus methyl-3C, have advanced rapidly to study the chromatin interactome at single cell resolution,
providing powerful tools to study chromatin spatial organization in complex tissues and disease-relevant cell
types. While great strides have been made in scHi-C experimental technologies, computational methods for
analyzing scHi-C data are largely lagging behind. The methodological gaps fall mainly into three areas: (1)
Current methods cannot efficiently enhance resolution from extremely sparse scHi-C data. (2) Few methods
exist for removing systematic biases in scHi-C data within each cell and adjusting for batch effects across
different cells. (3) No method is available to detect Kb-resolution cell-type-specific chromatin interactions from
scHi-C data. To fill these gaps, I propose four major research directions: (1) develop deep learning-based
methods to impute sparse chromatin contacts in each cell, (2) develop non-parametric regression models to
remove systematic biases within each cell and to adjust for batch effects across different cells, (3) develop a
hybrid approach based on both global and local background models to identify cell-type-specific chromatin
interactions, and predict putative target genes of GWAS variants associated with complex human diseases and
traits, and (4) develop stand-alone, user-friendly software packages to analyze single cell chromatin
interactomic data and disseminate results. Completion of the proposed study will provide robust and
user-friendly computational methods that allow us to analyze 3D genome organization at single cell resolution and
interpret its regulatory role in gene expression and complex human diseases.