Project Summary
Over the past decade, the population Hi-C technique has significantly improved our ability to discover
genome-wide DNA proximities. However, because population Hi-C is based on a pool of cells, it cannot
reveal the 3D genome structure of any individual cell or explain cell-to-cell variability in terms of 3D genome
structure and gene regulation. It is also difficult to achieve high resolution, such as 1 Kbp, with population
Hi-C; consequently, for finding and analyzing the spatial interactions of the promoter or enhancer regions typically
associated with biologically important regulatory elements, the resolution of population Hi-C data is too low to be
useful. Moreover, while the CTCF-cohesin complex is known to play a key role in the formation of 3D genome
structures, it remains an open question whether long non-coding RNAs (lncRNAs) are also involved in this process:
lncRNAs have been found to recruit proteins needed for chromatin remodeling, and our preliminary research
has found that the lncRNA LINC00346 directly interacts with CTCF. Finally, while members of the bioinformatics
community, including the PI, have developed many algorithms to reconstruct 3D genome structures based on
population Hi-C data, important questions still must be answered regarding how 3D genome structures are
involved in gene regulation and whether there are relationships between 3D genome structures and genetic
and epigenetic features. The PI proposes to conduct leading research to overcome these challenges and
address these questions. During the next five years, the PI will develop algorithms to reconstruct 3D
whole-genome structures for single cells and analyze cell-to-cell variability in terms of 3D genome structure and
gene regulation. The PI will develop a deep learning algorithm to enhance the resolution of population Hi-C
data to that of Capture Hi-C data (1 Kbp) so that the large amount of Hi-C data
accumulated over the past decade can be put to good use. An online database will be built to allow the community to access both
population and single-cell 3D genome structures in an integrated way. The PI will work with a cancer biologist
to discover any lncRNAs that function as a scaffold to fine-tune the CTCF-cohesin protein complex, and with
two neuroscientists to develop a more complete understanding of gene regulation that takes into account 3D
genome structure and other genetic and epigenetic features. Given the PI's track record and productivity, pursuing three
computational goals and two collaborative goals is not only feasible but also computationally and biologically
rewarding. In five years, once the proposed studies are completed, the PI should have established a
unique and independent position in the field of 3D genomics, maintaining leading positions in inferring single-cell 3D
genome structures, enhancing Hi-C data resolution, and building 3D genome databases, while establishing
similar positions in reconstructing high-resolution 3D genome structures, finding lncRNAs' roles in the
formation of genome structures, and understanding how 3D genome structures are involved in gene regulation.