ABSTRACT
Chromosome conformation capture techniques, particularly Hi-C, have advanced the study of the spatial
proximity, interactions, conformation, and architecture of the genome in cells, leading to the development of
several three-dimensional (3D) chromosome structure modeling methods. Many observations become more
apparent in 3D because some relationships (for example, evolutionary constraints or cell-to-cell variability of
mammalian chromosome structures) cannot be inferred from genomic sequences alone. Although members of
the bioinformatics community, including the PI, have developed many algorithms for reconstructing 3D genome
structures based on population Hi-C data, we lack computationally efficient methods to model these structures
precisely at high resolution (<=5 kilobases (kb)). One difficulty is the rapidly increasing number of fragments at this
resolution. The PI's work over the last five years provides the premise for the current proposal and uniquely positions
the PI's interdisciplinary research program to carry out the proposed studies. The PI proposes to conduct leading
research to overcome this challenge and address important questions that remain about how (and why) 3D
genome structures across cells are organized and about the relationship between 3D structure and genetic and
epigenetic mechanisms for gene expression. During the next five years, the PI’s objective is to develop
computational and machine learning-based models to further highlight the hierarchical organization of, and the
refined structures within, the genome. The PI proposes to explore innovative models for 3D
chromosome and genome reconstruction, built on a novel, non-instance-based
graph convolutional neural network that generalizes across resolutions, chromosomes, restriction enzymes, and
cell populations. Given the PI’s background, track record, and productivity in the genomic research field, the
computational objectives defined here are not only feasible but also computationally and biologically rewarding
to the bioinformatics community at large. Computationally, the proposed methodology will provide a robust, one-size-fits-all
model that can be trained adequately at low computational cost on less complex data and then applied across
multiple higher resolutions for 3D structural modeling. Biologically, the proposed reconstruction algorithms will
aid disease diagnosis, prevention, or treatment by shedding light on the relationship between long-range
interactions and gene expression in human cells and on how disruptions in physical interactions between genes and
their enhancers or silencers could aberrantly alter gene expression. Thus, this research will demonstrate the
potential impact of knowledge of genome architecture on the understanding of biological processes and
human disease. Once the proposed objectives are completed, the PI will be well established
as an independent investigator and will have developed robust, high-performing, and efficient
computational algorithms that provide a new vertical advancement in the chromatin genomics research field.