PROJECT SUMMARY/ABSTRACT
Recent advances in next-generation sequencing (NGS)-based molecular methods have illuminated the
hierarchical organization of the genome and have shown that changes in the epigenome can promote or prevent
the access of transcription factors (TFs) to specific DNA sequences, move genes between nuclear
compartments, and build or remove the insulation between neighboring genomic regions. As changes in the
epigenome and chromatin organization can derail precise transcriptional regulatory programs to change cell
differentiation status or induce a pathological state, research in Dr. Li’s laboratory seeks to improve our ability to
define and understand the impact of such changes across multiple layers of transcriptional regulation in the cell.
The laboratory has effectively addressed the regulatory roles of DNA methylation in its previous and ongoing
work and now extends its focus to hydroxymethylation. 5-hydroxymethylcytosine (5hmC) is a key epigenetic
modification linked to transcriptional activation; however, 5hmC data and their genomic properties have thus far
been evaluated with limited integration of different genomic data types. Moreover, there is no integrative
computational framework designed to interpret the functional role of 5hmC in the context of 5-methylcytosine
(5mC), enhancer activities, chromatin interactions, gene expression data, and DNA sequence information. This
proposal will fill the growing need for user-friendly, interpretable, and extensible tools for mining 5hmC data,
laying a foundation for basic mechanistic studies of the epigenome and facilitating the discovery of potential
therapeutic targets in disease. Building on the investigator’s progress in revealing the dynamics of 5hmC and its
impact on gene regulation, the proposal will now develop innovative computational tools for 5hmC data mining
and data integration with other NGS datasets, with a focus on applying these tools to B cell differentiation, cancer,
and embryonic stem cell (ESC) differentiation. Key goals over the next five years include developing a
computational framework to mine short- and long-read sequencing data to answer the following questions: (1)
How does 5hmC contribute to epigenetic heterogeneity? (2) How does 5hmC epigenetic heterogeneity contribute
to transcriptome heterogeneity? (3) How do 5hmC levels and epigenetic heterogeneity interact with histone
modifications, enhancer activities, chromatin interactions, and chromatin organization? We will combine machine
learning and network mining algorithms to enable knowledge discovery and data integration from diverse
genomic data types. We will then harness the 5hmC data-mining framework to identify 5hmC patterns that
correlate with ESC and B cell differentiation and that contribute to the fitness advantage of cancer
cells. This work is significant because it will provide the first dissection of 5hmC's contribution to local and long-range
epigenetic heterogeneity and the first computational framework to uncover the cross-talk between DNA
modifications and other transcriptional regulators via chromatin interaction data. Collectively, this work will yield
a fuller picture of the molecular events that underlie fundamental changes in cell state and behavior.