Project Summary/Abstract
The human genome contains more than 1,600 transcription factor-coding genes. Transcription factors (TFs)
play essential roles in gene regulation and are relevant to all aspects of human biology, from development to
disease mechanisms. Defining the activity and function of TFs is vital to understanding gene expression, and
the most informative assay for TF activity is the genome-wide identification of loci where a TF interacts with
DNA. These loci are classically measured by ChIP-seq (chromatin immunoprecipitation followed by
high-throughput sequencing) or variants such as CETCh-seq (CRISPR epitope-tagged ChIP-seq). As
members of the ENCODE Consortium, the Myers/Mendenhall group generated 1,203 ChIP-/CETCh-seq maps
for 676 human TFs, and nine other groups produced a combined 742 maps for 463 additional TFs. However,
~700 human TFs still lack genome-wide binding maps. This proposal aims to expand this data resource by
producing binding maps for 600 more human TFs with an established pipeline that allows production to begin
immediately. The first goal is to complete 200 binding maps in the HepG2 cell line, thus generating data for the
vast majority of TFs expressed in this cell line. The resulting resource will give a near-complete overview of
the TFs active in this cell type and will support many downstream analyses. The second goal is to generate
binding maps for 400 human TFs that have no ChIP data, each in a cell type that expresses the TF. Over sixty percent of these
TFs will be assayed in induced pluripotent stem cells (iPSCs), neuronal precursor cells (NPCs), or neuronal
cells differentiated from these, and the remaining TFs will be assayed in a variety of other cell lines. These
cell types have been used successfully in the pipeline and have complementary genome-wide data available,
including ATAC-seq, RNA-seq, various histone modification ChIP-seq experiments, and 3D connectivity data. The resource
produced will represent a greatly expanded database of DNA binding maps covering most human TFs. Such
comprehensive maps have long been a goal in biology and will enable analyses that define the grammar of
TF-mediated gene regulation.