PROJECT SUMMARY/ABSTRACT
The genetic information encoded in our genome is decoded and implemented via many multi-step processes,
including transcription. Transcription of genes into mRNA by RNA Polymerase II (Pol II)
is a complex process that is precisely regulated both temporally and spatially at multiple steps by many large
molecular complexes (LMCs). In the past, a number of these LMCs have been identified and their structural and
functional roles have been studied. Although we have learned a great deal about these LMCs individually,
a comprehensive understanding of how they interact with and affect one another and Pol II has yet to be achieved.
In this project, we propose a multi-pronged approach to define interactions and structures of LMCs, Pol II,
and model transcription factors (TFs) in an unbiased way and, as much as possible, under native conditions. We
will also evaluate the effect of these specific interactions on the molecular mechanics of transcription and its
regulation in cells. To this end, we will utilize a novel GFP aptamer-based purification method to identify LMCs
and TFs that associate with GFP-tagged Pol II and other critical LMCs. Purifications will be performed rapidly
and under native conditions to ensure retention of physiological interactions, and the resulting complexes will be
analyzed by both Mass Spectrometry (MS) and Cryo-EM to define the composition and structure of these LMCs at the
highest depth and resolution possible. Crosslinking with novel protein-protein crosslinkers and subsequent MS
analysis (XL-MS) will also be used to capture more transient LMC and TF interactions. In parallel, LMC-APEX2
fusions will be used to biotinylate nearby proteins and identify them by MS analysis following streptavidin
purification. Additionally, we will define the location of distinctly modified Pol II complexes or Pol II associated
with distinct LMCs at base-pair resolution along transcription units using our new PRO-IP-seq protocol. This
information, combined with the MS analysis, will provide a unique and dynamic view of Pol II’s phosphorylation
status, composition, associations, and precise positioning along genes, and this information will be critical in
deriving molecular models of transcription and its regulation. Previously known and newly identified LMCs and
TFs that are deemed to have critical interactions will be perturbed by either RNA aptamer inhibitors or degron-
tagging to tease apart their functional roles. The rapid expression of RNA aptamers, which interfere with specific
LMC interactions, and the rapid degradation of whole LMC subunits with degron technology will allow the
detection of the immediate, “primary” roles of those interactions genome-wide using high-resolution assays
such as PRO-seq and ChIP-Exo. These assays will enable us to identify the specific functions of the key LMCs
and their interactions with unprecedented resolution and sensitivity. Overall, we expect to derive a much better
and more complete understanding of the transcription cycle and its regulation. This will impact human health by
identifying new therapeutic avenues and possible lead drugs (RNA aptamers), as misregulation of transcription
has been observed in many disease conditions.