Abstract
With rare exceptions, proteins in all domains of life are biosynthesized using the same twenty canonical
amino acid building blocks. However, the chemical and functional space accessible to proteins is greatly
expanded in living systems by a wide variety of post-translational modifications (PTMs). These
PTMs play important roles in all aspects of our biology. In recent years, the catalog of known PTMs within
our proteome has expanded at a furious pace, thanks to advances in mass spectrometry-based proteomics
and related technologies. However, the functional consequences of the overwhelming majority of these newly
identified PTMs remain poorly characterized. At the core of this deep knowledge gap about a critically important
facet of our biology lies the difficulty of producing eukaryotic proteins in a homogeneous state of modification
for probing how their properties are modulated by a PTM in vitro or in vivo. For most PTMs identified through
MS-proteomics, the exact biochemical origin is either unknown or challenging to reconstitute without
additional pleiotropic consequences. Genetic code expansion (GCE) technology provides an exciting solution
for this problem by enabling co-translational site-specific incorporation of a modified residue into virtually any
site of any protein. However, despite its enormous potential, the scope of this technology in eukaryotic
systems remains limited by several technical challenges, including the restricted structural diversity of
noncanonical amino acids (ncAAs) that can be genetically encoded and the poor efficiency of their
incorporation. Over the last five years, our group has greatly expanded the scope of this technology by developing
innovative solutions to overcome these longstanding challenges, including: A) new platforms for genetically
encoding previously inaccessible ncAAs, B) a mammalian cell-based directed evolution system to improve
the performance of this machinery, and C) novel viral vectors that efficiently deliver the ncAA incorporation
machinery to a wide variety of mammalian cells and tissues. These advances have opened the exciting
opportunity to use this powerful technology to systematically decipher the role of various PTMs observed in
the human proteome. To this end, in the next five years, we propose to develop new GCE platforms to access
new structural classes of ncAAs, use them to genetically encode previously inaccessible PTMs in eukaryotes,
optimize their efficiency through directed evolution, and use them to decipher the consequences of PTMs.
Furthermore, we will develop technology to systematically explore new protein-protein interactions triggered
by PTMs (e.g., with reader/eraser proteins), by site-specifically incorporating two ncAAs: one modeling the
PTM of interest and another harboring a photo-crosslinker. Finally, by overcoming longstanding challenges,
we will dramatically advance the scope of the GCE technology for application in mammalian cells, which will
have a broad and deep impact far beyond the scope of this proposal.