SUMMARY - Subtle changes in protein expression are critical for proper growth and development, but irregularities can cause deleterious cellular effects or large-scale biological dysfunction. Sequencing samples that contain complex mixtures of proteins could greatly accelerate research into protein function and biology, but no efficient, cost-effective strategy currently exists for protein sequencing at single-amino-acid resolution.
Two methods are commercially available for protein sequencing. In the first, "Edman degradation", bulk quantities of whole protein or purified fragments are sequenced by repeatedly cleaving the first (N-terminal) amino acid and chemically identifying it. In the second, based on mass spectrometry, a single protein or a mixture of proteins is fragmented, and the mass-to-charge ratio of each fragment is measured. These measurements are compared with known protein sequences to infer the identity of the input proteins. Both methods require ~1 million molecules of each protein, and Edman degradation cannot currently be used on heterogeneous protein mixtures.
Existing approaches to single-molecule protein sequencing are hindered by the number and diversity of amino acids, as well as by interactions between amino acids that interfere with chemical identification of their side chains. Harsh denaturing agents can mitigate some of these issues, but they can compromise the identification systems themselves, and they disrupt only some of a protein's intramolecular interactions.
Glyphic Biotechnologies is developing a novel strategy to sequence individual protein molecules in their entirety from a heterogeneous sample. The process is based on ligating the N-terminal amino acid to a cleavable chemical linker, which then tethers it locally to the surface. Cleavage of the linker removes the N-terminal amino acid from the protein, isolating it for highly sensitive identification free of interference from protein structure or adjacent amino acids. The process is repeated for each subsequent amino acid, yielding the protein sequence. The approach could sequence millions to billions of individual protein molecules simultaneously within hours, which would revolutionize protein analysis by making large-scale protein sequencing feasible, inexpensive, and routine.
This proposal focuses on developing reagents that specifically detect the isolated N-terminal amino acids of proteins, allowing each amino acid to be identified digitally via this N-terminal isolation strategy. In Aim 1, we will generate antibodies that recognize at least 10 different isolated amino acids, enough to identify ~90% of the proteome after 10 sequencing rounds. In Aim 2, we will further optimize the antibodies and demonstrate the feasibility of using them to sequence individual proteins against a background of unmodified proteins.
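The ~90% figure rests on how many proteins in a reference set can be told apart by a partial readout of their first few residues. The minimal Python sketch below illustrates that calculation under stated assumptions: every cycle removes exactly one N-terminal residue whether or not an antibody recognizes it, so unrecognized residues are recorded as a wildcard `x`, and a protein counts as identified when its readout pattern is unique in the reference set. The function names `read_pattern` and `fraction_identified`, the 10-residue detectable set, and the four-protein toy proteome are hypothetical illustrations, not Glyphic's method or data.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical set of 10 amino acids (one-letter codes) for which a detection
# antibody is assumed to exist; chosen only for illustration.
DETECTABLE = {"A", "K", "M", "N", "P", "Q", "R", "S", "T", "Y"}

# Hypothetical toy "proteome": protein name -> sequence (N- to C-terminus).
TOY_PROTEOME = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAHIAKQR",
    "P3": "MSTNPKPQRK",
    "P4": "MKTAHLAKQR",
}


def read_pattern(sequence: str, detectable: set[str], rounds: int) -> str:
    """Simulate `rounds` cycles of N-terminal isolation and detection.

    Assumption: each cycle removes one N-terminal residue; residues with no
    matching antibody are still counted but recorded as the wildcard 'x'.
    """
    return "".join(aa if aa in detectable else "x" for aa in sequence[:rounds])


def fraction_identified(proteome: dict[str, str], detectable: set[str], rounds: int) -> float:
    """Fraction of proteins whose readout pattern is unique in the proteome."""
    patterns = [read_pattern(seq, detectable, rounds) for seq in proteome.values()]
    counts = Counter(patterns)
    unique = sum(1 for p in patterns if counts[p] == 1)
    return unique / len(proteome)


if __name__ == "__main__":
    print(read_pattern(TOY_PROTEOME["P2"], DETECTABLE, rounds=10))  # MKTAxxAKQR
    print(fraction_identified(TOY_PROTEOME, DETECTABLE, rounds=10))  # 0.5
```

On the toy set, 10 rounds resolve two of the four proteins (P2 and P4 share the pattern `MKTAxxAKQR`); a real estimate would apply the same calculation to a full reference proteome, and this sketch ignores missed cycles and detection errors.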
Success of these Aims will enable the Glyphic protein sequencing platform to detect, quantify, and sequence single proteins in complex protein mixtures in an unbiased fashion, without any prior knowledge of their identity or even their existence. When commercialized, the platform will enable clinical diagnosis of disease based on the proteins present in a patient sample and allow unique proteins to be identified as candidates for as-yet unknown biomarkers.