SUMMARY
Proteins are responsible for much of the structure and function of all cells. Subtle changes in expression of
various protein forms are critical for proper growth and development, but irregularities can cause deleterious
cellular effects or large-scale biological dysfunction. Proteins consist of chains of amino acids whose sequence
ultimately determines the three-dimensional structure and function of the protein. As such, the ability to determine
the complete amino acid sequence of low-abundance proteins can greatly accelerate research into protein function
and biology. However, in stark contrast to the relative success of DNA sequencing technologies, there is currently no
efficient and cost-effective strategy to sequence single protein molecules at single-amino-acid resolution.
Two methods are commercially available for protein sequencing. The first method, “Edman degradation”, requires
purification of the target protein. Bulk quantities of whole protein or purified fragments are sequenced by
repeatedly cleaving off the first (N-terminal) amino acid and chemically identifying it, one residue per cycle. The
second method, based on mass spectrometry, requires enzymatically digesting a single protein or a mixture of
proteins into small fragments, then measuring the mass-to-charge ratio of each fragment. These measurements are
compared against those predicted from known protein sequences to infer the identity of the input proteins. Both of these commercially available methods suffer
from low sensitivity, requiring ~1 million molecules of each protein for detection. In addition, Edman degradation
cannot currently be used on heterogeneous protein mixtures, further limiting its utility.
Critical hurdles in single-molecule protein sequencing include the number and diversity of amino acids, as well as
interactions between amino acids that interfere with reagents designed to identify amino acids by their chemical
side chains. Approaches currently in development for single-molecule protein sequencing could avoid some of
these issues by employing harsh denaturation agents, but these agents can compromise the identification systems
themselves. Moreover, denaturation agents remove only some of the intramolecular interactions of proteins.
Glyphic Biotechnologies has developed a novel strategy to iteratively identify the first (N-terminal) amino acid
by isolating it from the remainder of the protein using a linker molecule called ClickP. After the protein is bound
to a solid surface, ClickP enables single-molecule protein sequencing through repeated cycles in which the
N-terminal amino acid is physically isolated and then identified with high specificity and single-molecule sensitivity.
The approach has the potential to be scaled to sequence millions to billions of single molecules simultaneously
in hours. Developing this technology will revolutionize protein analysis by making large-scale protein sequencing
feasible, inexpensive, and routine.