SUMMARY
Cancer is caused by changes in the genome, which ultimately translate into aberrations of the proteome,
which constitutes the major functional and structural components of a cell. The proteome is highly
complex owing to post-translational protein modifications, tightly regulated protein
degradation, and functional regulation through protein-protein interaction networks. It is also considered the
closest molecular link to a biological system’s phenotype. Mass spectrometry is among the most important tools
to characterize proteomes, and its versatility is well suited to match the proteome’s complexity. It is, therefore,
surprising that the information the cancer proteome can provide for understanding and diagnosing cancer remains
almost entirely untapped in clinical studies. One reason is the low sample throughput of mass
spectrometry-based proteomics compared with genomics technologies, which translates into higher analysis
costs and reduced access to proteomics. Our overarching aim for this proposal is to develop a novel mass
spectrometry-based proteomics data acquisition method that increases the sample throughput of deep proteome
mapping (>2,000 proteins from blood plasma, >8,000 proteins from tissue samples) up to tenfold over current
methods (10 min per sample). The method is based on multiplexed isobaric
proteomics, a barcoding technology that currently allows the simultaneous analysis of up to 18 samples. The
novel aspect is the use of artificial intelligence (AI) to drive the data acquisition process. The proteomics
community has started to incorporate AI into its data analysis workflows, but AI has not yet been used to
improve data acquisition. Our AI system directs the mass spectrometer in real time to rapidly and
globally target all proteins assumed to be in a sample. Proteome samples are digested into
peptides, and a combination of neural networks trained on millions of mass spectra is used to
predict peptide analyte behavior in real time, optimizing analytical speed at high analytical depth. A
preliminary version of the method maps 1,300 plasma proteins in 10 min per sample. In Aim 1, we propose
to further improve the method by adding neural networks that enable more sensitive real-time
peptide identification and the simultaneous identification of multiple peptides. Our goal is a method
that routinely quantifies 2,000 proteins from human plasma in 10 min. The method will be incorporated
into a platform that also includes low-cost automated sample preparation to achieve an overall analysis cost of
<$100 per sample. In Aim 2, we propose to evaluate the method by mapping the proteomes of 500 clinical plasma
samples from lung cancer patients at different pathological cancer stages. Our preliminary data
analysis shows that mass spectrometry-based proteomics has high predictive power for detecting early-stage lung
cancer, and we will use the evaluation to validate detection power across early to late stages. We believe that
our novel AI-driven mass spectrometry-based proteome mapping has high potential to overcome the current
hurdles to using deep-coverage proteomics in clinical settings.