Project Summary
Mass spectrometry (MS) based proteomics is currently the most widely used technology for the analysis of
complex protein mixtures. It has the ability to detect and quantify the abundance of thousands of proteins and
their variants, post-translational modifications, and interactions per experiment. There is a robust set of open,
standardized data formats for encoding data and metadata from most stages of MS proteomics analysis,
developed by the Proteomics Standards Initiative (PSI). However, there is currently no standardized
mechanism for universally referencing a spectrum that is used in an analysis or held up as evidence for a
published claim. Further, despite the widely recognized significant advantages of spectrum matching
approaches, an approved PSI standard for the storage and exchange of reference spectra in the form of
spectral libraries is still glaringly absent. Here we propose a major advancement in data standards for
proteomics mass spectra with the development of three interrelated standards.
First, in order to solve the difficulty in identifying and accessing a specific spectrum in resources throughout the
world, we will develop a universal spectrum identifier standard that can be widely used to reference, locate and
access a specific spectrum. Second, building on PSI's extensive experience in developing official standard
formats that are widely used, we will overhaul the current set of crude spectral library formats and develop a
new standardized and comprehensive spectral library format that will be effective for the storage, use, and
exchange of reference spectra. Third, we will develop a standard application programming interface that
deploys the standards to the whole community by enabling users and automated software to query and
exchange information about spectra, peptides, and proteins.
These standards will be developed according to the effective methodologies that the PSI has developed since its
inception in 2002. This means that we will assemble the important stakeholders from all over the world to
jointly develop the standards and create specification documents and examples. These specification documents
will then undergo the official PSI document process, which subjects each proposed standard to three rounds of
iterative review and refinement. We will then develop open-source software that enables the use of these
standards in multiple programming languages in order to promote widespread usage. Finally, we will implement
these standards via these software libraries at the three largest ProteomeXchange proteomics data
repositories, which will ensure high visibility. The development of these three interrelated standards will
achieve a substantial advance for the field of proteomics MS, and may extend to MS-based metabolomics
as well.