PROJECT SUMMARY
The project will contribute MassIVE.quant, a novel data resource for quantitative mass spectrometry-based
proteomics.
Quantitative mass spectrometry characterizes proteins in complex biological mixtures with the highest available
accuracy, sensitivity and throughput. Analysis of most such experiments involves identification of peptides and
proteins that generated the spectra, and relative quantification of changes in abundance between pre-defined
conditions. While the identification workflows are now mature and ready for reproducible research, the
quantitative workflows lag far behind. No repository can currently store the analysis results across all
workflows, and it is often impossible for authors to provide their data in a form that allows independent evaluation
and reuse. This undermines the reproducibility and the impact of these investigations.
The project combines the prior expertise of the Bandeira lab in developing the Mass spectrometry Interactive
Virtual Environment (MassIVE), a public repository for storing, documenting and re-analyzing mass spectra for
identification, and the prior expertise of the Vitek lab in developing MSstats, a broad-scope collection of statistical
methods and software for quantitative proteomic workflows. First, the project will fully document and annotate a
medium-scale “training set” of quantitative investigations (which often rely on manual procedures), to develop
standards for documenting and annotating the experiments with respect to the biological origins of the samples,
and the technological aspects of data acquisition and processing. Second, the project will design functionalities
for repository-wide complete and automated re-analyses of the original investigations, using a limited number of
“good practice” workflows. The re-analyses will fully preserve the provenance of the results, and will be used to
further characterize potential pitfalls in the experimental designs and conclusions. Finally, the project will place
these investigations into a broader scientific context. It will design a query infrastructure that links each
experiment to its peer investigations, i.e., investigations with similar biological or technological aspects, to provide
insight into the consistency of the results.
Continuing the extensive prior outreach efforts of the PIs, the results will be disseminated to a broad community
of stakeholders, including proteomic scientists, tool developers, journal editors, trainees, and scientists interested
in protein-level information.
The project will shift the mass spectrometry-based research paradigm by creating a public resource that
currently does not exist in any form. It will expand the technical capabilities of the field, ultimately allowing us to
make more accurate statements about biological function.