Project Summary/Abstract:
Improvements in sequencing technology have allowed genome and transcriptome profiling of large groups
of research subjects. Projects such as The Cancer Genome Atlas (TCGA), the Encyclopedia of DNA Elements
(ENCODE), the Genotype-Tissue Expression Project (GTEx), and others have placed large, complex,
multi-omic data into the public domain. While these large projects and the use of new sequencing technologies have
made an unprecedented quantity of data available, technical challenges such as moving and analyzing large
multi-omic data sets, and the lack of intuitive, easy-to-use tools for data analysis, have limited broad
exploration of the available data, often separating experimental biologists and domain experts from directly
exploring relationships within the data.
More than 15 years ago, we began development of MeV, a freely available, open-source software tool for
intuitive analysis of genomic data. The simple graphical user interface and the extensive library of
state-of-the-art analytical methods made MeV one of the most widely used software tools in bioinformatics, with nearly
260,000 downloads since we began keeping statistics in 2008 and downloads of nearly 30,000 per year for the
past few years. Despite the success of MeV and its continued use, we recognized that large-scale, multi-omic
data sets can no longer be analyzed easily using a desktop application. To keep pace with the data, we
needed to develop a new platform that draws on modern computing technologies, including cloud-based
computing and scalable data storage.
The solution, funded by the NCI through the ITCR program (5U01CA151118), is a cloud-based, web-
enabled version of MeV (WebMeV; http://mev.tm4.org). WebMeV uses Google Cloud Platform (GCP) and its
Compute Engine infrastructure to leverage cloud-computing resources for analyzing large public genomic data
sets. In April 2016, we released a robust version of WebMeV and have seen use of the system grow
dramatically. The system has already been used to perform more than 350,000 analyses; WebMeV currently
performs more than 100 analyses per day, and 3,735 users have registered with the system, a group that
is growing by 400 per month (registration is not required). To ensure wide use, we have conducted numerous online
tutorials, including two “sold out” tutorials for intramural investigators at the NCI, where WebMeV has become a
critical tool for genomic analysis. In this application, we propose to continue to maintain and improve WebMeV,
to expand its capabilities by implementing methods for network inference and representation, to integrate with
the Cancer Genomics Cloud Pilots program, and to implement methods that can advance reproducible
research.