PROJECT SUMMARY
Computational analyses have become instrumental to cancer research, but only a minority of cancer
researchers are computational scientists. Cancer researchers—especially those with limited computational and
data science skills—face substantial technical challenges performing data analyses. These challenges fall into
three broad categories—data, tools, and computational infrastructure—and include (1) connecting datasets,
software tools, and analysis workflows to perform an analysis; (2) using compute resources such as
institutional computing clusters or cloud computing to process often enormous datasets; and (3) performing
high-quality analyses by selecting from the bewildering array of tools and visualizations available. The impact
of these technical challenges on cancer research is severe, restricting the cancer research community’s
advances toward understanding, preventing, and treating the disease. Without resolving these issues, cancer
datasets and software analysis tools will be substantially underutilized, and in turn the multimillion-dollar
investment in cancer data science will not translate into improved patient outcomes. We will address these
challenges in this project by creating the Galaxy for Cancer (Galaxy-C) platform. Galaxy-C will meet the
emerging data science needs across the spectrum of cancer researchers—individual experimentalists and
computational scientists, research labs, clinical trials, and national consortia. Our project has four aims. Aim 1:
Extend Galaxy-C with datasets and workflows to enable emerging cancer research in single-cell, spatial, and
multimodal data analyses. With these extensions, Galaxy-C will enable many new cancer analyses by making
high-value cancer datasets and novel analysis workflows widely available. Aim 2: Expand Galaxy-C with new
machine learning (ML) tools and an intelligent user interface. These new capabilities will ensure Galaxy-C
supports the increasing number of cancer analyses that require ML and help users be more productive with
assistance from its user interface. Aim 3: Integrate and use external computing services in Galaxy-C to power
larger cancer data analyses. Using external computing services will ensure Galaxy-C performs efficient and
cost-effective cancer data analyses, and use of community APIs will maximize the number of services Galaxy-C
can connect with and use. Aim 4: Grow a Galaxy-C cancer data analysis community through collaborations
with ITCR, the NCI Human Tumor Atlas Network (HTAN), and individual cancer researchers. The Galaxy-C
community will ensure that the Galaxy-C platform meets the needs of cancer researchers and maximizes the
impact of the platform on cancer research. The Galaxy community numbers tens of thousands, the ITCR
community has hundreds of analysis tools, and the HTAN consortium has hundreds of cancer researchers.
Successful completion of this project and creation of a Galaxy-C workbench will have tremendous impact
across these large communities doing state-of-the-art cancer research by making it possible for all cancer
scientists to do accurate, robust, and efficient computational and data science research.