Abstract
Two significant paradigm shifts are underway in cancer genomics: single-cell genomic profiling and the growth
of the NCI Cancer Research Data Commons. Single-cell genomics is transforming our understanding of
complex tumor populations and revealing new insights into tumor composition, microenvironment, cancer stem
cells, and drug resistance. Several large-scale, single-cell-focused, national and international projects are
currently underway, including HCA, HTAN, and HuBMAP. Data generated by these projects will impact almost
every aspect of biology and medicine. For these projects to realize their full potential, it is essential to have
data visualization and analysis tools that make these resources accessible to a broad group of biomedical
researchers. This is challenging, however, as existing data visualization and analysis tools simply cannot scale
to handle these large datasets. The second paradigm shift is NCI’s development of the Cancer Research Data
Commons (CRDC), a virtual data science infrastructure that connects cancer research data collections with
analytical tools, leveraging the dynamic computing power of the cloud. Efficient and secure incorporation of
widely used third-party tools and platforms, including interactive visualization tools such as UCSC Xena, into
CRDC is needed to make this resource truly useful. As both of these transitions continue to accelerate in the
coming years, they present challenges and opportunities. We propose to enhance UCSC Xena to support and
enable these transitions through four aims. Aim 1. We will scale up UCSC Xena by 100x to support the
visualization of datasets with greater than 1 million cells (more generally, 1 million bio-entities) without any loss
of data or interactivity in the web browser. We will employ several new advances in computer engineering to
achieve this performance gain. In addition, we will develop three new visualizations to enable researchers to
better explore single-cell data. Aim 2. We will securely integrate UCSC Xena with resources in the NCI CRDC
and its community of data analysis tools and platforms. Our integration will make it routine practice to load analysis
results into a private Xena Hub in CRDC and visualize them in the context of large public data.
Aim 3. We will provide visualization of the most current cancer genomics resource data through the expansion
and update of the UCSC Xena database with key projects and datasets. We will collaborate with the Treehouse
Childhood Cancer Initiative to build a harmonized preclinical pediatric genomics data resource and make it
publicly available on the Xena Browser. This work will leverage PDX models and brain tumor organoids
currently being developed and profiled by Dr. Haussler’s group. Aim 4. We will improve user workflows and
engagement through user-centered design, and will continue user education, support, and outreach.