Grammar-Driven Genomic Data Visualization - Project Summary
Our rapidly evolving understanding of how genomes function and how genomic variation influences the
development and progression of diseases drives our ability to develop novel diagnostics and therapeutics.
Genomic data science plays a critical role in this process, relying on the availability of computational tools for
statistical and visual analysis of large-scale and complex data sets. The growing genomics workforce that relies
on these tools includes scientists with a broad set of expertise and needs. Experimental scientists use tools with
graphical user interfaces to interpret their data; computational biologists write pipelines or code for ad hoc
analysis in interactive environments; and software developers build sophisticated data portals and other
web-based tools. Although many genomic data visualization tools exist for these audiences, no unified
approach allows a broader audience to design and implement their own interactive data visualization
tools for genomic data. To address this gap, we will develop a visualization framework
based on a novel grammar for interactive, scalable visualization of genome-mapped data. The visualizations
defined using this grammar will be interactive, responsive, and scalable. These features will be enabled by
rendering the visualizations using an extension of HiGlass, our framework for genomic data
visualization that supports multi-scale data visualization and multiple linked views. The grammar design will
be guided by a taxonomy of genomic visualizations and visual analysis tasks that comprehensively describes
the space of interactive visualizations currently in use for genomic data. The grammar will support the
creation of visualizations with different genome layouts, visual encodings of data, and flexible configurations
of multiple linked views. Furthermore, we will incorporate a taxonomy of visualizations for metadata, such as
phenotypic data, that are frequently linked to genomic data. To support the creation of visualizations based on
the proposed grammar, we will develop a JavaScript library, a Python package, an R package, and an interactive
visualization editor. This editor will be web-based and have a drag-and-drop interface for data and
visualization components. In addition to the genomic visualization grammar, our framework will also contain
a genomic visualization recommendation system that can generate interactive visualizations based on a
description of a data set and the analysis tasks that the user intends to accomplish. This system will enable novices to
create effective visualizations without knowledge of visualization design. The recommendation system will
also accelerate visual analysis for more experienced users, as the visualization design can be automated and
customized. The recommendation system will be available through the R and Python packages and the
interactive visualization editor. In addition to producing visualization designs using our proposed grammar,
the recommendation system can also point users to existing tools that implement specific
visualization capabilities.