SUMMARY / ABSTRACT
Genome-scale DNA sequencing, now available at dramatically reduced cost, has revolutionized the practice of
precision medicine. It is possible today to sequence an entire human genome in roughly one day; however, bioinformatic
analysis typically takes days or weeks, and has emerged as the major bottleneck for successfully utilizing
genome sequencing in time-critical applications, e.g. for identifying the genomic vulnerabilities of a patient’s
tumor for rational cancer treatment selection within a clinically relevant timeframe. The overarching goal of this
proposal is to dramatically speed up genomic analysis algorithms via heterogeneous computing
techniques. Here we will focus on one critical aspect of genomic analysis, i.e. variant calling, and set the
ambitious goal of completing the analysis of a 60X-coverage Illumina whole genome sequencing dataset in under
10 minutes, far faster than the current state of the art. Although we apply it here to only one analysis task,
achieving such a high degree of acceleration would demonstrate the potential of the techniques developed in
this proposal to generalize across many other genomic analysis tasks. Our approach is to first
accelerate the most widely reusable software components, to maximize value for the genomic analysis tool
developer community, who will then be able to integrate these components into their own tools.
With these reusable software components, we will accelerate the FreeBayes variant caller tool. FreeBayes is a
widely used germline variant and somatic mutation detection tool, and therefore acceleration will benefit a large
user audience. This software was developed in our own laboratory, and therefore we are intimately familiar with
its algorithms and code base, positioning us for success in this exploratory project. If successful, our techniques
will be applicable to accelerating many currently time-consuming analysis tasks. As a result, analysts will be
able to finish sophisticated data processing tasks within minutes, as part of an interactive analysis session
rather than as a batched background process, and to complete manual result review immediately afterward, rendering the
complete analysis process sufficiently fast for time-critical clinical applications.