ABSTRACT
One of the central challenges in cancer genomics is to accurately detect somatic mutations in
heterogeneous tumors and to precisely determine their clonal origin and evolution. This fundamental knowledge
is central to the discovery of new cancer therapies. In recent years, reductions in the cost of whole-genome and
whole-exome sequencing have enabled researchers to address these questions in unprecedented detail.
However, a major limitation in the field has been a paucity of methods for variant calling that extend beyond
identifying simple single-nucleotide variants (SNVs) and small indels to allow the characterization of complex
structural changes that also play a significant role in tumorigenesis and cancer progression. Indels of more than
a few bases are challenging to discover with commonly used alignment-based methods. In addition, most variant
callers analyze tumor and normal data separately, which can introduce false positives, for example when a mutation
shows partial support in the normal sample. To address these shortcomings, we recently introduced
Lancet, a new somatic variant caller developed under the auspices of the ITCR R21. Lancet leverages local
assembly and joint analysis of tumor-normal paired data using region-focused colored de Bruijn graphs, with
on-the-fly repeat composition analysis and a self-tuning k-mer strategy. This results in reduced reference
bias; an improved ability to detect variations that significantly diverge from the reference chromosome
representations; a reduction in the scale of the analysis, leading to increased power and sensitivity to detect
variants through localized, comprehensive graph exploration; and dynamic adjustment of calling behavior
according to the sequence conditions of each genomic region. In testing, Lancet shows superior performance to
all major alignment-based methods in terms of accuracy, particularly in the detection of ‘twilight zone’ indels
(30-250 bp). Given its continued adoption and successful application in over a dozen high-impact publications,
Lancet is poised for more advanced development to enable continued improvements in its variant calling power,
precision, and analytical capabilities. Specifically, Lancet is currently limited by longer runtimes than
alignment-based methods, reduced sensitivity for longer insertions, lack of interactive visualization of the colored de Bruijn
graph, and the inability to jointly analyze longitudinal data. To address these limitations, we propose the following
Specific Aims: 1) Increase computational performance and facilitate user adoption and third-party development;
2) Add new features and enhancements to improve variant detection, phasing, and data visualization; and 3)
Enable joint assembly and analysis of longitudinal data. Impact: With additional development, the next iteration
of Lancet will feature advanced algorithms for fast, efficient, accurate, localized, user-friendly, and
application-modifiable variant analysis of phased genome-wide time-series data, establishing it as one of the leading methods
for variant calling in cancer research.
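
ILLUSTRATIVE SKETCH
The joint tumor-normal analysis on a colored de Bruijn graph described above can be illustrated with a small toy example. This is a minimal sketch under stated assumptions, not Lancet's implementation: all function names, the k range, the support threshold, and the rule for choosing k (the smallest k at which no reference k-mer repeats) are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def has_repeat(seq, k):
    """Return True if any k-mer occurs more than once in seq."""
    seen = set()
    for km in kmers(seq, k):
        if km in seen:
            return True
        seen.add(km)
    return False


def choose_k(reference, k_min=3, k_max=15):
    """Hypothetical self-tuning rule: smallest k with no repeated reference k-mer."""
    for k in range(k_min, k_max + 1):
        if not has_repeat(reference, k):
            return k
    return None  # the region is too repetitive for the allowed k range


def build_colored_graph(reference, tumor_reads, normal_reads, k):
    """Count each k-mer per sample (its "colors") and record k-mer adjacency."""
    counts = defaultdict(lambda: {"ref": 0, "tumor": 0, "normal": 0})
    edges = defaultdict(set)
    samples = (
        ("ref", [reference]),
        ("tumor", tumor_reads),
        ("normal", normal_reads),
    )
    for color, seqs in samples:
        for seq in seqs:
            prev = None
            for km in kmers(seq, k):
                counts[km][color] += 1
                if prev is not None:
                    edges[prev].add(km)
                prev = km
    return counts, edges


def tumor_only_kmers(counts, min_support=2):
    """k-mers seen in tumor reads but absent from both the normal and the reference."""
    return sorted(
        km
        for km, c in counts.items()
        if c["tumor"] >= min_support and c["normal"] == 0 and c["ref"] == 0
    )


# Toy region: the tumor reads carry a single-base change relative to the reference.
reference = "ACGTACGGA"
tumor_reads = ["ACGTACTGA", "ACGTACTGA"]
normal_reads = ["ACGTACGGA"]

k = choose_k(reference)
counts, edges = build_colored_graph(reference, tumor_reads, normal_reads, k)
print(k, tumor_only_kmers(counts))
```

Because both samples are counted in the same graph, a k-mer with even partial support in the normal sample is excluded from the tumor-only set, which is the intuition behind analyzing tumor and normal data jointly rather than separately. A real caller would additionally walk the graph to assemble and align full variant paths, which this sketch omits.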