A reference-free computational algorithm for comprehensive somatic mosaic mutation detection - ABSTRACT
Somatic mosaicism (SM), i.e., the presence of cells with somatically acquired mutations, is a driving feature of
cancer and several developmental diseases. However, whereas today we have a detailed understanding and
predictive models of benign and pathogenic inherited polymorphisms, germline de novo mutations, and tumor
mutations, we have only limited knowledge of the burden, allele frequency spectrum, clonal patterns, and
mutational signatures of healthy somatic mosaicism. Recognizing that this missing knowledge is critical
for informing experimental design in future studies of mosaicism’s biological and clinical consequences, the NIH is
launching an ambitious initiative, the Somatic Mosaicism across Human Tissues (SMaHT) project, to construct a
comprehensive human somatic mosaicism atlas. As part of this initiative, funding announcement RFA-RM-22-011
calls for Tool Development Projects to develop “approaches that significantly improve the sensitivity,
accuracy, and threshold of detection of all types of somatic variants across the complete genome”. Such
comprehensive detection is currently challenging because somatic mosaic mutations span a wide
range of mutation types and lengths, yet most of today’s variant detection tools have low sensitivity for
larger, structural events. Furthermore, somatic mutations are typically present at very low allele frequency (<1%), and
accurate detection of such low-frequency variation is beyond the capabilities of most current tools. We have pioneered
a unique k-mer-guided detection approach in our RUFUS tool, which was designed for germline de novo mutation detection.
This approach focuses on identifying the novel DNA sequence created by a mutation, which allows the same
underlying algorithm, with uniform algorithmic behavior and sensitivity, to be applied across the full range of
mutation types. RUFUS has been validated for accurately detecting germline de novo mutations in large
discovery datasets and rare-disease diagnostic studies. Our preliminary analyses also indicate that RUFUS has
high sensitivity across the full range of somatic mutations. This application proposes to adapt the RUFUS
algorithm for somatic mosaic mutation detection with high sensitivity and specificity across the entire
spectrum of mutation types, mutation lengths, and allele frequencies, and thus to contribute substantially to the
construction of a comprehensive mosaicism atlas. To achieve this overall goal, in the first (UG3) phase of
the project we will focus on algorithmic development to improve low-frequency allele detection, empirically
characterize RUFUS’s sensitivity and specificity, and ready the tool for adoption into the SMaHT Network’s
central analysis pipelines. In the second (UH3) phase of the project, we will integrate RUFUS into the central
analysis workflow of the SMaHT consortium and optimize and extend its performance for analyzing the vast SMaHT
somatic mosaicism dataset. We anticipate that RUFUS will contribute substantially to the SMaHT Initiative's goal
of comprehensively mapping human somatic mosaicism across individuals, organs, and tissues.
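
The unique k-mer idea described above (finding the novel DNA sequence that a mutation creates) can be illustrated with a minimal sketch. This is not the RUFUS implementation: the function names, the k-mer length, and the min_count threshold are hypothetical choices made for illustration, and the sketch ignores reverse complements and sequencing errors. It counts k-mers in a sample and a control read set, keeps the k-mers seen only in the sample, and then selects the reads that carry them.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(sample_reads, control_reads, k=5, min_count=2):
    """Return k-mers seen at least min_count times in the sample and never in the control.

    min_count is a crude, assumed guard against sequencing errors; a real
    tool would need a more careful error model, especially at low allele
    frequency.
    """
    sample = count_kmers(sample_reads, k)
    control = count_kmers(control_reads, k)
    return {kmer for kmer, n in sample.items()
            if n >= min_count and kmer not in control}


def reads_with_novel_kmers(reads, novel, k):
    """Select reads containing at least one sample-specific k-mer."""
    return [r for r in reads
            if any(r[i:i + k] in novel for i in range(len(r) - k + 1))]


# Toy data: two of five sample reads carry a single-base change (C -> T).
control_reads = ["ACGTTGCAAGGC"] * 3
sample_reads = ["ACGTTGCAAGGC"] * 3 + ["ACGTTGTAAGGC"] * 2

novel = sample_specific_kmers(sample_reads, control_reads, k=5, min_count=2)
supporting = reads_with_novel_kmers(sample_reads, novel, k=5)
print(sorted(novel))
print(len(supporting))
```

Because the selection step depends only on sequence that is absent from the control, the same logic applies whether the novel sequence comes from a single-base change, an insertion or deletion, or a structural breakpoint, which is the property the abstract attributes to the approach.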