PROJECT SUMMARY/ABSTRACT
Somatic mosaic mutations accumulate over time in every healthy cell, but detecting them requires specialized
sequencing technologies with extremely low error rates. However, all current technologies for profiling mosaic
mutations require amplification of DNA, which introduces single-strand DNA artifacts. Therefore, even the
highest-fidelity technologies can detect mosaic mutations only when they are present in both strands of the
original DNA; they cannot detect the single-strand mutations and damage from which these mutations originate. Here,
we develop a technology that can directly sequence DNA molecules without any amplification at ultra-high
fidelity, such that mutations and damage present in only one of the two strands of a DNA molecule can be
detected for the first time. It achieves this by significantly increasing the accuracy of single-molecule DNA
sequencing. Furthermore, it uses long reads, which allow study of regions of the genome that are
inaccessible to prior high-fidelity mosaic mutation technologies that rely on short reads. Our technology,
called Hairpin Duplex Enhanced Fidelity Sequencing (HiDEF-seq), will be developed as part of the SMaHT
Network, and we will work in close coordination with the Network at all stages of the project to ensure it
contributes significantly to the Network’s goals of creating a comprehensive catalogue of somatic mosaicism in
human tissues. In the first UG3 phase of the project, we will develop our technology to cost-effectively and
reliably profile any bulk human tissue. In Aim 1 of UG3, we will develop the technology to profile all classes of
single- and double-strand mosaic mutations (substitutions, insertions, deletions, structural variants, and
retroelements) at ultra-high fidelity. In Aim 2 of UG3, we will use machine-learning models of single-molecule
polymerase kinetics to detect diverse types of single-strand DNA damage and modifications. Importantly,
HiDEF-seq will detect all of these events simultaneously in a single assay. In the second UH3 phase of
the project, we will work closely with the SMaHT Network to validate and scale the throughput of
the technology so that it can profile the entire collection of SMaHT tissue samples. In Aim 1 of UH3, we will
fully automate the laboratory component of HiDEF-seq to enable creation of sequencing libraries for hundreds
of samples per day. In Aim 2 of UH3, we will scale the computational pipeline of our technology for rapid
analysis of thousands of samples. Throughout this project, we will work with the SMaHT Network to validate,
standardize, and disseminate the technology. HiDEF-seq’s achievement of ultra-high fidelity sequencing of
single-strand DNA mutations and damage will enable fundamentally new types of mosaic mutation studies that
will disentangle the interrelated processes of DNA mutation, repair, and replication. It will also enable
systematic dissection of sources of artifacts stemming from laboratory processing of DNA. Furthermore, it will
reveal the instantaneous effects and temporal dynamics of exogenous mutagens, with broad implications for
environmental health and discovery of factors that reduce or increase the rate at which our genomes mutate.