Project summary
Somatic mutations accumulate in normal tissues and are increasingly recognized as a crucial
determinant of disease risk, especially in age-related conditions and cancer. Somatic mutations are
enriched in regions of the noncoding genome with “open” chromatin structure, such as active promoter and
enhancer elements, because open chromatin is more vulnerable to mutagens. Furthermore, transcription
factor binding appears to obstruct DNA repair, increasing the likelihood of forming fixed,
double-stranded mutations. Together, these mechanisms concentrate somatic mutations in restricted, yet
critical, regions of the genome. Somatic mutations with an increased likelihood of causing disease
frequently arise at recurrent genomic sites, often as recurrent changes at specific bases. This recurrence
makes it possible to develop targeted methods that identify somatic mutations with greater sensitivity,
lower cost, and higher throughput than traditional sequencing techniques.
Present methods for identifying somatic mutations generally use deep (≥250X) whole genome
sequencing (WGS) and tend to be expensive, create large datasets that are computationally challenging to
analyze, and have limited ability to detect somatic variants with very low allele fractions. We propose a
two-phase approach to developing a new tool that addresses these shortcomings. In the first phase we will develop a
method of detecting somatic mutations using ATAC-seq. Because ATAC-seq targets the open chromatin regions of
the genome, it focuses sequencing on regions where somatic mutations are enriched and more likely to be
biologically meaningful; it captures only a fraction of the genome, producing a more manageable dataset; and it
allows deeper sequencing, increasing the sensitivity of somatic mutation detection. This phase of the
proposal includes three aims: modification of the ATAC-seq protocol to allow detection of somatic
mutations; development of software to analyze the resulting data; and testing of the protocol.
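The sensitivity argument above can be illustrated with a short calculation. The sketch below is hypothetical and is not part of the proposed analysis software: it assumes a fixed sequencing budget spread evenly over the target (real ATAC-seq coverage is uneven), a simple binomial model of variant-supporting reads that ignores sequencing error, an illustrative 60 Mb open-chromatin target, a 0.5% allele fraction, and a threshold of 5 supporting reads. All of these values are assumptions chosen for illustration, not measured results.

```python
from math import comb

# Approximate haploid human genome size in base pairs.
GENOME_SIZE_BP = 3.1e9


def mean_depth(total_sequenced_bp: float, target_bp: float) -> float:
    """Mean coverage if all sequenced bases fall evenly on the target region."""
    return total_sequenced_bp / target_bp


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) under a Binomial(depth, allele_fraction) model.

    Simplified: ignores sequencing errors, mapping bias, and uneven coverage.
    """
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Hypothetical budget: the same number of bases as a 250X whole genome.
budget_bp = 250 * GENOME_SIZE_BP
wgs_depth = round(mean_depth(budget_bp, GENOME_SIZE_BP))
# Hypothetical 60 Mb open-chromatin target (illustrative size, not a measured value).
atac_depth = round(mean_depth(budget_bp, 60e6))

vaf = 0.005  # hypothetical 0.5% variant allele fraction
min_reads = 5  # hypothetical calling threshold

print("WGS depth:", wgs_depth, "P(detect):", detection_probability(wgs_depth, vaf, min_reads))
print("ATAC depth:", atac_depth, "P(detect):", detection_probability(atac_depth, vaf, min_reads))
```

Under these assumptions, concentrating the same sequencing budget on a small fraction of the genome raises the mean depth roughly in proportion to the reduction in target size, which in turn raises the chance of observing enough reads to call a low-fraction variant.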
In the second phase of the project, data obtained in phase one will be used to develop a panel
sequencing protocol that further narrows the genomic regions examined, reduces the cost of analysis, and allows
extracted DNA to be analyzed directly (rather than the intact chromatin needed for ATAC-seq). This phase
will also involve three aims: expansion of ATAC-seq analysis to determine the best regions to include on the
sequencing panel; development of the sequencing panel; and testing of the panel on a range of individuals and
tissue types.
This project will provide rapid and inexpensive methods for detecting potentially critical somatic
mutations in any tissue type. At the research level, it will enable the analysis of large numbers of samples,
providing key information on biologically important somatic mutations and helping to illuminate the
spectrum of somatic mutation in the noncoding genome.