PROJECT SUMMARY
Precision genomic medicine depends on a complete understanding of all forms of genetic variation in normal
individuals. However, current approaches for studying human genetic variation have yielded an incomplete
snapshot of somatic variation and its contribution to health and disease, because they typically sequence a
single tissue (blood) and are not well suited to identifying structural variants, variants in repeat elements, or
the functional consequences of somatic variants. The goal of our proposal, “Mosaicism in Human Tissues, from
Telomere to Telomere,” is to characterize multiple types of human somatic variation across the entire human
genome in 10 tissues from 50 donors. We will also work with other members of the Somatic Mosaicism across Human
Tissues (SMaHT) Network toward a framework for understanding somatic variation in non-pathological human tissues.
To advance these goals, our Genome Characterization Center (GCC) will use a highly successful pipeline that has
produced tens of thousands of high-quality human genomes, including the first complete telomere-to-telomere
human genome. From each donor, we will produce high-quality short- and long-read DNA sequencing data,
full-length transcript (RNA) sequencing data, single-molecule chromatin profiling data, and long-range
chromatin conformation data.
This approach will enable us to generate donor-specific reference genome assemblies, which we will use to
call somatic variants in the context of the haplotype on which they arose. Calling variants independently of
the incomplete standard human reference genome will greatly improve our ability to accurately identify somatic
variants in complex repeat regions and other “unmappable” areas. These regions are precisely where the somatic
mutation rate is expected to be elevated, because they challenge the cell’s endogenous replication and
proofreading machinery. Additionally, our approach will enable us to directly interrogate the impact of
identified somatic variants on the overlying epigenetic and transcriptional gene regulatory patterns.
This GCC brings together three internationally recognized Principal Investigators (Drs. Bennett, Eichler, and
Stergachis), who together bring decades of expertise in high-throughput genomics, somatic variant discovery,
structural variant identification, long-read sequencing, and chromatin biology. Together with other members of
the SMaHT Network, we will produce the most complete catalogue to date of somatic variation and its gene
regulatory impact.