PROJECT SUMMARY/ABSTRACT - NEW YORK GENOME CHARACTERIZATION CENTER
Large-scale sequencing efforts over the last two decades have been focused on generating DNA sequence
datasets from readily available tissues such as blood or saliva to identify germline variants associated with
disease phenotypes. However, limited progress has been made in characterizing somatic variants in healthy
tissues and their contribution to health and disease over the course of the human lifespan. Somatic variation has
historically been studied in the context of tumor biology; however, there is mounting evidence that somatic
variation plays an important role in the aging process, as well as in cardiovascular, neurodegenerative,
immunologic, and neurodevelopmental diseases. There is therefore a critical need to characterize the somatic
variant landscape in healthy human tissues in individuals of diverse race and ethnicity across the human lifespan.
The Somatic Mosaicism across Human Tissues (SMaHT) program will address this gap by establishing a
cohesive Network that will work together to create a high-quality somatic variant catalog: one that is broadly
shareable across the scientific community and that enables studies investigating the rates and patterns of
somatic mosaicism across cell populations and tissues, that can elucidate the mechanisms underlying clonal
development, evolution, and expansion, and that enables studies of the role of somatic mutation in disease
pathogenesis and progression. The New York Genome Characterization Center (NYGCC) will work
collaboratively with other SMaHT Network Centers to generate a high-quality somatic variant catalog using three
core high-depth sequencing assays: duplex whole genome sequencing (WGS), mRNA sequencing, and long-
read Oxford Nanopore WGS. These three core assays will provide an unprecedented and comprehensive view
of somatic mutations across a variety of healthy tissues. The data from deep WGS will enable discovery of
somatic SNVs, indels, mobile elements, copy number changes, and structural variants. The RNA sequencing
data will be used to confirm the presence of those variants that fall in expressed genes and to further evaluate
their effect on splicing. Long-read WGS will be used as a complement to short-read WGS to confirm
and enhance discovery of mobile elements, copy number changes, and structural variants. To these core assays
we propose adding single-cell WGS using Direct Library Preparation Plus (DLP+) and genotyping of
transcriptomes (GoT). DLP+ is an amplification-free single-cell WGS assay that allows high-sensitivity detection
of copy number changes, loss of heterozygosity, and structural variation. It further enables the study of replication
timing, clonal expansion, and fitness, and is compatible with pooled pseudo-bulk analysis for comparison against
deep bulk WGS. For expressed somatic variants, the GoT assay will allow us to explore the cell type or lineage
in which they occurred and, when paired with single-cell expression data (as well as cell surface marker
detection and long-read transcript sequencing), the functional effects of these mutations.