PROJECT SUMMARY
The goal of the Somatic Mosaicism across Human Tissues (SMaHT) Network is to better characterize the occurrence
of mosaic variants in human tissues and to understand their role in the regulation of biological processes that
impact health and disease. The SMaHT Data Analysis Center (DAC) aims to collect and curate all data generated
in the Network; assess, develop, and apply state-of-the-art analytical pipelines; and produce a variant catalog and
a data portal for the scientific community. We have assembled a team of investigators, bioinformatics scientists,
data curators, and software developers with a strong track record in mosaic variant analysis, long-read data
analysis, data portal development, visualization, large-scale data management and computing, and development
of secure and flexible cloud technologies. In Aim 1, we will work with the Network members to define data and
metadata standards and ensure that high-quality data are generated, processed, and annotated uniformly and
efficiently. In Aim 2, we will benchmark current tools and technologies for the identification
of mosaic variants and lead an effort to define and implement analytical pipelines on a cloud platform. We will
also develop new approaches as needed. We will ensure that a comprehensive set of mosaic variants of all types
(single nucleotide variants, indels, copy number variants, translocations, complex rearrangements, transposable
element insertions, microsatellite mutations, repeat expansions, etc.) is identified, using short- and long-read
platforms, genome-wide and targeted assays, and bulk and single cell technologies. In Aim 3, we will build a
user-friendly and interactive data portal containing a variant catalog and featuring a read-level variant browser to
enable the scientific community to fully utilize the Network data. We will ensure that all methods and processes
are documented for full reproducibility and that all tools and data are freely available to the community.