ABSTRACT
The Baylor-Hopkins Clinical Genomics Center (BHCGC), comprising teams at the Baylor
College of Medicine Human Genome Sequencing Center (HGSC), Johns Hopkins University
Center for Inherited Disease Research (CIDR) and the University of Texas School of Public Health
(UTSPH), will provide short-read whole genome sequences (srWGS), genotyping arrays and
genetic interpretation for the National Institutes of Health’s All of Us Research Program (All of
Us). The BHCGC has provided ~1/3 of the program’s genomic and genetic data so far, generating
171,226 genotyping arrays and 154,349 srWGS sequences and populating reports for ~36,000
individuals via 54,349 variant interpretations, with ~1,000 positive findings. The team can provide
data for up to 100,000 participants in the next year, and capacity can readily be scaled further in
the outyears to complete the remainder of the program’s goal of 1,000,000. For srWGS, the
Illumina NovaSeq X platform will be validated for FDA IDE approval, while automated literature
scanning will speed and simplify the task of variant interpretation. New tools will be introduced
to speed variant re-analysis of All of Us data, so that updated reports can easily be generated.
While collaboratively building infrastructure for the All of Us program, the BHCGC has made
major contributions, including (i) array choice and design of content, (ii) calibration of the variant
interpretation workflow, (iii) collaborative variant harmonization, (iv) reprocessing of 240,000
array samples and 330,000 srWGS for the All of Us Data and Research Center (DRC), (v)
development of the FDA IDE, (vi) identification of sample contamination at the Biobank, (vii)
design of all aspects of sample flow and manifests and (viii) contributions to critical decisions
related to overall program management. Each benefited from engagement with the All of Us
network, via individual meetings and regular group interactions. In the next year, this continued
engagement will ensure network coordination, cohesion, and synergism.
Additional Pilot, Demonstration and Driver Projects are proposed. Pilots aim to (i) greatly
increase the data from proteomic assays, at minimal cost to the program, (ii) develop an untargeted
metabolomic data resource and analysis workflow for the Researcher Workbench (RWB) and (iii)
continue long-read DNA sequencing assays to complete previous commitments and to serve
Demonstration Projects and other network demands. Demonstration Projects will pave the way for
All of Us completion by (i) defining the pathway to a complete whole genome report, (ii)
determining how long-read data and srWGS can be combined to better resolve
important health-related genomic variation, and (iii) providing infrastructure, integrated RWB workflows
and illustrative examples of multiomic studies (DNA, methylation, RNA, protein, metabolomic).
Driver Projects will (i) analyze ‘missingness’ of All of Us Health Data on the RWB, with a view
to providing paths to metadata completeness, (ii) analyze extant long-read data from Hispanic
populations and (iii) study DNA methylation in a key mental health disease state. The Pilot,
Demonstration and Driver Projects together contribute to the final All of Us product:
comprehensive genetic reporting and a complete research resource.
Together, these contributions address all eight of the objectives of the current phase of the program,
paving the way for the ultimate success of All of Us.