PROJECT SUMMARY
The overarching goal of this project is to address several major challenges to the biological interpretation, functional
validation, and clinical translation of genetic association findings for quantitative red blood cell traits and non-
malignant blood cell disorders in the post-genomic era. In Aim 1, we will apply state-of-the-art statistical genomic
and computational tools to extremely large human multi-ethnic population-based datasets containing hundreds
of thousands of individuals with red blood cell traits (hemoglobin, hematocrit, RBC count, MCV, MCH, MCHC,
red cell distribution width or RDW) and whole genome sequence (WGS) data (the NHLBI TOPMed WGS project)
or GWAS data (Blood Cell Consortium or BCX and UK Biobank) to provide updated analysis, discovery, and
interpretation of results for common, low-frequency, and rare genetic variants associated with red blood cell
counts and indices. In Aim 2, we will validate new red blood cell phenotype-associated genomic loci and genetic
variants through a combination of imputation and replication in independent data sets (using TOPMed
WGS as the imputation reference panel) and/or de novo genotyping or sequence analysis of selected phenotypic
samples or pedigrees. We will also provide functional annotation, fine-mapping, and prioritization for new and
existing red blood cell trait-associated variants and genes, with an emphasis on new blood cell lineage-specific
epigenomic, transcriptomic, and 3D genomic resources, including those becoming available through the TOPMed
and BLUEPRINT projects. In Aim 3, we will perform functional, cell-based analyses of selected non-coding
genomic loci/variants (~50 per year) identified in Aims 1 and 2 (particularly those that alter canonical transcription
factor motifs and demonstrate clinical impact through PheWAS or co-segregation with phenotypic extremes in
pedigrees) using a combination of massively parallel reporter assays (MPRA) and CRISPR/Cas9 genomic
perturbation, thereby providing comprehensive and predictive
assessments of regulatory non-coding variation and function. We will disseminate all genomic, annotation, and
functional information derived from Aims 1, 2, and 3 to the clinical and scientific community to support
discovery, fine-mapping, and investigation of the causal genes that underlie red blood cell
traits and hematological disorders.