Integrative analysis of whole genomes and transcriptomes from multiple cell types in rare disease patients - Whole-genome sequencing (WGS) is revolutionizing the diagnosis of rare diseases. However, at present, even the most powerful approaches to etiological discovery typically fail to find a genetic cause in a majority of participants (Turro et al., Nature 2020). There are a number of reasons for this. Firstly, rare disease studies are typically composed of small sets of unresolved cases, each sharing a different genetic etiology, which constrains statistical power when only WGS and clinical phenotype data are available on participants. Secondly, the unknown causal variants may have molecular consequences that are challenging to predict computationally, such as disruptions to the regulatory elements (REs) of a gene or the introduction of a cryptic splice site. Thirdly, some types of causal mutations, such as structural variants, are prone to being missed by WGS. Systematic transcriptomic profiling of homogeneous cell populations taken from rare disease patients has the potential to overcome these limitations. We have access to a collection of ~1,000 comprehensively phenotyped rare disease study participants with WGS and RNA-seq of platelets, neutrophils, monocytes and CD4+ T-cells. Here, we present a research program of statistical, computational and experimental approaches to uncover novel etiologies of rare diseases that exploits the high dimensionality and the hierarchical nature of these data. We will concentrate on the etiologies underlying ~300 cases with a rare platelet disorder (RPD), exploiting our expertise in blood genomics. In Aim 1, we will develop a Bayesian method for identifying rare variants in REs that cause rare diseases, treating expression as a molecular mediator of genetic etiology.
Our approach models the causal path between rare variants that overlap cell type-specific REs, the corresponding cell type-specific changes in expression, and the consequent alteration in rare disease risk. To include a recently discovered class of enhancer marked by H3K122ac but not H3K27ac in our hypothesis search space, we will generate H3K122ac data on the relevant cell types from healthy donors. In Aim 2, we will apply several approaches for identifying pathogenic changes in transcript sequences. For example, we will apply recently developed methodology for identifying splicing outliers within the cohort. To ensure these outliers are extreme in the wider population, we will compute splicing frequency spectra in large RNA-seq datasets such as GTEx. These spectra will capture the population distribution of the within-individual proportion of RNA-seq reads for a gene that include a given splice junction. We will also exploit the joint availability of WGS and RNA-seq in patients to identify extreme allelic imbalances at WGS-called heterozygous sites. The candidate variants that we identify will be validated in cell lines and primary samples. Rare diseases collectively affect one in 20 people but current etiological knowledge cannot resolve half of patients by WGS alone. The modeling and analysis of large-scale, patient-derived RNA-seq data on multiple cell types as molecular mediators of disease risk can fill this gap. The methodological and etiological output of our research program will ultimately boost the diagnostic power of WGS and broaden the scope of precision medicine.
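
As an illustration of two Aim 2 quantities described above, the following minimal Python sketch computes the within-individual proportion of a gene's reads that include a given splice junction, compares a patient's value with a reference distribution, and tests for allelic imbalance at a heterozygous site. The function names, the add-one correction in the empirical tail, and the exact binomial test against an expected reference-allele fraction of 0.5 are assumptions made for illustration only; they are not the methods specified in this proposal, and real allele-specific data are often overdispersed relative to a binomial model.

```python
import math


def junction_usage(junction_reads, gene_reads):
    """Proportion of a gene's reads in one individual that include a junction."""
    if gene_reads <= 0:
        raise ValueError("gene_reads must be positive")
    return junction_reads / gene_reads


def empirical_tail(patient_value, reference_values):
    """Fraction of reference individuals with usage >= the patient's value.

    Uses an add-one correction (an assumption) so the result is never zero.
    """
    n_ge = sum(v >= patient_value for v in reference_values)
    return (n_ge + 1) / (len(reference_values) + 1)


def allelic_imbalance_p(ref_reads, alt_reads):
    """Two-sided exact binomial p-value for ref vs alt read counts.

    Null hypothesis (an assumption): each read comes from the reference
    allele with probability 0.5. Computed in log space to avoid underflow.
    """
    n = ref_reads + alt_reads

    def log_pmf(k):
        log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        return log_choose + n * math.log(0.5)

    observed = log_pmf(ref_reads)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    total = 0.0
    for k in range(n + 1):
        lp = log_pmf(k)
        if lp <= observed + 1e-9:
            total += math.exp(lp)
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts, for demonstration only.
    usage = junction_usage(junction_reads=30, gene_reads=120)
    print(usage, empirical_tail(usage, [0.0, 0.1, 0.05, 0.0]))
    print(allelic_imbalance_p(ref_reads=10, alt_reads=0))
```

In this sketch, a small empirical tail value indicates that the patient's junction usage is rare in the reference population, and a small binomial p-value flags a heterozygous site whose two alleles are expressed unequally.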