A comprehensive analytical framework for multi-source, multi-way, and multi-cohort data

Project Summary

The overarching goal of this project is to develop new strategies for the integrative analysis of high-throughput biomedical data that are generally applicable to a wide variety of applications and scenarios. This is needed because rapidly developing molecular “omics” and imaging technologies have enabled more comprehensive measurement of multifaceted biological systems at lower cost. As a result, a given study will often collect high-throughput data that are linked across multiple sources (e.g., different technologies) and multiple dimensions or ways (e.g., multiple tissues, cell types, regions, or time points). We have a strong track record of developing impactful statistical methodology and widely used software for data integration in this general setting, motivated by tangible applications to neurodegenerative disorders, pulmonary disorders, early-life nutrition, and other domains. In this project we will pursue new methodological aims driven by emerging data challenges in these areas, including integration across multiple sample groups or “cohorts.” Our central methodological objective is to develop a highly flexible framework for bidimensional regression and factorization that simultaneously identifies covariate-driven effects and auxiliary structured variation in multi-source, multi-way, and multi-cohort data. As needed, our general model will address the following tasks: (a) decomposition of covariate effects and low-rank structure that may be shared across any sources or sample sets, via a general objective function; (b) Bayesian inference for the identified decomposition, with efficient posterior sampling algorithms; (c) missing data imputation; (d) classification of the sample cohorts; and (e) tracking progression in longitudinal data.
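A minimal sketch of task (a) helps clarify what is meant by separating covariate effects from low-rank structure. It handles a single data matrix `Y` (features × samples) under a linear covariate model. The low-rank term is constrained to be orthogonal to the covariates, so the two parts stay identifiable. The helper name `covariate_lowrank_decomposition` and the simulated data are hypothetical illustrations, not the project's framework. The full framework also covers structure shared across arbitrary subsets of sources and cohorts, Bayesian inference, and imputation. If several sources measured on the same samples are stacked as rows of `Y`, the low-rank term becomes structure shared across those sources.

```python
import numpy as np


def covariate_lowrank_decomposition(Y, X, rank):
    """Split Y (p features x n samples) into B @ X.T + L + E.

    Hypothetical helper for illustration only. B (p x q) holds least-squares
    covariate effects; L is the best rank-`rank` approximation of the
    residual, whose rows are orthogonal to the covariates (L @ X == 0).
    """
    # Covariate effects: B = Y X (X^T X)^{-1}
    B = np.linalg.solve(X.T @ X, X.T @ Y.T).T
    # Residual with the covariate space projected out
    R = Y - B @ X.T
    # Truncated SVD gives the low-rank structured variation
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    return B, L, R - L


# Simulated example (assumed sizes): intercept plus one covariate, rank-3 signal
rng = np.random.default_rng(0)
n, p, q, r = 100, 40, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
B_true = rng.normal(size=(p, q))
Z = rng.normal(size=(n, r))
Z -= X @ np.linalg.lstsq(X, Z, rcond=None)[0]  # make signal orthogonal to X
L_true = rng.normal(size=(p, r)) @ Z.T
Y = B_true @ X.T + L_true + 0.01 * rng.normal(size=(p, n))

B_hat, L_hat, E_hat = covariate_lowrank_decomposition(Y, X, rank=r)
```

In this simplified setting the orthogonality constraint yields a closed-form solution. Richer versions with multiple shared and individual components, or penalized objectives, would typically need iterative fitting.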
We will apply our methods to diverse biomedical areas, including (a) identifying multi-omic signatures in human breast milk that correlate with infant brain development, (b) developing a comprehensive model of Friedreich's ataxia progression across multiple modalities, and (c) identifying multi-omic molecular pathways indicative of HIV-associated chronic obstructive pulmonary disease across different tissues. The broader impacts of this work extend to the wider biomedical research community. Open-source software packages will help other researchers adopt these methods to integrate multi-source, multi-way, and multi-cohort data, filling a critical need in the rapidly expanding landscape of biomedical studies.