Multi-omic Risk Prediction of Chronic Obstructive Pulmonary Disease in European- and African-Ancestry Populations

PROJECT SUMMARY/ABSTRACT

Chronic obstructive pulmonary disease (COPD) is a leading cause of respiratory mortality worldwide [15]. Identifying highly susceptible individuals early in their disease course, before irreversible loss of lung function, and understanding pathogenic mechanisms are of utmost importance [16,17]. Genetic factors account for about 40% of COPD susceptibility [18–20]. Genome-wide association studies (GWASs) have identified multiple variants associated with COPD [21–23]. Individual variants predict risk poorly, but in aggregate genetic variants can account for a substantial portion of risk. Pooling millions of GWAS variants, I created a polygenic risk score (PRS) for COPD that can identify individuals at high risk for COPD, though performance was suboptimal in non-Europeans [24]. Multi-ancestry PRSs are needed because genetic ancestry is not readily determined in clinical practice. Further, gene expression, which reflects both genetic and environmental influences, provides pathobiologic information about COPD susceptibility and heterogeneity. A transcriptional risk score (TRS) for COPD that adds predictive value above clinical risk factors [25] has yet to be developed. The appeal of using omics data for risk stratification is that these data can lend insight into why certain COPD subgroups are at elevated risk of progression. Gene regulatory networks [26] have been used to uncover mechanisms of COPD heterogeneity that were not found by traditional gene-based approaches. Therefore, we hypothesize that polygenic and transcriptional risk scores will substantially improve upon clinical factors in identifying those at higher risk for COPD and related phenotypes, and can be used to identify pathways for therapeutic intervention.
We will train multi-ancestry PRSs using 4,225 African-ancestry individuals from UK Biobank and existing analyses of 8,429 African Americans from CHARGE, and test them in the Genetic Epidemiology of COPD (COPDGene; n=10,198) study and the Lung Tissue Research Consortium (LTRC; n=1,078). We will create a multi-ancestry transcriptional risk score (TRS) using whole-blood RNA-sequencing (RNA-seq) data from COPDGene training samples (n=3,394) and evaluate its predictive performance in COPDGene testing samples (n=1,131). We will use the Connectivity Map (CMap) [8,27] to identify drug repurposing candidates based on TRS transcripts. We will leverage lung RNA-seq data from LTRC to create a lung TRS and test it in COPDGene blood samples. We will classify COPDGene participants along the axes of the existing PRS and the lung TRS (e.g., “High” PRS, “Low” TRS), which we expect will identify those at high risk for COPD-related phenotypes and progression. To understand why certain individuals are at high risk for COPD phenotypes, we will use gene regulatory networks to identify pathways that differ between PRS/TRS classifications, and use the Gene RegulAtory Network Database (GRAND) [9] to prioritize drug repurposing candidates. These aims will generate data for future studies, which will focus on validating COPD omics risk scores and drug candidates in real-world cohorts [1] and on using machine learning to predict the network effects of drug candidates. The proposed research and career development plan will train me to use machine learning for multi-omic integration and risk prediction.
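
The classification of participants along the PRS and TRS axes described above can be illustrated with a minimal sketch. It rests on assumptions not stated in this summary: that a risk score is computed as a weighted sum of allele dosages, and that each axis is split into "High" and "Low" at the cohort median. The function names, the cut-point, and the toy values are hypothetical and serve only to show the idea; they are not the study's actual pipeline or data.

```python
from statistics import median


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) across variants."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def classify_two_axes(prs, trs):
    """Label each participant 'High' or 'Low' on each axis.

    The cohort median is used as the cut-point on both axes; this threshold
    is an assumption made for illustration only.
    """
    prs_cut = median(prs)
    trs_cut = median(trs)
    labels = []
    for p, t in zip(prs, trs):
        prs_label = "High" if p > prs_cut else "Low"
        trs_label = "High" if t > trs_cut else "Low"
        labels.append((prs_label, trs_label))
    return labels


# Toy inputs (invented for illustration): three variants, four participants.
weights = [0.10, -0.05, 0.20]           # hypothetical per-variant effect sizes
genotypes = [[0, 1, 2], [2, 2, 0], [1, 0, 1], [2, 0, 2]]
prs = [polygenic_score(g, weights) for g in genotypes]
trs = [0.3, -1.2, 0.8, 1.5]             # hypothetical transcriptional scores

groups = classify_two_axes(prs, trs)
for i, (prs_label, trs_label) in enumerate(groups):
    print(f"participant {i}: {prs_label} PRS, {trs_label} TRS")
```

Each participant ends up in one of four groups (High/High, High/Low, Low/High, Low/Low), which is the kind of stratification over which pathway differences between PRS/TRS classifications could then be compared. In practice a different cut-point, such as a score quantile chosen in the training data, might be preferred over the median.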