Genotypic and phenotypic approaches to subtype individuals with coronary heart disease - PROJECT SUMMARY
Traditional genome-wide association studies (GWAS) have typically overlooked heterogeneity in coronary heart disease (CHD), which may partly explain the gap between current risk assessment models and a personalized approach to CHD prevention and treatment. To bridge this gap, I will focus on the development of CHD without prior elevated traditional risk factors in diverse populations. The pooled cohort equations (PCE) are recommended by the American College of Cardiology and the American Heart Association to guide primary CHD prevention in clinical practice. However, limited portability to other ancestries, small sample sizes, and a limited set of available variables make the PCE imprecise for risk assessment. Patients who develop CHD without prior elevated traditional risk factors, and who therefore have low PCE scores (defined here as CHDlowPCE), may have underlying mechanisms that are not captured by the traditional risk factors in the PCE and therefore require more in-depth investigation for accurate CHD prediction. To address these knowledge gaps, I propose to leverage advanced statistical methods, including polygenic risk scores (PRSs) and machine learning (ML) algorithms, in diverse populations to better understand the physiological mechanisms underlying CHDlowPCE. I aim to identify novel genetic variants associated with incident CHD risk without prior elevated traditional risk factors (CHDlowPCE), prioritize potentially causal variants, and optimize multi-ancestry PRSs for CHD (Aim 1). I will create a novel score representing increased incident CVD risk and decreased traditional risk factors and perform a multi-ancestry GWAS on it. I will also prioritize potentially causal variants and genes in the identified loci and prioritize relevant biological systems.
In addition, I will build genome-wide PRSs that combine common and rare variants from summary statistics and evaluate their performance in assessing CHDlowPCE. In parallel, I aim to identify non-traditional risk factors for incident CHDlowPCE using electronic health record (EHR) data and ML methods (Aim 2). This involves conducting a risk factor search across clinical features in the EHR data with a random forest-based ML framework, and evaluating the causal effect of novel risk factors on CHD using Mendelian randomization approaches. Lastly, I will use these findings to subtype CHD patients based on genetic and phenotypic risk factors (Aim 3). I hypothesize that subsets of genetic loci that cluster by their association signatures, and pathway-based PRSs, will shed light on distinct disease mechanisms and define subgroups of CHD patients in diverse populations. In summary, this project will enhance CHD risk assessment by systematically evaluating the genetic and non-genetic risk factors underlying misclassified CHD risk in diverse populations, and will subtype patients based on CHD pathophysiology.
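
For readers less familiar with the two quantities named in Aims 1 and 2, the following is a minimal illustrative sketch, not the project's actual pipeline. It assumes the simplest additive PRS (a weighted sum of effect-allele dosages) and a fixed-effect inverse-variance weighted (IVW) Mendelian randomization estimator; the summary does not state which PRS construction or MR estimator will be used, and the function names and toy inputs are hypothetical.

```python
import numpy as np


def polygenic_risk_score(dosages, weights):
    """Additive PRS: per-individual weighted sum of effect-allele dosages.

    dosages: array of shape (n_individuals, n_variants), values in [0, 2].
    weights: array of shape (n_variants,), per-variant effect sizes
             (assumed here to come from GWAS summary statistics).
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be (n_individuals, n_variants) matching weights")
    return dosages @ weights


def ivw_mr_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect of an exposure on an outcome.

    Each variant contributes a Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2 (assumed estimator).
    Returns (estimate, standard_error).
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    w = bx**2 / se**2
    estimate = np.sum(w * (by / bx)) / np.sum(w)
    standard_error = np.sqrt(1.0 / np.sum(w))
    return estimate, standard_error


# Toy inputs (hypothetical values, for illustration only).
toy_dosages = [[0, 1, 2], [2, 0, 1]]
toy_weights = [0.1, -0.2, 0.3]
toy_scores = polygenic_risk_score(toy_dosages, toy_weights)

toy_estimate, toy_se = ivw_mr_estimate(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[0.05, 0.10],
    se_outcome=[0.01, 0.02],
)
```

In practice, multi-ancestry PRSs that combine common and rare variants and MR analyses robust to pleiotropy involve considerably more modeling than this sketch shows.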