The Utility of Advanced Unsupervised Learning Methods to Model Dietary Intake Data
PROJECT SUMMARY
Coronary heart disease (CHD) is a leading cause of death in the U.S. Despite the importance of diet as a primary prevention strategy, current population-level dietary guidelines have had limited effectiveness, and disease prevalence continues to rise. Precision nutrition could improve CHD prevention, but it is limited by our inability to accurately predict which dietary patterns are associated with disease at the individual level. To realize the promise of precision nutrition, two limitations of current approaches to diet modeling must be addressed: the modeling of complex dietary patterns and the measurement of dietary intake. Traditional dietary pattern analysis methods oversimplify dietary components and cannot capture complex food interactions, and self-reported dietary intake is prone to recall bias. The application of artificial intelligence to large-scale biobanks offers a potential solution. Variational autoencoders (VAEs) and graph neural networks (GNNs) are classes of deep learning models that learn compact, low-dimensional representations of high-dimensional data and can model complex relationships, such as those present in dietary intake data. Under the guidance of a mentorship team with expertise in precision medicine and clinical informatics, I propose to apply these innovative methods to the UK Biobank (UKB; ~103,000 individuals) and the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed; ~30,000 individuals) cohorts. I will use a GNN to model compositional dietary patterns and a VAE to identify omics–dietary pattern associations that can serve as objective markers of dietary intake. To address limitations in dietary pattern modeling, I propose: Aim 1 – Evaluate the utility of GNNs to model dietary patterns and food interactions.
I will perform a comparative analysis of GNNs against principal component analysis (PCA) and k-means clustering, the current standard data-driven methods for dietary pattern analysis, by evaluating (1A) the consistency and novelty of the dietary patterns identified by each method, (1B) the accuracy of CHD risk prediction, and (1C) the robustness of the model through external validation in TOPMed. I hypothesize that the GNN will (1) identify novel dietary patterns, (2) model the impact of food interactions on CHD risk, and (3) outperform traditional methods in CHD risk prediction, with robust results across datasets. To address limitations in dietary intake assessment, I propose: Aim 2 – Identify multi-omics profiles of dietary patterns as objective markers of dietary intake. I will apply the previously developed multi-omics variational autoencoder (MOVE) framework to the UKB and TOPMed cohorts to (2A) identify novel metabolomics/proteomics profiles that are associated with dietary intake patterns, and (2B) characterize individual response patterns to diet perturbation. I hypothesize that (1) MOVE will identify novel multi-omics profiles associated with dietary intake and (2) individuals will show distinct omics and clinical response patterns to diet perturbation. This project, supported by the Icahn School of Medicine, offers training in precision nutrition and nutritional epidemiology that will support my professional development and expand my computational skills toward an independent research career.
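Aim 1 benchmarks the GNN against PCA and k-means clustering. As a rough, non-authoritative illustration of what those two baselines compute, the sketch below derives PCA "dietary pattern" loadings and k-means pattern clusters from a participant-by-food-group intake matrix, using only the Python standard library. Everything here is an assumption for illustration: the four food-group columns, the simulated "prudent-like" and "Western-like" participants, and the helper names (standardize, correlation_matrix, pca, kmeans, demo) are hypothetical, and the farthest-first seeding is one swapped-in initialization choice. A real analysis would use the project's actual dietary variables and an established library.

```python
import math
import random


def standardize(rows):
    """Z-score each column (hypothetical food group) across participants."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def correlation_matrix(z):
    """Covariance of z-scored data, i.e. the food-group correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def pca(corr, k, iters=500, seed=0):
    """Top-k eigenvectors by power iteration with deflation.
    Each eigenvector's loadings describe one PCA 'dietary pattern'."""
    rng = random.Random(seed)
    p = len(corr)
    a = [row[:] for row in corr]
    comps, eigvals = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector gives the eigenvalue (variance).
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        comps.append(v)
        eigvals.append(lam)
        # Deflate so the next pass finds the next-largest component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps, eigvals


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with farthest-first seeding; each participant is
    assigned to exactly one dietary pattern cluster."""
    rng = random.Random(seed)
    centers = [rng.choice(points)[:]]
    while len(centers) < k:
        far = max(points, key=lambda pt: min(sq_dist(pt, c) for c in centers))
        centers.append(far[:])
    labels = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(pt, centers[c])) for pt in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def demo(seed=42):
    """Simulated (hypothetical) data: 50 'prudent-like' then 50 'Western-like'
    participants. Columns: vegetables, fruit, red meat, processed meat."""
    rng = random.Random(seed)
    data = [[rng.gauss(4, 0.5), rng.gauss(3, 0.5), rng.gauss(0.5, 0.2), rng.gauss(0.3, 0.1)]
            for _ in range(50)]
    data += [[rng.gauss(1, 0.5), rng.gauss(1, 0.5), rng.gauss(2, 0.3), rng.gauss(1.5, 0.3)]
             for _ in range(50)]
    z = standardize(data)
    comps, eigvals = pca(correlation_matrix(z), k=2)
    labels, _ = kmeans(z, k=2)
    return labels, eigvals, comps


labels, eigvals, comps = demo()
p = len(comps[0])
print("PC1 loadings:", [round(x, 2) for x in comps[0]])
print("share of variance:", [round(lam / p, 2) for lam in eigvals])
print("cluster sizes:", [labels.count(c) for c in range(2)])
```

In this sketch, PCA gives every participant a continuous score on every pattern, while k-means assigns each participant to a single cluster. Neither baseline represents interactions between foods directly, which is the limitation Aim 1 proposes to address with the GNN.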