Computational approaches to characterize heterogeneity and improve risk stratification in complex disease phenotypes

PROJECT SUMMARY/ABSTRACT

Recent technological breakthroughs have enabled the generation of clinical, environmental, and multi-omics data at an unprecedented scale, providing a more complete profile of each patient for individualized disease diagnosis, prognosis, and treatment. However, precision medicine has yet to realize its potential for most multifactorial diseases. In these diseases, a highly polygenic architecture, together with phenotypic and genetic heterogeneity, complicates the identification of disease-associated, cell type-specific transcriptional mechanisms. Two steps are crucial to delivering on the promise of precision medicine: better characterizing this heterogeneity, and interpretably predicting which individuals are at high risk of disease.

In this context, polygenic risk scores (PRS) are likely to play a central role in disease-risk prediction. However, it has been argued that PRS might accentuate disparities for individuals of non-European ancestry and show low stability in individual-level predictions. A possible reason is that disease etiology involves more underlying complexity than a single score can capture. Current efforts to mitigate health disparities involve recruiting individuals from diverse population ancestries. However, if the underlying biological complexity of disease etiology remains unaccounted for, risk stratification methods will continue to be limited.

The goal of this project is to develop machine learning methods that advance key computational aspects of precision medicine. In the first aim, an unsupervised method will be applied across a large number of genetic studies to detect gene sets associated with multiple human traits. This analysis will also identify environmental risk factors.
In the second aim, new computational approaches will be developed to learn gene co-expression patterns. These patterns will be optimized to clarify the transcriptional mechanisms linked to complex traits and their therapeutic modalities. The approaches will detect gene modules (i.e., groups of genes with similar expression profiles across the same cell types) that capture complex gene relationships. They will be validated by predicting known FDA-approved drug-disease links. Finally, the outcomes of these aims will inform a gene module-based polygenic risk score for accurate and robust disease-risk stratification that will be portable across population ancestries. Although the methods will initially be applied to asthma, they can be extended to other common diseases.

For the K99 phase of this project, the mentorship team's expertise covers all key areas of precision medicine, including computational genetics, systems biology, environmental exposure studies, pharmacology, and translational medicine. Mentors and advisors are directly involved in precision medicine initiatives that advance both scientific discovery and its implementation in clinical care. For the R00 phase and beyond, the conceptual and methodological expertise acquired during the K99 phase will prepare the applicant for an independent research career in computational methods development for precision medicine. The Perelman School of Medicine at the University of Pennsylvania, consistently ranked among the top research medical schools, is an ideal environment for this highly collaborative project.
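For reference, the conventional polygenic risk score discussed in this summary is an additive weighted sum over genetic variants. For individual i it is PRS_i = Σ_j β_j x_ij, where x_ij is that individual's effect-allele dosage at variant j and β_j is the variant's GWAS effect size. The Python sketch below computes this score, assuming NumPy is available and using toy numbers rather than real data. For illustration only, it also splits the same score across a hypothetical variant-to-module assignment (`module_of_variant`, a name introduced here). The split shows how a single total can hide which groups of genes drive an individual's risk. It is not the gene module-based PRS proposed in this project, whose construction the summary does not specify.

```python
import numpy as np


def polygenic_risk_score(dosages: np.ndarray, effect_sizes: np.ndarray) -> np.ndarray:
    """Conventional additive PRS: PRS_i = sum_j beta_j * x_ij.

    dosages: (n_individuals, n_variants) effect-allele counts (0, 1, or 2).
    effect_sizes: (n_variants,) per-variant GWAS effect sizes (e.g., log odds ratios).
    """
    if dosages.shape[1] != effect_sizes.shape[0]:
        raise ValueError("number of variants must match number of effect sizes")
    return dosages @ effect_sizes


def per_module_scores(dosages: np.ndarray, effect_sizes: np.ndarray,
                      module_of_variant: np.ndarray, n_modules: int) -> np.ndarray:
    """Split the same weighted sum into one partial score per module.

    module_of_variant is a HYPOTHETICAL variant-to-module assignment used only
    for illustration; the partial scores in each row sum back to the single PRS.
    """
    partial = np.zeros((dosages.shape[0], n_modules))
    for m in range(n_modules):
        mask = module_of_variant == m  # boolean mask selecting this module's variants
        partial[:, m] = dosages[:, mask] @ effect_sizes[mask]
    return partial


# Toy data (illustrative numbers, not real genotypes or GWAS results):
# 3 individuals x 4 variants; variants 0-1 are assigned to module 0, 2-3 to module 1.
X = np.array([[0, 1, 2, 0],
              [1, 1, 0, 2],
              [2, 0, 1, 1]])
beta = np.array([0.10, -0.05, 0.20, 0.03])
modules = np.array([0, 0, 1, 1])

prs = polygenic_risk_score(X, beta)             # one number per individual
parts = per_module_scores(X, beta, modules, 2)  # one column per hypothetical module
print(prs)    # approximately [0.35 0.11 0.43]
print(parts)  # each row sums to the corresponding PRS
```

In this toy example, the first and third individuals have similar totals (0.35 and 0.43) that arise differently. The first individual's score comes entirely from module 1 (partial scores of about -0.05 and 0.40). The third individual's score is spread across both modules (about 0.20 and 0.23). A single PRS does not retain this distinction.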