Statistical methods and tools for enhancing polygenic risk prediction and discovery of causal gene pathways - Polygenic risk prediction and prioritization of disease-causal genes are two fundamental tasks in human genetic research. Polygenic risk scores (PRS) are receiving increasing attention for their potential value in disease risk stratification and precision medicine, yet several important tasks in the field remain understudied. First, well-suited methods for developing PRS for heterogeneous populations are lacking. Second, SNP effects can vary by risk group, potentially due to gene-gene and gene-environment interactions, but current PRS do not capture such hidden variation, which can be crucial for prioritizing high-risk groups. Third, rapidly evolving methods and the growing volume of genome-wide association study (GWAS) data make it difficult to maintain local PRS pipelines, creating demand for a centralized cloud computing tool to facilitate PRS training. Meanwhile, although identifying disease-causal gene pathways beyond the single-gene level is broadly recognized as critical for advancing our understanding of disease etiology and for targeted therapy, the field remains methodologically under-investigated. Current pathway analyses suffer from low power and limited gene expression data sources, and they have not utilized important resources such as the rich gene regulatory network (GRN) information in expansive pathway databases and emerging multi-omic data. The proposed research program aims to bridge these gaps with innovative solutions.
We will (1) develop a novel method for building PRS for heterogeneous and admixed populations that incorporates both continuous global and local ancestry heterogeneity in SNP effects; (2) introduce “quantile PRS”, which could better approximate the true PRS by capturing heterogeneous SNP effects across different phenotype quantiles and could enable refined prediction and stratification with a PRS curve; and (3) launch the first public PRS cloud computing server for the research community. We will also (4) propose a rigorous, GRN-informed transcriptome-wide association study (TWAS) framework for advanced pathway discovery and (5) incorporate multi-omic data to further improve the power of the framework. Our methods will be evaluated across a wide range of traits and diseases using data acquired from previous work or through current collaborations. In collaboration with domain experts, we will also apply the pathway discovery methods to study the genomic etiology of AD, cancers, and psychiatric disorders. Overall, we expect that the proposed statistical methods and user-friendly tools will be broadly applied to improve the power and applicability of polygenic risk prediction models and to facilitate the discovery of gene pathways for complex human diseases. Over the past five years, I have built a broad research profile in polygenic risk prediction, causal inference in genetic studies, and gene pathway analysis, and have mentored many undergraduate and graduate students. My multidisciplinary research background positions me well to lead the proposed PRS research while extending my work to the important problem of unraveling the genetic mechanisms of complex traits and diseases.