Improving PGS Prediction for Underrepresented Groups Through Transfer Learning

In the last two decades, thousands of Genome-Wide Association Studies (GWAS) have been published. Increasingly, the findings reported by these studies inform the development of Polygenic Scores (PGS) that can be used to predict phenotypes and disease risk. The Polygenic Score Catalog includes more than 3,700 PGS. However, the overwhelming majority of these PGS were derived using data from individuals of European ancestry and have poor predictive performance when applied to individuals of non-European ancestry. Transfer Learning (TL) is a technique by which knowledge gained from one data set is used to improve a model’s performance in another data set. Our overarching goal is to develop novel TL algorithms to improve the prediction accuracy of PGS for ancestry groups underrepresented in genomics research. To achieve this, we propose three specific aims. Aim 1 is to Develop and Benchmark Novel Penalized and Bayesian Methods for PGS Development Using Transfer Learning. The first method we propose is a Penalized Regression that shrinks estimates towards external estimates of SNP effects (e.g., SNP effects derived from Europeans). We develop coordinate descent algorithms to fit Penalized Regressions using Ridge, Lasso, and Elastic Net penalties. The second method we propose is a Bayesian Regression with a mixture prior that uses external estimates as prior means; the model can automatically learn, for each SNP, whether to transfer knowledge from the external estimates and how strongly to borrow information. We present preliminary results (using data from the UK Biobank and All of Us (AoU)) that demonstrate the potential of the proposed methods. Our Aim 1 research will deliver efficient open-source software to develop PGS using TL and extensive benchmarks using data from the UK Biobank, AoU, and three US cohorts (ARIC, REGARDS, and HCHS/SOL).
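To make the Penalized Regression idea concrete, the following is a minimal sketch of coordinate descent with shrinkage towards external SNP effects. It is an illustration under stated assumptions, not the proposed software: the function name `tl_elastic_net`, the parameters `lam` and `alpha`, and the objective (1/2n)||y - Xb||^2 + lam * [alpha * ||b - b_ext||_1 + (1 - alpha)/2 * ||b - b_ext||^2] are all hypothetical choices. Setting `alpha=0` gives a Ridge-type penalty, `alpha=1` a Lasso-type penalty, and values in between an Elastic Net.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso-type coordinate updates."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def tl_elastic_net(X, y, b_ext, lam, alpha=0.5, n_iter=200, tol=1e-8):
    """Coordinate descent for a regression shrunk towards external effects.

    Assumed (hypothetical) objective:
        (1/2n)||y - Xb||^2
        + lam * [alpha * ||b - b_ext||_1 + (1 - alpha)/2 * ||b - b_ext||^2]

    Writing b = b_ext + d turns this into a standard elastic net in d,
    fitted to the residual y - X @ b_ext.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    b_ext = np.asarray(b_ext, dtype=float)
    n, p = X.shape

    d = np.zeros(p)                      # deviation from the external effects
    resid = y - X @ b_ext                # residual at d = 0
    col_ss = (X ** 2).sum(axis=0) / n    # (1/n) * x_j' x_j for each SNP

    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:         # monomorphic SNP: nothing to update
                continue
            # Correlation of SNP j with the partial residual (SNP j added back).
            z = X[:, j] @ resid / n + col_ss[j] * d[j]
            d_new = soft_threshold(z, lam * alpha) / (col_ss[j] + lam * (1.0 - alpha))
            delta = d_new - d[j]
            if delta != 0.0:
                resid -= X[:, j] * delta  # keep the residual in sync with d
                d[j] = d_new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break

    return b_ext + d
```

In this sketch, a large `lam` returns the external estimates essentially unchanged, while `lam=0` reduces to ordinary least squares in the target data set; intermediate values trade off between the two.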
Recent studies suggest that a sizable fraction of the loss of accuracy (LOA) in cross-ancestry prediction is attributable to genome differentiation (i.e., between-ancestry differences in allele frequencies and Linkage Disequilibrium). We hypothesize that genome differentiation (and thus the portability of local PGS) varies substantially across the genome. Therefore, in Aim 2, we propose to Develop and Validate Maps of the Relative Accuracy (RA) of European-derived PGS when used to predict phenotypes of African Americans and Latinos. Finally, Aim 3 focuses on Integrating the Relative Accuracy Maps developed in Aim 2 into Transfer Learning Algorithms that transfer knowledge strongly for genomic regions that exhibit limited genome differentiation between populations (i.e., high predicted RA) and more weakly for regions with low predicted RA. We propose strategies for achieving this in the Penalized and Bayesian models developed in Aim 1. We also propose an approach that uses the Bayesian model to be developed in Aim 3 to leverage sex-by-ancestry differences.
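One way the RA maps could enter a penalized model is through SNP-specific penalties, sketched below for a Ridge-type penalty with a closed-form solution. Everything specific here is an assumption made for illustration: RA is taken to be a per-region value in [0, 1], the linear mapping from RA to penalty strength and the names `ra_weighted_penalties`, `tl_ridge_ra`, and `lam_max` are hypothetical, and the target sample is assumed to have more individuals than SNPs so that the linear system is well defined.

```python
import numpy as np


def ra_weighted_penalties(ra_by_region, snp_region, lam_max):
    """Map each SNP to a penalty proportional to its region's predicted RA.

    Hypothetical rule: high RA -> large penalty -> strong shrinkage towards
    the external (e.g., European-derived) effect; RA = 0 -> no shrinkage.
    """
    ra = np.clip(np.asarray(ra_by_region, dtype=float), 0.0, 1.0)
    return lam_max * ra[np.asarray(snp_region)]


def tl_ridge_ra(X, y, b_ext, lam_per_snp):
    """Ridge-type regression shrunk towards b_ext with SNP-specific penalties.

    Assumed objective:
        (1/2n)||y - Xb||^2 + (1/2) * sum_j lam_j * (b_j - b_ext_j)^2
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    b_ext = np.asarray(b_ext, dtype=float)
    n = X.shape[0]

    resid = y - X @ b_ext                          # what b_ext leaves unexplained
    A = X.T @ X / n + np.diag(lam_per_snp)         # penalized normal equations
    d = np.linalg.solve(A, X.T @ resid / n)        # deviation from b_ext
    return b_ext + d
```

Under this rule, SNPs in regions with high predicted RA remain close to the external estimates, while SNPs in regions with low predicted RA are estimated mostly from the target data.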