Abstract
Large-scale genome-wide association studies (GWAS) have identified a large number of genetic variants
associated with complex diseases. Aggregating the variants known to contribute to a disease
into polygenic risk scores (PRS) improves the prediction of a range of complex diseases. Most PRS
have been developed in study samples of European ancestry and have been shown to perform poorly in other
racial/ethnic groups, further exacerbating health disparities across ancestries. As genetic approaches for
precision medicine become more popular, there is a critical need to responsively and proactively expand access
to accurate PRS. Diabetes and its associated complications are among the biggest global health
problems of the 21st century. Type 1 and type 2 diabetes (T1D and T2D), gestational diabetes (GDM), and
related complications are excellent disease models for studying the utility of PRS in predicting heterogeneous and
complex health outcomes in a setting where dramatic racial/ethnic and socioeconomic disparities exist. Not only
are PRS useful for predicting T1D and T2D, but they can also distinguish between T1D and T2D, and between T2D
subtypes. The wealth of recently generated trans-ancestry GWAS data on diabetes subtypes, complications, and
quantitative traits provides a unique opportunity to construct PRS that are highly transferable
across populations. To address the disparities in PRS across ancestries, we have assembled a multidisciplinary
team to aggregate and analyze the largest existing collection of genetic data, from more than 1.8 million individuals
(35% non-European) with T1D, T2D, GDM, and glycemia-related complications and quantitative traits, to improve PRS
prediction of diabetes and its progression across the lifespan in diverse ancestries. Our Aims are: (1) collection,
harmonization, and integration of large-scale, multi-ancestry cohorts with genomic data and diabetes traits across
the lifespan for developing, training, and testing PRS for diverse ancestries; (2) development of methods to
improve PRS prediction in non-European populations using Bayesian approaches that allow integration of
linkage disequilibrium and summary statistics from several ancestries; and (3) development, testing, and comparison
of PRS performance for each trait, development of risk prediction tools that integrate clinical and genetic risk
factors, and assessment of scenarios in which PRS improve prediction. Accomplishing the aims of this proposal
will demonstrate how genomic data can inform more efficient and targeted preventive strategies within healthcare
systems and across ethnically diverse populations. Findings are expected to advance precision care for patients
of diverse ancestral backgrounds with diabetes and related conditions, and to serve as a paradigm for
many other complex diseases.