Characterizing pervasive biases in genome-wide association study using family health history as proxy phenotypes - PROJECT SUMMARY
Genome-wide association studies (GWAS) have identified numerous genetic loci associated with almost all
complex human diseases. Much of this success, particularly the accelerated findings in recent years, is credited
to the development of deeply phenotyped population biobanks with matched genomic data. However, a crucial
limitation of these population biobanks is the often-insufficient number of disease cases for late-life health
outcomes, which is why the introduction of GWAS-by-proxy (GWAX) was a landmark in the
field. The GWAX study design is based on a simple idea: although biobank participants may not have their own
diagnoses of late-life diseases, they report their parents’ diagnoses through the family health
history survey, and, as biological children, they indirectly provide parental genetic data. Since its introduction,
GWAX has been widely used in genetic studies of many diseases, most frequently
neurodegenerative diseases. Every recent Alzheimer’s disease (AD) GWAS has used meta-analysis to
combine case-control associations with GWAX proxy associations to boost sample size and statistical power.
However, methodological issues in GWAX and the quality of its association results have not been carefully
investigated. We demonstrate pervasive biases in current GWAX approaches that cause substantial divergence
between GWAS and GWAX results. In addition, we demonstrate that education is an important social factor at the
center of many of these biases. Because cognition is such a crucial marker for AD, biases caused by
education and cognition are particularly important in AD genetics research and will yield seriously misleading
results if not handled properly. Our proposal takes advantage of extensive family health history data available in
the All of Us Research Program and recent statistical advances developed by our investigator team in
decomposing social genetic effects with summary statistics of multi-generational GWAS. We aim to expand
these methods to rigorously and comprehensively characterize the biases in current GWAX results. Our central
hypothesis is that GWAX associations based on family health history as a proxy for disease phenotypes
are substantially affected by survival bias and non-random over- and under-reporting of family members’
illnesses, and will lead to erroneous results and conclusions in analyses that naively combine these
associations with case-control GWAS results. Successful completion of this proposal will improve scientific
understanding of the genetic underpinnings of family health history, shed important light on the design and
analysis of middle-aged biobank cohorts, and provide novel analytical strategies for future genetic studies
leveraging family health history data in population biobanks.