Pleiotropy (i.e., a genetic variant affecting multiple traits) induces genetic correlation between traits and
underlies the development of many syndromes. The identification of variants with pleiotropic effects on health-
related traits can improve the biological understanding of gene action and disease etiology, and can help to advance
disease-risk prediction.
However, mapping pleiotropic risk loci is statistically and computationally challenging. Schaid et al. (Genetics, 2016)
proposed an intersection-union sequential test that addresses the statistical challenges emerging in multi-trait
genome-wide association analyses. Schaid’s sequential Likelihood Ratio Test (sLRT) is powerful, provides adequate
error control, and leads to easy-to-interpret results. However, the adoption of the methodology remains limited
because the proposed test and the existing software do not scale to big data (hundreds of thousands of individuals,
millions of SNPs, many traits). Therefore, we propose to develop an alternative to the sLRT that achieves the same
power but involves computations that scale to big data.
Our approach adopts the intersection-union sequential testing framework but uses a Wald test and an approximation
that substantially reduces the computational burden. Preliminary results presented in this proposal show that the
proposed test, as implemented in our beta C++ software, matches the power and error control of the
sLRT, is considerably faster (by a factor of about 300), and scales to big data.
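To make the sequential logic concrete, the following is a minimal Python sketch of a generic intersection-union sequential Wald procedure. It is an illustration under stated assumptions, not the proposed test or its C++ implementation: the function names (chi2_sf, solve, wald_stat, sequential_wald), the exhaustive search over trait subsets, and the toy estimates are all hypothetical, and the computational approximation mentioned above is not reproduced here. At stage k the null hypothesis is that at most k traits are associated; the statistic is assumed to be the smallest Wald statistic over all ways of setting p - k effects to zero, referred to a chi-square distribution with p - k degrees of freedom.

```python
import math
from itertools import combinations


def chi2_sf(x, df):
    """Upper-tail chi-square probability, using the power series for the
    regularized lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    log_p = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def wald_stat(beta, V, idx):
    """Wald statistic b_S' V_SS^{-1} b_S for the effects indexed by idx."""
    b = [beta[i] for i in idx]
    Vs = [[V[i][j] for j in idx] for i in idx]
    return sum(bi * si for bi, si in zip(b, solve(Vs, b)))


def sequential_wald(beta, V, alpha=0.05):
    """Test 'at most k traits associated' for k = 0, 1, ... and stop at the
    first non-rejection. Returns (estimated number of associated traits,
    list of (k, statistic, p-value) per stage)."""
    p = len(beta)
    stages = []
    for k in range(p):
        df = p - k  # number of effects constrained to zero at this stage
        stat = min(wald_stat(beta, V, idx)
                   for idx in combinations(range(p), df))
        pval = chi2_sf(stat, df)
        stages.append((k, stat, pval))
        if pval > alpha:
            return k, stages
    return p, stages


# Toy (made-up) effect estimates for one SNP on three traits, with their
# sampling covariance matrix; these numbers are for illustration only.
beta_hat = [0.5, 0.4, 0.01]
V_hat = [[0.010, 0.002, 0.000],
         [0.002, 0.010, 0.000],
         [0.000, 0.000, 0.010]]
n_assoc, stages = sequential_wald(beta_hat, V_hat)
print(n_assoc, stages)
```

The exhaustive subset search in this sketch grows combinatorially with the number of traits, and the hand-written chi-square tail is only there to keep the example dependency-free; in practice a numerical library routine would be used for the tail probability.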
In this project, we will (Aim 1) conduct extensive simulations to assess the statistical properties of the proposed test;
(Aim 2) integrate memory mapping with optimized in-memory computations to develop open-source
software that implements the proposed test within the R environment, in a package that scales to
big-data analysis; and (Aim 3) use the methods and software developed in Aims 1 and 2, together with data from
the UK Biobank, to study the genetic underpinnings of Metabolic Syndrome.
The advent of biobank data has opened unprecedented opportunities for mapping genetic loci affecting complex
biological networks. However, more efficient data analysis tools are needed to unleash the potential of modern
biobanks. This proposal will: (i) develop novel methods for mapping risk loci affecting systems of traits; (ii) develop
and share with the research community software for analyzing multidimensional phenotypes with big
data; and (iii) advance knowledge of the genetic basis of Metabolic Syndrome.