Abstract
Genome-wide association studies (GWAS) have produced associations between many thousands of
genetic variants and many hundreds of traits. The “functional effects” of most associations, however, have not
yet been elucidated – that is, the causal variants and effector genes responsible for them, and the tissues and
pathways through which they act, remain largely unknown. Over the past few years, three classes of genomic
resources have emerged for inferring the functional effects of GWAS associations: summary association statistics
(effect sizes and p-values for associations between SNPs and traits), genomic annotations (assays of
regulatory activity and genomic functional elements), and bioinformatic methods (computationally predicted
functional effects). We argue that two gaps exist in the current resources that aggregate these data: first, no
current resource aims to comprehensively curate and catalog all that is known about the functional effects of
GWAS associations, together with all data and methods that could help predict them; second, existing resources
are developed with (at best) limited involvement from the experts who originally generated the genomic data or who understand
how to best use them. We propose to address these gaps by building a new genomic community resource –
the Association to Function Knowledge Portal (A2FKP) – using a general software platform we initially
developed for type 2 diabetes. Our approach makes use of a key innovation to build a resource that is both
high quality and comprehensive: we collaborate with disease expert communities to build dedicated knowledge
portals for them, motivating them to contribute their data and expertise, and we then integrate these data
alongside those of other communities, providing users with access to a comprehensive resource.
Specific Aim 1 addresses gaps in the comprehensiveness and quality of the data aggregated by
current resources regarding the functional effects of GWAS associations. It will establish and manage
collaborations with a wide range of disease, data, and method experts, and then work with these communities
to identify, aggregate, and curate data for 11 classes of disease. Specific Aim 2 addresses gaps in current
schemas and software platforms for the myriad types of data used for predicting the functional effects of
GWAS associations. It will build pipelines for processing genetic and genomic datasets through bioinformatic
methods for predicting the functional effects of GWAS associations, apply these pipelines to data aggregated
in Aim 1, and transform their outputs into relationships among entities in a knowledge graph. The goal of
Specific Aim 3 is to provide users with direct and visual access to the resources aggregated or computed in
Aims 1 and 2. It will develop REST APIs and web portals for querying and visualizing data within the A2FKP.
Significance: The project will produce a high-quality and comprehensive genomic resource of data
and methods for predicting the functional effects of GWAS associations. Easy access to such a resource will
accelerate the pace at which GWAS associations can be translated into insights into complex disease.