PROJECT SUMMARY
The Genome-Wide Association Studies (GWAS) Catalog’s mission is to provide a comprehensive and
complete resource of GWAS knowledge and to integrate the Catalog with appropriate resources, including
those that translate GWAS knowledge to improve human health and deepen our understanding of human
variation in the context of complex disease and related traits. Over the next five years we will continue to
provide the most complete, curated, standardised and FAIR resource of GWAS data for an international user
community of biomedical researchers in academia and the pharmaceutical industry. We will extend our
resource activities to closely link the GWAS Catalog with a major cognate application, that of Polygenic
Scores (PGS) and the Polygenic Score Catalog. We will continue to work with journals, consortia, charities
and other funders to ensure that data is accessible, federating elements of the data where it cannot be shared
due to ethical constraints. We will improve the data ingest, curation, visualisation and API components to
ensure we scale to increasing data and user volumes. Curation, user deposition and literature extraction
will be automated and enhanced, resulting in quality-controlled, harmonised and FAIR knowledge
for users. By integrating data flows with PGS and Mendelian Randomisation (MR) resources, we will make
the data and necessary metadata readily accessible for analysis by a wider group of users and reduce
redundancy in data flow and acquisition across resources, consolidating our resource as the world’s primary
GWAS knowledge base. In Aim 1, we will deliver novel processes for, and support QC of, author deposition of
significant SNP-trait associations, enabling the Catalog to scale while leveraging existing author relationships. Our work to
acquire the community’s invaluable GWAS summary statistics will continue, with a target of 75% of all studies
linked to summary statistics, emphasising non-European ancestries and under-represented disease areas.
Aim 2 improves community use of summary statistics by integrating data flows with PGS
and MR resources. Aim 3 improves infrastructure performance, making the infrastructure portable and modular and
enabling the sharing of QC and harmonisation processes.
Aim 4 improves our graphical user interfaces, visualisation and data exploration tools and APIs, ensuring
they scale to unprecedented data volumes and remain suited to evolving user needs. Together these
aims will serve our growing user community to both enable and enhance the aetiological understanding,
prevention and treatment of cardiovascular disease, diabetes, cancers, psychiatric disorders and other
complex diseases.