Wednesday, December 31, 2025

Integrating the reference pangenome with biobank-scale data for complex trait analysis

Award Number: U01HG013755
ORGANIZATION: NATIONAL HUMAN GENOME RESEARCH INSTITUTE
OPDIV: NIH
AWARD CLASS: COOPERATIVE AGREEMENT
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 09/20/2024
PERIOD OF PERFORMANCE END DATE: 08/31/2027

Project summary

The human reference pangenome, which represents a collection of genome sequences in a single data structure, has the potential to transform human genetics applications. Compared to a traditional linear reference genome, pangenomes enable analysis of megabases of genetic sequence that were previously ignored, reduce bias when analyzing diverse genomes, and provide dramatically improved genotyping of structurally complex regions of the genome. These complex regions likely harbor medically relevant variants contributing to a range of human traits. However, pangenomes have yet to be integrated into medical genetics and complex trait workflows due to a lack of analysis and visualization tools that are accessible to non-experts.

Our central hypothesis is that pangenomes can be used to improve fine-mapping of trait associations and detection of pathogenic variants in complex regions by identifying particular paths enriched in individuals with a phenotype of interest. We focus on developing and applying tools that leverage pangenomes to identify, visualize, and fine-map genomic loci associated with complex traits. The tools proposed below are motivated by two major challenges identified in our own efforts to this end. First, visualizing and browsing pangenome subgraphs for loci of interest, a critical step in exploring and understanding complex genomic regions, is currently a cumbersome and time-consuming process involving multiple command-line tools geared toward bioinformatics experts. Second, there is a lack of tools for integrating the reference pangenome with existing biobank datasets for which both genotype and phenotype data are available for complex trait analysis.

Our proposal integrates multiple large datasets encompassing a range of technologies and builds on existing pangenome resources and the computational infrastructure developed by the Human Pangenome Reference Consortium (HPRC). In particular, we use genotype data and whole genome sequencing (WGS) datasets available for hundreds of thousands of individuals of a range of ancestries from the UK Biobank and All of Us, as well as thousands of phenotypes available for these samples. A key goal is to enable backwards compatibility with existing biobank-scale datasets that have been mapped to linear reference genomes, which will facilitate more immediate use of the pangenome reference. We additionally use near-complete long-read assemblies and the reference pangenomes (primarily minigraph-cactus) released by the HPRC. Further, our tools are designed to integrate with the current pangenome computational ecosystem by incorporating existing file formats (e.g. rGFA) and toolkits (e.g. vg).

To this end we will develop a web-based pangenome browser that integrates with existing data based on linear genomes (Aim 1), develop metrics to quantify local graph complexity and use these metrics to characterize existing GWAS signals (Aim 2), and integrate pangenomes with existing biobank datasets to perform fine-mapping and visualization of individual trait-associated loci (Aim 3).


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = -$4,889 )
2025	2024	UNIVERSITY OF CALIFORNIA, SAN DIEGO	9500 GILMAN DR	LA JOLLA	CA	92093	SAN DIEGO	USA	Human Genome Research	000	1	10/18/2024	NEW	-$4,889
														Subtotal = -$4,889

Issue Date FY: 2024 ( Subtotal = $1,315,688 )
2024	2024	UNIVERSITY OF CALIFORNIA, SAN DIEGO	9500 GILMAN DR	LA JOLLA	CA	92093	SAN DIEGO	USA	Human Genome Research	000	1	9/20/2024	NEW	$1,315,688
														Subtotal = $1,315,688

Grand Total All Awards = $1,310,799
