Thursday, March 19, 2026

K-mer indexing for pan-genome reference annotation

Award Number: U01HG010963
ORGANIZATION: NATIONAL HUMAN GENOME RESEARCH INSTITUTE
OPDIV: NIH
AWARD CLASS: COOPERATIVE AGREEMENT
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 02/01/2020
PERIOD OF PERFORMANCE END DATE: 01/31/2024

K-mer indexing for pan-genome reference annotation - ABSTRACT

The human genome reference sequence is one of the foundations of genome sciences, especially in the context of next-generation sequencing (NGS) analysis. The reference has enabled discoveries in biomedical research and been particularly instrumental in human disease gene identification. However, the human genome reference is limited by its static and linear nature. Specifically, the current reference lacks the featural and contextual flexibility to represent the breadth of human variation. Important elements of individual genomes are either missed or incorrectly represented. As a solution that will bridge the next generation of reference assemblies with population genome sequencing studies, we have developed a K-mer-based indexing approach. This method is more efficient computationally, provides accurate representation in the context of populations and facilitates the analysis of diverse human genomes. Our goal is to use this strategy in developing a robust computational architecture that will encode and annotate large collections of genomes in the context of a pan-genome reference. First, we plan to develop a scalable, efficient K-mer representation of a large collection of haplotype/phased reference genomes, by 1) generating an index of all K-mers in human reference genome GRCh38 in a manner that can efficiently store variant information as metadata, and then 2) incrementally updating the K-mer index to include all novel K-mers derived from ongoing population sequencing efforts, while 3) developing schemes for directly analyzing compressed genomic data.
Second, we plan to apply K-mer representation to genomic analysis by 1) providing the entirety of known human genetic variation in an aggregated index that is computationally efficient and easy to understand, 2) developing functions for our pan-genomic index that support ultra-rapid queries, such as queries for clinically important variants, and 3) linking conventional coordinate information to the K-mer metadata in the pan-genome index to allow annotating genetic variation to a particular genome reference. Third, we will create an online web portal for the pan-genome, using cloud computing, to maximize the utility of our approach, to promote community engagement and to enable contributions from the research community. We expect that completion of these aims will provide: a scalable computational architecture that incorporates the continuous addition of variant information without loss of resolution or accuracy; rapid query speeds that will remain nearly constant as the database grows; and a universally accessible portal using cloud computing. This work will help solve the issues of multiple assemblies. It will improve researchers’ ability to understand the relationship between variants and disease, while also providing great savings over the long term in infrastructure and computing costs.
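
The general idea in the abstract (index every K-mer with metadata, add only the novel K-mers from new genomes, and answer lookups by key) can be illustrated with a toy sketch. This is not the project's implementation: the class name `KmerIndex`, its methods, the sequences, and the use of a plain in-memory dictionary are all assumptions made for illustration, and a real index over GRCh38 would need a compressed, disk-backed structure.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (offset, k-mer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


class KmerIndex:
    """Hypothetical toy index: k-mer -> list of (source, offset) metadata."""

    def __init__(self, k):
        self.k = k
        self.table = defaultdict(list)

    def add(self, source, seq):
        """Index seq under the label source; return the count of novel k-mers."""
        novel = 0
        for offset, kmer in kmers(seq, self.k):
            if kmer not in self.table:  # membership test does not create the key
                novel += 1
            self.table[kmer].append((source, offset))
        return novel

    def lookup(self, kmer):
        """Return every (source, offset) recorded for kmer; empty list if unseen."""
        return self.table.get(kmer, [])


# Made-up sequences: a tiny "reference" and a sample differing by one base.
index = KmerIndex(k=4)
ref_novel = index.add("ref", "ACGTACGT")
sample_novel = index.add("sample1", "ACGTTCGT")
hits = index.lookup("ACGT")
```

In this sketch a lookup is a single dictionary access, so its average cost does not depend on how many genomes have been added, which is the property the abstract describes as query speeds staying nearly constant as the database grows. The stored offsets stand in for the coordinate metadata that links a K-mer back to a particular reference.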


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2024 ( Subtotal = $0 )
2024	2022	THE LELAND STANFORD JUNIOR UNIVERSITY	450 JANE STANFORD WAY	STANFORD	CA	94305	SANTA CLARA	USA	Human Genome Research	000	3	5/3/2024	NON-COMPETING CONTINUATION	$0
														Subtotal = $0

Issue Date FY: 2023 ( Subtotal = $300,000 )
2023	2023	THE LELAND STANFORD JUNIOR UNIVERSITY	450 JANE STANFORD WAY	STANFORD	CA	94305	SANTA CLARA	USA	Human Genome Research	001	3	7/3/2023	SUPPLEMENT FOR EXPANSION	$300,000
2023	2022	THE LELAND STANFORD JUNIOR UNIVERSITY	450 JANE STANFORD WAY	STANFORD	CA	94305	SANTA CLARA	USA	Human Genome Research	000	3	2/21/2023	NON-COMPETING CONTINUATION	$0
														Subtotal = $300,000

Issue Date FY: 2022 ( Subtotal = $300,000 )
2022	2022	LELAND STANFORD JUNIOR UNIVERSITY, THE	450 JANE STANFORD WAY	STANFORD	CA	94305	SANTA CLARA	USA	Human Genome Research	001	3	4/27/2022	NON-COMPETING CONTINUATION	$29,999
2022	2022	LELAND STANFORD JUNIOR UNIVERSITY, THE	450 JANE STANFORD WAY	STANFORD	CA	94305	SANTA CLARA	USA	Human Genome Research	000	3	1/20/2022	NON-COMPETING CONTINUATION	$270,001
														Subtotal = $300,000

Issue Date FY: 2021 ( Subtotal = $300,000 )
2021	2021	LELAND STANFORD JUNIOR UNIVERSITY, THE	450 JANE STANFORD WAY	STANFORD	CA	94305	SANTA CLARA	USA	Human Genome Research	000	2	1/12/2021	NON-COMPETING CONTINUATION	$269,999
2021	2021	LELAND STANFORD JUNIOR UNIVERSITY, THE	450 JANE STANFORD WAY	STANFORD	CA	94305	SANTA CLARA	USA	Human Genome Research	001	2	2/4/2021	NON-COMPETING CONTINUATION	$30,001
														Subtotal = $300,000

Issue Date FY: 2020 ( Subtotal = $376,091 )
2020	2020	LELAND STANFORD JUNIOR UNIVERSITY, THE	450 JANE STANFORD WAY	STANFORD	CA	94305	SANTA CLARA	USA	Human Genome Research	000	1	1/31/2020	NEW	$376,091
														Subtotal = $376,091

Grand Total All Awards = $1,276,091
