Increasing sample size is a tremendously important factor in building our understanding of the genetics of
human disease. As we discover that more and more diseases have a complex web of genetic causation, we
need larger and larger genetic datasets to disentangle them, and to ultimately produce successful therapies.
Driven in part by this need, the community is now assembling vast collections of human genome sequences,
and millions of samples will soon be commonplace. There is a profound problem, however: our computational
methods for storing, processing, and analyzing genomic data are lagging far behind. The algorithms and data
structures underlying today’s computational methods were designed for thousands of samples, not millions.
Without fundamental change in how we store and process genomic data, we will either not fully tap the
potential of the data we collect, or the computational costs will be astronomical – or both.
Nonhuman datasets, with applications in epidemiology, ecology, evolution, and agriculture, may not reach
these sample sizes soon, but they nevertheless face a related barrier. Simulation is increasingly important
for tasks from hypothesis generation to parameter inference, yet current simulation methods scale only to
tens or hundreds of thousands of individuals, far too few for many species of interest (e.g., mosquitoes).
This is crucial, since evolution and ecology in large populations differ from those in small ones, in ways
that cannot be avoided by mathematical tricks (such as rescaling).
Our proposal addresses these critical needs by focusing on a new data structure: the “tree sequence”,
which encodes genetic variation data in terms of the population genetic processes that produced it,
representing variation among contemporary samples through the underlying genealogical trees. This yields
extraordinary levels of data compression, with file sizes hundreds of times smaller than current community
standards. Since the tree sequence was introduced in 2016, it has led to performance increases of 2–4 orders
of magnitude in genome simulation, calculation of statistics, and ancestry inference. Such sudden leaps in
computational performance are vanishingly rare, and only possible through deep algorithmic advances.
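To make the idea concrete, here is a toy sketch of how a tree sequence achieves its compression; this is an illustration of the principle, not the actual tskit encoding, and the edge table and `tree_at` helper are invented for this example. Each edge records a parent–child relationship together with the genomic interval over which it holds, so adjacent local trees along the genome share most of their edges rather than being stored independently.

```python
# Toy tree-sequence sketch (illustrative only, not the tskit format).
# Three samples (nodes 0-2) and three ancestors (nodes 3-5) on a 10 kb
# region with two local trees separated by a recombination at 6 kb.
# Each edge is (left, right, parent, child): the relationship holds on
# the half-open genomic interval [left, right).
edges = [
    (0, 10000, 4, 0),     # sample 0 attaches to node 4 everywhere
    (0, 6000,  4, 1),     # sample 1 is under node 4 only on [0, 6000)
    (6000, 10000, 3, 1),  # ...and under node 3 on [6000, 10000)
    (0, 6000,  5, 2),
    (6000, 10000, 3, 2),
    (0, 6000,  5, 4),
    (6000, 10000, 4, 3),
]

def tree_at(position):
    """Recover the parent map of the local tree covering `position`."""
    return {child: parent
            for left, right, parent, child in edges
            if left <= position < right}

print(tree_at(1000))  # first local tree
print(tree_at(8000))  # second local tree; note the shared edge for sample 0
```

Because the edge for sample 0 spans the whole region, it is stored once but appears in both local trees; in real data, where adjacent genealogies are highly correlated, this sharing is what produces file sizes hundreds of times smaller than storing the variant matrix directly.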
Our research plan builds on the extraordinary successes of tree sequence methods so far, scaling up three
crucial layers of computational genomics: analysis, simulation, and inference. First, we will continue our
development of highly efficient tree-sequence-based methods for fundamental operations in statistical and
population genetics. Second, we will scale up genome simulations by integrating tree sequence methods
into complex forward-time simulations and utilizing modern, multicore processors. Third, we will combine
efficient simulations and the rich information contained in the tree sequence with cutting-edge deep-learning
techniques to develop new inference methods. Together, we aim to revolutionize the way we work with and
learn from population genetic variation data.