Monday, February 16, 2026

Scalable Computational Methods for Genealogical Inference: from species level to single cells

Award Number: R01HG013117
ORGANIZATION: NATIONAL HUMAN GENOME RESEARCH INSTITUTE
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 09/10/2024
PERIOD OF PERFORMANCE END DATE: 06/30/2028

PROJECT SUMMARY

Massive amounts of genomic data are currently being generated, providing unprecedented opportunities for biomedical researchers to characterize various biological components and processes. In order to utilize these data to make new biological discoveries and improve human health, accurate models and scalable computational tools need to be developed to facilitate analysis and interpretation. The central objective of this project is to address this challenge by developing more realistic probabilistic models, scalable algorithms, and user-friendly software tools to enable the biomedical research community to better harness large genomic data. Many problems in genomics rely on computational methods for inferring genealogical information from large sequence data and interpreting the reconstructed trees. In this application, we propose to make significant strides towards improving this line of research by developing a suite of robust and scalable algorithms for probabilistic models of molecular evolution and genealogical inference across multiple timescales. We will achieve our goal by carrying out the following specific aims:

1) A fundamental problem in statistical analysis of molecular evolution is estimating model parameters, for which maximum likelihood estimation (MLE) is typically employed. Unfortunately, MLE is a computationally expensive task, in some cases prohibitively so. In Aim 1, we will tackle this problem by combining a novel MLE framework and modern optimization techniques to develop a broadly applicable computational method that achieves several orders of magnitude speedup in MLE for general models of molecular evolution. The ability to estimate model parameters at unprecedented speed will transform the way that phylogenetic analysis is performed and enable the community to consider more complex, realistic models than previously possible. We will apply our tools to improve phylogenetic inference for two clinically important superfamilies of membrane proteins in humans, namely G protein-coupled receptors and solute carrier transporters.

2) Because of meiotic recombination, the genetic variability within humans cannot be represented by a single tree. Instead, there are millions of different trees across the genome, where each position in the genome will tend to have its own tree that differs only minimally from the trees in nearby sites. The collection of all these trees, and the set of recombination points creating new trees, is represented by the Ancestral Recombination Graph (ARG), which has a number of applications in human genetics. Despite substantial recent progress on reconstructing ARGs, however, current methods are either too slow to scale up to large data sets, or they do not sample ARGs accurately from the correct posterior distribution. In Aim 2, we will develop a new computational method to improve ARG sampling. We will test the method extensively on simulated data, develop a number of applications, and generate genome-wide ARGs for several human data sets to facilitate biological discoveries.

3) Applications of genealogical inference methods have been rapidly growing in single-cell genomics. In particular, advances in CRISPR/Cas9 genome editing technologies have enabled lineage tracing for thousands of cells in vivo, and the problem of reconstructing trees from such data has received considerable attention recently. In Aim 3, we will develop scalable algorithms to reconstruct time-resolved single-cell trees for thousands of cells sampled at multiple time points. We will also develop a novel statistical method grounded in rigorous theory to improve tree-based fitness inference. We will apply the methods developed here to study cancer evolution as well as B cell affinity maturation in germinal centers.


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = $575,594 )
2025	2025	REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE	1608 4TH ST STE 201	BERKELEY	CA	94710	ALAMEDA	USA	Human Genome Research	000	2	7/21/2025	NON-COMPETING CONTINUATION	$575,594

Issue Date FY: 2024 ( Subtotal = $592,707 )
2024	2024	REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE	1608 4TH ST STE 201	BERKELEY	CA	94710	ALAMEDA	USA	Human Genome Research	000	1	9/10/2024	NEW	$592,707

Grand Total All Awards = $1,168,301
