Tuesday, November 4, 2025

Sequence transformations and microbiology: theory, tools, & discovery

Award Number: R35GM160134
ORGANIZATION: NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 09/01/2025
PERIOD OF PERFORMANCE END DATE: 08/31/2030

Sequence transformations and microbiology: theory, tools, & discovery - Project Summary

String similarity is one of the foundational problems in computational biology. It is what allows piecing together a genome from short sequenced reads, determining that humans share a more recent common ancestor with chimpanzees than with octopuses, and detecting the horizontal transfer of genetic material across bacterial species, one of our current application focuses. Finding exact sequence matches between genomes is a long-solved problem, but in biology, because of the prevalence of both mutation and sequencing error, we care more about approximate matches, which are much harder to find and characterize.

One of the major recent advances in bioinformatics has been the advent of increasingly sophisticated string transformations (sketching, k-mer-ization, alphabet reductions) that change the distribution of exact matches on the transformed strings, allowing for the development of faster software. My nascent research lab has been one of the pioneers both in developing rigorous mathematical theory to understand sequence transformations and in engineering software that turns that theory into usable bioinformatics tools. In prior work, we gave the first rigorous proofs that k-mer sketching works with alignment [Shaw & Yu, Genome Research, 2023] and translated our theoretical understanding of k-mer sketching into a new metagenomics tool, "skani" [Shaw & Yu, Nature Methods, 2023], for computing pairwise average nucleotide identity (ANI), a standard measure of genome similarity. Skani is both more accurate and more than an order of magnitude (20x) faster than the state of the art.
Building on skani, we further produced skandiver [Zhang et al., Bioinformatics, 2024], a tool for detecting large intercellular mobile genetic elements by comparing all the chunks of a sequence against all the whole genomes in a database, without needing a dedicated reference database of mobile genetic elements.

Over the next five years, my research program will continue to straddle the line between advancing string algorithm theory and using that theory to build bioinformatics tools. On the theory side, we want to combine ideas from k-mer sketching theory with alphabet reductions to extend the utility of sketching to more dissimilar sequences (such as those found in protein databases). On the applied side, we will push forward string similarity tools for microbial analysis, focusing on better characterizing mobile genetic elements (MGEs). Our proof-of-concept skandiver performs only long all-to-all sequence similarity; its design principles do not work for intra-species MGEs or small MGEs, and it does not annotate the boundaries of the MGEs it does find.
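
The string transformations named in the summary (k-mer-ization, sketching, alphabet reduction) can be illustrated with a minimal Python sketch. This is not the method used by skani or skandiver; it is a toy example under stated assumptions. The function names, the hash-modulus subsampling rule, and the six-class amino-acid grouping are all hypothetical choices made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (illustrative choice of hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k, c):
    """Subsample k-mers: keep those whose hash is divisible by c.

    On average about 1/c of the distinct k-mers survive, and the same
    k-mer is kept or dropped consistently across all sequences.
    """
    return {km for km in kmers(seq, k) if _hash64(km) % c == 0}


def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical reduced amino-acid alphabet: each residue is mapped to a
# representative letter for a coarse physicochemical class. The grouping
# is an assumption for illustration, not one taken from the source.
_GROUPS = {
    "A": "AVLIMC",  # hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # small / special
}
_REDUCE = {aa: rep for rep, members in _GROUPS.items() for aa in members}


def reduce_alphabet(protein):
    """Rewrite a protein string over the reduced alphabet.

    Characters outside the 20 standard residues are left unchanged.
    """
    return "".join(_REDUCE.get(aa, aa) for aa in protein)


if __name__ == "__main__":
    a = "ACGTACGTTAGCCGATAGGCTA"
    b = "ACGTACGTTAGCCGTTAGGCTA"  # one substitution relative to a
    print("full k-mer Jaccard:", jaccard(kmers(a, 5), kmers(b, 5)))
    print("sketched Jaccard:  ", jaccard(sketch(a, 5, 2), sketch(b, 5, 2)))
    # Conservative substitutions (K->R, L->I) vanish after reduction.
    print(reduce_alphabet("MKLV"), reduce_alphabet("MRIV"))
```

The point of the last step is the one the summary makes about dissimilar sequences: two proteins that share no exact k-mers over the 20-letter alphabet can share exact k-mers once similar residues are merged, so a sketch built on the reduced strings can still detect the relationship.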


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = $412,295 )
2025	2025	CARNEGIE MELLON UNIVERSITY	5000 FORBES AVE	PITTSBURGH	PA	15213	ALLEGHENY	USA	Biomedical Research and Research Training	000	1	9/1/2025	NEW	$412,295
														Subtotal = $412,295

Grand Total All Awards = $412,295
