Tuesday, August 12, 2025

Scalable post-assembly editing software for finishing and annotating personal genomes

Award Number: R44GM128518
ORGANIZATION: NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 09/01/2018
PERIOD OF PERFORMANCE END DATE: 02/28/2022

Scalable post-assembly editing software for finishing and annotating personal genomes - We are entering a new era of personal genomics in which an individual's genome sequence will be used to identify disease susceptibility, improve diagnosis and better treat illnesses, and will also be combined across cohorts and populations to identify new biomarkers and causal mutations underlying any phenotype. Despite the tremendous success of mapping short-read next-generation sequencing (NGS) data onto a reference genome (resequencing) in identifying genetic variation in a new genome, the inherent lack of long-range connectivity, together with reference-induced biases, makes obtaining complete haplotype-phased genomes exceedingly difficult. Emerging long-read technologies are beginning to address this critical shortcoming by direct de novo assembly of an individual's genome. However, initial de novo assemblies typically consist of many thousands of unordered contigs that require extensive post-assembly processing to produce finished sequences that can be effectively mined for genetic content and variation. Thus, there is an urgent need for integrated, scalable post-assembly software that 1) automatically organizes, joins and phases the initial contigs into complete haplotype sequences, 2) supports optional NGS and/or manual polishing and 3) provides initial automated annotation of those sequences. Currently, such software does not exist, and users must instead cobble together a confusing array of difficult-to-use, task-specific open-source programs. DNASTAR's post-assembly editing program, SeqMan Pro (SMP), has a proven history in finishing bacterial-sized genomes, but it currently lacks the scalability and the full functionality needed to tackle human-genome-sized problems.
The primary goal of this Fast Track proposal is to create a fully scalable version of SMP for the automated finishing and annotation of de novo assembled large eukaryotic genomes while also providing a manual editing platform when needed. During Phase I, we will develop two key prototypes: 1) a new assembly file format, eBAM, which is interconvertible with the BAM format but is also editable like our SQD files, and 2) a rapid reference-assisted contig scaffolding tool adapted from our proprietary Disk Sort Alignment (DSA) algorithm. With that foundation, we will complete the transformation of SMP in Phase II by: 1) refining the eBAM format for optimal editing performance, 2) building a new 64-bit version of the SMP editing engine that incorporates the additional functionality necessary for post-assembly finishing of large eukaryotic genomes, including automated DSA-based scaffolding and phase-aware gap filling, contig joining and haplotype refinement, 3) creating a new DSA-based genome aligner for rapidly aligning a finished sequence to an annotated reference genome, and 4) adding a new feature transfer and analysis module. Together, the aligner and the feature transfer module will permit initial annotation of the finished genome, along with a cataloging of variants and their impact in both native and reference coordinates. Inclusion of the reference coordinates allows variants in the new genome to be easily associated with the wealth of information available through numerous online knowledgebase resources.
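
As an illustration of reporting a variant in both native and reference coordinates, the following is a minimal Python sketch of position translation through a pairwise genome alignment. It is not DNASTAR's implementation and does not use the DSA algorithm; the `Block` structure, the `native_to_ref` function and the toy alignment are all hypothetical names and data introduced here, and the sketch assumes the alignment has already been reduced to sorted, non-overlapping, gap-free blocks with 0-based coordinates.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """One gap-free aligned segment (hypothetical structure, 0-based)."""
    native_start: int  # start of the segment in the newly finished genome
    ref_start: int     # start of the same segment in the reference genome
    length: int        # number of aligned bases in the segment


def native_to_ref(blocks: list[Block], pos: int) -> Optional[int]:
    """Translate a native-genome position into reference coordinates.

    Assumes `blocks` is sorted by native_start and non-overlapping.
    Returns None when `pos` lies in an insertion or an unaligned region,
    i.e. when the base has no counterpart in the reference.
    """
    starts = [b.native_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.native_start
    if offset >= block.length:
        return None  # pos falls in the gap after this block
    return block.ref_start + offset


# Toy alignment: the native genome carries a 5 bp insertion at 100..104.
blocks = [Block(0, 1000, 100), Block(105, 1100, 50)]

print(native_to_ref(blocks, 42))   # 1042
print(native_to_ref(blocks, 102))  # None (inside the insertion)
print(native_to_ref(blocks, 110))  # 1105
```

A variant record could then carry both its native position and, where one exists, the translated reference position, which is the coordinate most online knowledgebases are keyed on.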


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2022 ( Subtotal = $0 )
2022	2020	DNASTAR, INC.	3801 REGENT ST	MADISON	WI	53705	DANE	USA	Biomedical Research and Research Training	000	3	7/27/2022	NON-COMPETING CONTINUATION	$0
														Subtotal = $0

Issue Date FY: 2020 ( Subtotal = $750,001 )
2020	2020	DNASTAR, INC.	3801 REGENT ST STE G	MADISON	WI	53705	DANE	USA	Biomedical Research and Research Training	001	3	2/13/2020	NON-COMPETING CONTINUATION	$750,001
2020	2018	DNASTAR, INC.	3801 REGENT ST STE G	MADISON	WI	53705	DANE	USA	Biomedical Research and Research Training	000	1	1/12/2020	NEW	$0
														Subtotal = $750,001

Issue Date FY: 2019 ( Subtotal = $750,001 )
2019	2019	DNASTAR, INC.	3801 REGENT ST STE G	MADISON	WI	53705	DANE	USA	Biomedical Research and Research Training	000	2	3/1/2019	EXTENSION WITH OR WITHOUT FUNDS	$750,001
														Subtotal = $750,001

Issue Date FY: 2018 ( Subtotal = $149,981 )
2018	2018	DNASTAR, INC.	3801 REGENT ST STE G	MADISON	WI	53705	DANE	USA	Biomedical Research and Research Training	000	1	8/20/2018	NEW	$149,981
														Subtotal = $149,981

Grand Total All Awards = $1,649,983
