Tuesday, February 17, 2026

Software for the complete characterization of antibody repertoires: from germline and mRNA sequence assembly to deep learning predictions of their protein structures and targets

Award Number: R44GM150362
ORGANIZATION: NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 08/29/2023
PERIOD OF PERFORMANCE END DATE: 07/31/2026


The B cell population in each individual produces an estimated 10^10 different antibodies, collectively known as the antibody repertoire. This extraordinary diversity is essential for responding to the unique history of infections, vaccinations and cancer encountered over an individual’s lifetime. Conversely, regulatory errors in the system play a pivotal role in a host of autoimmune diseases. Antibodies are composed of two proteins, a heavy and a light chain, each containing a variable region, VH and VL, which together confer antigen binding specificity. Diversity is initiated through differential recombination at the three V region encoding loci to produce the naïve repertoire. Upon antigen exposure, B cells expressing an antibody specific to that antigen undergo clonal expansion and concentrated somatic hypermutation (SHM) of the V region sequences that code for the antigen recognition domain. Those clonally derived B cells (clonotypes) each express a different sequence, and thereby a different structural variant, of the initial unmutated antibody. Cells expressing higher affinity variants are selected for in a process known as affinity maturation. In this way, the mature repertoire is built from the history of antigenic encounters by that individual. Efficient deciphering of that history could contribute to improving human health in numerous ways, from better clinical decision making to improved diagnostics and therapeutics. Toward that goal, ongoing technological advances in DNA/RNA sequencing, protein structure modeling software and high-performance scalable computer hardware are making virtual repertoire-scale antibody structure and antigen screening attainable in the not-too-distant future.
In this Direct-to-Phase II application, we propose to build a software suite that bridges the gap between genomics and structural biology, enabling antibody repertoires to be deciphered and mined in exquisite detail. To do so, we first leverage our highly extensible sequence assembler, XNG, to produce haplotype-phased and annotated sequences of the germline IG loci from which the naïve repertoire can be simulated (Aim 1). Next, XNG is used to assemble and annotate bulk BCR-seq data, producing the linear VH and VL encoding sequences of the mature repertoire (Aim 2). Translated repertoire sequences are then used as input for our protein modeling software, NovaFold-Ab and NovaFold-AI, where high-accuracy 3D antibody structures are predicted (Aim 3). Those antibody structure libraries are then used in virtual screens to identify members that bind to a target antigen with our protein interaction modeling program, NovaDock (Aim 4). Screens can also be refined to specific epitopes of interest, for example, those known to elicit neutralizing antibodies. If realized, these capabilities will create significant commercial opportunities to complement existing technology in improving clinical care and personalized medicine, as well as to aid the development of faster, more cost-effective diagnostics and therapeutics.
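The hand-off between Aim 2 and Aim 3 relies on translating assembled VH and VL nucleotide sequences into protein sequences. The following is a minimal sketch of that translation step only, using the standard genetic code; it is not the method used by XNG or NovaFold, and the function name `translate`, the `X` placeholder for unrecognized codons, and the example sequence are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate a coding sequence in reading frame 0 (hypothetical helper).

    RNA input is accepted by mapping U to T. Trailing bases that do not
    form a full codon are ignored, stop codons are reported as '*', and
    codons with ambiguous bases are reported as 'X'.
    """
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    # Illustrative five-codon fragment, not taken from the award record.
    print(translate("ATGGAGGTGCAGCTG"))  # prints MEVQL
```

A real repertoire workflow would also need to locate the correct reading frame and the V region boundaries before translation, which this sketch leaves out.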


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity County	Legal Entity Country	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = $691,914 )
2025	2025	DNASTAR, INC.	1202 ANN ST	MADISON	WI	53713	DANE	USA	Biomedical Research and Research Training	000	3	9/4/2025	NON-COMPETING CONTINUATION	$691,914

Issue Date FY: 2024 ( Subtotal = $681,914 )
2024	2024	DNASTAR, INC.	3801 REGENT ST	MADISON	WI	53705	DANE	USA	Biomedical Research and Research Training	000	2	9/5/2024	NON-COMPETING CONTINUATION	$681,914

Issue Date FY: 2023 ( Subtotal = $648,790 )
2023	2023	DNASTAR, INC.	3801 REGENT ST	MADISON	WI	53705	DANE	USA	Biomedical Research and Research Training	000	1	8/29/2023	NEW	$648,790

Grand Total All Awards = $2,022,618
