Monday, November 17, 2025

Histotools: scaling digital pathology curation tools for quality control, annotation, labeling, and dataset identification

Award Number: R01LM013864
ORGANIZATION: NATIONAL LIBRARY OF MEDICINE
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 09/21/2022
PERIOD OF PERFORMANCE END DATE: 07/31/2026

ABSTRACT: With the recent approval of whole slide scanners for primary diagnosis, wherein routine glass histopathology slides are digitized and presented to clinical pathologists for diagnosis on computer monitors, a wealth of untapped data is being created in routine clinical practice and placed in growing data lakes. In digital format, these whole slide images (WSIs) can be subjected to digital pathomics, i.e., the process of extracting quantitative image features associated with the morphology, attributes, and relationships of histologic objects in WSIs. These features can subsequently be employed for discovery in many domains, such as histogenomics, which associates phenotypical presentations with biological pathways and gene ontologies. Additionally, low-cost, non-tissue-destructive, image-based companion diagnostic assays (CDx) can be developed for predicting patient prognosis and treatment response. Unfortunately, large unprocessed data lakes (e.g., TCGA) are not sufficient on their own for pathomics and often require an intractable amount of human curation effort in (i) performing meticulous quality control of WSIs (i.e., avoiding “garbage in, garbage out”) and subsequently (ii) precisely annotating (e.g., cell boundaries) and labeling (e.g., cell types) histologic objects.
To address these major limiting factors in curating data lakes, we propose developing our small-scale HistoTools prototypes to employ computing clusters and thus enable them to function at the scale of large digital slide repositories (DSRs): (i) HistoQC for robust, reproducible quality control of WSIs by identifying artifacts (e.g., blurriness) and outliers (e.g., poorly stained slides) for avoidance in downstream analyses, (ii) CohortFinder for identification and compensation of batch effects, (iii) Quick Annotator for rapid computer-aided annotation generation via a combination of active and machine learning, and (iv) PatchSorter for improving sub-typing of histologic objects with machine learning. We will evaluate HistoTools for improvement of quality control and the efficiency of both segmenting and labeling histologic objects of interest via (a) onsite curation and release of the 14k WSIs used during our internal validation and (b) supported external curation of at least 100k WSIs via 24 clinical affiliates from every continent except Antarctica, who together have access to over 20 million WSIs during this proposal. Our validation use cases are designed to expedite existing onsite projects in the CDx space, consisting of 4 organs (breast, lung, heart, kidney), 3 diseases (cancer, kidney disease, and organ rejection), and WSIs collected from >70 sites. These cohort characteristics will help ensure the generalizability of our tools for curated data lake creation, with open-source and usability study approaches employed to obtain feedback from collaborators and the larger research community. Dissemination through consortia (ITCR, NEPTUNE) and websites (GitHub, TCIA) will improve visibility and adoption. The tools and well-curated data sets we release are anticipated to bootstrap researcher-initiated CDx discovery projects, along with the creation of their own onsite manicured data lakes. Together, this proposal will engender digital-pathology-based precision medicine research.


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = $332,739 )
2025	2025	EMORY UNIVERSITY	201 DOWMAN DR NE	ATLANTA	GA	30322	DEKALB	USA	Medical Library Assistance	000	4	7/30/2025	NON-COMPETING CONTINUATION	$332,739
														Subtotal = $332,739

Issue Date FY: 2024 ( Subtotal = $332,546 )
2024	2024	EMORY UNIVERSITY	201 DOWMAN DR NE	ATLANTA	GA	30322	DEKALB	USA	Medical Library Assistance	000	3	7/25/2024	NON-COMPETING CONTINUATION	$332,546
														Subtotal = $332,546

Issue Date FY: 2023 ( Subtotal = $353,142 )
2023	2023	EMORY UNIVERSITY	201 DOWMAN DR	ATLANTA	GA	30322	DEKALB	USA	Medical Library Assistance	001	2	8/9/2023	NON-COMPETING CONTINUATION	$353,142
2023	2022	EMORY UNIVERSITY	201 DOWMAN DR	ATLANTA	GA	30322	DEKALB	USA	Medical Library Assistance	000	1	2/22/2023	NEW	$0
														Subtotal = $353,142

Issue Date FY: 2022 ( Subtotal = $353,599 )
2022	2022	EMORY UNIVERSITY	201 DOWMAN DR	ATLANTA	GA	30322	DEKALB	USA	Medical Library Assistance	000	1	9/21/2022	NEW	$353,599
														Subtotal = $353,599

Grand Total All Awards = $1,372,026
