Sunday, July 6, 2025 7/6/2025

DOCKET: accelerating knowledge extraction from biomedical data sets

Award Number: OT2TR003443
ORGANIZATION: NATIONAL CENTER FOR ADVANCING TRANSLATIONAL SCIENCES
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: OTHER
PERIOD OF PERFORMANCE START DATE: 01/24/2020
PERIOD OF PERFORMANCE END DATE: 11/30/2024

Group Awards By:

View Award Description

DOCKET: accelerating knowledge extraction from biomedical data sets - Component type: This Knowledge Provider project will continue and significantly extend work done by the Translator Consortium Blue Team, focusing on deriving knowledge from real-world data through complex analytic workflows, integrated to the Translator Knowledge Graph, and served via tools like Big GIM and the Translator Standard API. The problem: We aim to solve the “first mile” problem of translational research: how to integrate the multitude of dynamic small-to-large data sets that have been produced by the research and clinical communities, but that are in different locations, processed in different ways, and in a variety of formats that may not be mutually interoperable. Integrating these data sets requires significant manual work downloading, reformatting, parsing, indexing and analyzing each data set in turn. The technical and ethical challenges of accessing diverse collections of big data, efficiently selecting information relevant to different users’ interests, and extracting the underlying knowledge are problems that remain unsolved. Here, we propose to leverage lessons distilled from our previous and ongoing big data analysis projects to develop a highly automated tool for removing these bottlenecks, enabling researchers to analyze and integrate many valuable data sets with ease and efficiency, and making the data FAIR [1]. Plan: (AIM 1) We will analyze and extract knowledge from rich real-world biomedical data sets (listed in the Resources page) in the domains of wellness, cancer, and large-scale clinical records. (AIM 2) We will formalize methods from Aim 1 to develop DOCKET, a novel tool for onboarding and integrating data from multiple domains. (AIM 3) We will work with other teams to adapt DOCKET to additional knowledge domains. ■ The DOCKET tool will offer 3 modules: (1) DOCKET Overview: Analysis of, and knowledge extraction from, an individual data set. (2) DOCKET Compare: Comparing versions of the same data set to compute confidence values, and comparing different data sets to find commonalities. (3) DOCKET Integrate: Deriving knowledge through integrating different data sets. ■ Researchers will be able to parameterize these functions, resolve inconsistencies, and derive knowledge through the command line, Jupyter notebooks, or other interfaces as specified by Translator Standards. ■ The outcome will be a collection of nodes and edges, richly annotated with context, provenance and confidence levels, ready for incorporation into the Translator Knowledge Graph (TKG). ■ All analyses and derived knowledge will be stored in standardized formats, enabling querying through the Reasoner Std API and ingestion into downstream AI assisted machine learning. ■ Example questions this will allow us to address include: (Wellness) Which clinical analytes, metabolites, proteins, microbiome taxa, etc. are significantly correlated, and which changing analytes predict transition to which disease? [2,3] (Cancer) Which gene mutations in any of X pathways are associated with sensitivity or resistance to any of Y drugs, in cell lines from Z tumor types? (All data sets) Which data set entities are similar to this one? Are there significant clusters? What distinguishes between the clusters? What significant correlations of attributes can be observed? How can this set of entities be expanded by adding similar ones? How do these N versions of this data set differ, and how stable is each knowledge edge as the data set changes over time? Collaboration strengths: Our team has extensive experience with biomedical and domainagnostic data analytics, integrating multiple relevant data types: omics, clinical measurements and electronic health records (EHRs). We have participated in large collaborative consortia and have subject matter experts willing to advise on proper data interpretation. Our application synergizes with those of other Translator teams (see Letters of Collaboration). Challenges: Data can come in a bewildering diversity of for


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = $0 )
2025	2024	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	000	1	4/1/2025	SUPPLEMENT FOR EXPANSION	$0
														Subtotal = $0

Issue Date FY: 2024 ( Subtotal = $609,144 )
2024	2024	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	002	1	2/16/2024	SUPPLEMENT FOR EXPANSION	$0
2024	2024	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	001	1	12/5/2023	SUPPLEMENT FOR EXPANSION	$609,144
2024	2024	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	003	1	6/18/2024	SUPPLEMENT FOR EXPANSION	$0
2024	2020	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	000	1	11/30/2023	NEW	$0
														Subtotal = $609,144

Issue Date FY: 2023 ( Subtotal = $609,144 )
2023	2023	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	001	1	12/10/2022	SUPPLEMENT FOR EXPANSION	$609,144
2023	2020	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	000	1	12/1/2022	NEW	$0
														Subtotal = $609,144

Issue Date FY: 2022 ( Subtotal = $609,144 )
2022	2022	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	001	1	1/29/2022	SUPPLEMENT FOR EXPANSION	$609,144
2022	2020	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	000	1	1/27/2022	NEW	$0
														Subtotal = $609,144

Issue Date FY: 2021 ( Subtotal = $676,757 )
2021	2021	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	003	1	8/18/2021	SUPPLEMENT FOR EXPANSION	$343,788
2021	2021	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	001	1	2/9/2021	SUPPLEMENT FOR EXPANSION	$332,969
2021	2020	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	002	1	8/12/2021	NEW	$0
2021	2020	INSTITUTE FOR SYSTEMS BIOLOGY	401 TERRY AVE N	SEATTLE	WA	98109	KING	USA	National Center for Advancing Translational Sciences	000	1	2/4/2021	NEW	$0
														Subtotal = $676,757

Grand Total All Awards = $2,504,189

Top

All Categories

About

Search

Reports

Data Submission

Award Information

DOCKET: accelerating knowledge extraction from biomedical data sets

Award Number: OT2TR003443

ORGANIZATION: NATIONAL CENTER FOR ADVANCING TRANSLATIONAL SCIENCES

OPDIV: NIH

AWARD CLASS: DISCRETIONARY

AWARD ACTIVITY TYPE: OTHER

PERIOD OF PERFORMANCE START DATE: 01/24/2020

PERIOD OF PERFORMANCE END DATE: 11/30/2024

Federal Websites

Department of Health & Human Services

HHS Operating Divisions

HHS Staff Divisions

Download A Document Viewer