The Common Fund Knowledge Center (CFKC): providing scientifically valid knowledge from the Common Fund Data Ecosystem to a diverse biomedical research community
Abstract
Making NIH Common Fund (CF) datasets FAIR is but the first step in realizing their potential
within the “big data” revolution. Science progresses through the accumulation of knowledge,
which achieves a wide reach only if it is accessible to a diverse spectrum of researchers. While
computer scientists have made substantial strides in modeling knowledge within “knowledge
graphs” (KGs), non-computational scientists can find it hard to interpret the graph-based
reasoning tools and visualizations that accompany KGs. These tools rely on logical
reasoning that does not account for scientific context or uncertainty, and so they can produce a plethora
of scientifically invalid inferences.
Our Common Fund Data Ecosystem (CFDE) Knowledge Center (KC) will present scientifically valid knowledge produced by CF projects. We will
represent this knowledge as a KG that complies with existing CFDE and external knowledge
curation efforts. Above all, we will emphasize scientific validity through both (a) careful knowledge
extraction, by ensuring that each edge in the KG is either a primary experimental finding or the
result of an expert-applied analysis, and (b) careful knowledge presentation, by building a portal
that de-emphasizes general-purpose graph traversal in favor of single-purpose visualizations.
To implement this KC, we will draw on our experience managing four large-scale NIH-funded
projects that have faced similar challenges in related settings. First, our work on Terra provides
a foundation for securely storing biomedical data and making it available through cloud-based
workspaces. Second, our work on the Common Metabolic Diseases Knowledge Portal (CMDKP) provides
a means to distill data into knowledge through expert-designed analyses that produce “summary
representations”, which are then presented through simple visualizations or multi-step
prescriptive workflows. Third, our work on the A2FKP provides experience tailoring knowledge
extraction and presentation to a variety of communities with different cultures and preferences.
Finally, our work on the Biomedical Translator provides experience developing and complying
with standards for knowledge representation and exchange.
In specific aim 1, we will coordinate working groups of CFDE and external investigators to
review the knowledge across CF projects and propose how to extract and represent it within the
KC. In specific aim 2, we will work with CF Data Coordinating Centers (DCCs) to define summary representations of their
data, provide them with software to make these summary representations available to us, and
regularly “pull” and integrate these summaries within a KG compliant with Translator standards.
In specific aim 3, we will use the UI/UX and search infrastructure developed for the
CMDKP and A2FKP to build a knowledge portal that enables a diverse spectrum of scientists to
visualize and search CF data. In specific aim 4, we will combine our own prior education and
outreach strategies with those of the CF DCCs to publicize the portal and train researchers in its use. Finally,
in specific aim 5, we will interface with other CFDE centers to build a combined Resource Portal
and form partnerships with external resources to amplify the reach of our KC.
Together, these aims will produce a CFDE KC that will unlock the full potential of CF resources
through an emphasis on scientific validity, enabling scientists of all levels of expertise to
understand, trust, and build upon them.