Component: We propose a generalization of our Translator reasoner tool mediKanren into
“Doc Sherlock”: an Autonomous Relay Agent (ARA) to answer biomedical queries from multiple
Knowledge Providers (KPs) via probabilistic, logical and abductive inference.
Problem: Eliminating the “Unknown Known”: With the explosive growth of new publications,
data sets and discoveries, a distinctly modern problem has emerged: the rapid expansion of the
“unknown known.” The unknown known are facts that are either (1) forgotten -- facts published
but not known widely -- or (2) uninferred -- facts inferable from known facts but not yet deduced.
We take it as the role of KPs to uncover the forgotten by systematically harvesting existing
sources of knowledge. We take it as the role of ARAs, in turn, to tackle the uninferred -- the
conclusions that could have been drawn if only all of the premises were co-resident in a single
reasoner’s mind. For pragmatic purposes, we restrict the queries to a tractable yet ambitious
class: queries raised by physician-scientists -- for whom the cost of the unknown known is
measured in patients' lives.
Plan: Doc Sherlock will use the advanced logic programming engine miniKanren [2,3], and it will
use probabilistic inference rules to tackle queries inspired by physician-scientists and rank
results by confidence. For example, we imagine Doc Sherlock answering the question, “What
may treat 16p11.2 deletion syndrome?” in four steps: (1) using a KP backed by something like
Ensembl to look up all the genes in 16p11.2; (2) using a second KP backed by a dataset like
gnomAD [5,6] to rank haploinsufficient genes, e.g., “KCTD13 is haploinsufficient” [97%
confidence]; (3) using a gene-gene KP like SemMedDB to find a relationship like “KCTD13
inhibits RhoA” [97% confidence]; and (4) using a drug-gene KP to find that “Simvastatin
inhibits RhoA” [99% confidence]. From these steps it would hypothesize that “Simvastatin may
mitigate 16p11.2 deletion via RhoA inhibition” [93% imputed confidence]. To answer queries
from physician-scientists, data sources with gene-gene, drug-gene, disease-gene or
drug-disease relationships will be high priority.
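The text does not say how the imputed confidence is derived. One reading consistent with the quoted figures is a simple product of the step confidences, which holds only if the steps are treated as independent. The sketch below illustrates that assumption; the `Assertion` type and `chain_confidence` function are hypothetical names, not part of mediKanren or any Translator API.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Assertion:
    """One fact returned by a (hypothetical) KP, with its confidence."""
    subject: str
    predicate: str
    obj: str
    confidence: float  # probability in [0, 1]


def chain_confidence(steps):
    """Multiply step confidences (ASSUMPTION: the steps are independent)."""
    return prod(step.confidence for step in steps)


# The three scored steps from the 16p11.2 example in the text.
chain = [
    Assertion("KCTD13", "is", "haploinsufficient", 0.97),
    Assertion("KCTD13", "inhibits", "RhoA", 0.97),
    Assertion("Simvastatin", "inhibits", "RhoA", 0.99),
]

imputed = chain_confidence(chain)
print(f"{imputed:.0%}")  # 0.97 * 0.97 * 0.99 = 0.931491, printed as 93%
```

A real reasoner would likely need a richer model, since evidence drawn from overlapping sources is rarely independent; the product rule is only the simplest scheme that reproduces the 93% figure.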
Collaboration: Building on our collaboration model from the current Translator phase, we plan
to visit other sites and have them visit us to work on problems from physician-scientists. We will
support the ARA standard API and other Translator standards. For queries richer than the
standard API can express, we will support programmatic access to Doc Sherlock’s query language.
Challenges: The primary technical challenge in reasoning across distinct KPs is intelligently
aliasing identical concepts across different data sets. To connect ontologies from different KPs,
we propose exploring the use of Galois connections [8,9] -- a generalization of isomorphism
suitable for use between two partial orders (such as ontologies). A major benefit will be the
ability to conduct abductive reasoning and to go beyond the restrictions of living within a single
KP -- our preliminary data shows a 43% improvement in inference across KPs [see Plan].
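To make the Galois-connection idea concrete: given two partial orders A and B, a pair of maps f: A -> B and g: B -> A forms a (monotone) Galois connection when f(a) <= b holds exactly when a <= g(b). The sketch below checks that condition by brute force on two toy "is-a" hierarchies. The ontologies, term names and mappings are invented for illustration and do not come from any actual KP.

```python
from itertools import product


def make_leq(parents):
    """Return leq(x, y): True when x is-a y (reflexive and transitive)."""
    def ancestors(x):
        seen, stack = {x}, [x]
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    up = {x: ancestors(x) for x in parents}
    return lambda x, y: y in up[x]


# Hypothetical toy ontologies: term -> tuple of direct parents.
kp_a = {"entity": (), "gene": ("entity",), "protein_coding_gene": ("gene",)}
kp_b = {"Thing": (), "Gene": ("Thing",)}
leq_a, leq_b = make_leq(kp_a), make_leq(kp_b)

# f maps each term of A to a term of B; g maps each term of B back
# to the most general term of A that it covers.
f = {"protein_coding_gene": "Gene", "gene": "Gene", "entity": "Thing"}
g = {"Gene": "gene", "Thing": "entity"}


def is_galois_connection(f, g):
    """Check f(a) <= b  <=>  a <= g(b) for every pair (a, b)."""
    return all(
        leq_b(f[a], b) == leq_a(a, g[b]) for a, b in product(kp_a, kp_b)
    )


print(is_galois_connection(f, g))
```

Unlike an isomorphism, the two maps need not be inverses: here two terms of the finer ontology collapse onto one term of the coarser one, yet subsumption queries can still be translated soundly in both directions.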