This proposal describes the BioThings Explorer (BTE) platform as an Autonomous Relay
Agent (ARA). Most data integration efforts (including many being pursued within Translator) are
based on building centralized data resources. These efforts typically involve building resource-specific
data ingestion parsers, the outputs of which are assembled into knowledge graphs and
loaded into a local database (Neo4j is a popular choice within Translator).
The BTE platform takes a complementary approach to data integration, focusing on a
distributed network of knowledge providers connected by application programming interfaces
(APIs). A proof-of-concept implementation of BTE is now complete (with demos available in
[1,2]). This implementation has three components (Figure 1): a distributed knowledge graph
(distributed-KG) based on many individual web APIs "in the wild", a meta-knowledge graph
(meta-KG) that describes how the inputs and outputs of the individual APIs are compatible with one another, and a BTE
Python client that automates the planning and execution of queries across the API network.
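To make the division of labor between the meta-KG and the client concrete, the sketch below models the meta-KG as a list of edges, each stating that a given API accepts one semantic type and returns another. It then plans queries by breadth-first search for chains of APIs that connect a starting type to a target type. The API names, semantic types, and the `MetaEdge`/`plan_paths` names are hypothetical illustrations, not the BTE client's actual data model or interface. The sketch also covers only planning; it does not execute any API calls.

```python
from collections import deque
from typing import NamedTuple


class MetaEdge(NamedTuple):
    """One hypothetical meta-KG edge: an API mapping an input type to an output type."""
    api: str          # hypothetical API name
    input_type: str   # semantic type the API accepts
    predicate: str    # relationship the API asserts
    output_type: str  # semantic type the API returns


# Toy meta-KG; in practice these entries would be derived from API annotations.
META_KG = [
    MetaEdge("gene_api", "Gene", "related_to", "Pathway"),
    MetaEdge("pathway_api", "Pathway", "has_participant", "ChemicalSubstance"),
    MetaEdge("drug_api", "Gene", "targeted_by", "ChemicalSubstance"),
    MetaEdge("disease_api", "Disease", "related_to", "Gene"),
]


def plan_paths(meta_kg, start_type, end_type, max_hops=3):
    """Enumerate chains of API calls whose input/output types connect
    start_type to end_type, via breadth-first search over the meta-KG."""
    plans = []
    queue = deque([(start_type, [])])
    while queue:
        current_type, path = queue.popleft()
        if current_type == end_type and path:
            plans.append(path)  # reached the target type; stop extending this chain
            continue
        if len(path) >= max_hops:
            continue  # hop limit keeps the search finite even if the meta-KG has cycles
        for edge in meta_kg:
            if edge.input_type == current_type:
                queue.append((edge.output_type, path + [edge]))
    return plans


for plan in plan_paths(META_KG, "Disease", "ChemicalSubstance"):
    print(" -> ".join(edge.api for edge in plan))
```

On this toy meta-KG the planner finds two plans, `disease_api -> drug_api` and `disease_api -> gene_api -> pathway_api`, shortest first because of the breadth-first order.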
Relative to centralized solutions, this distributed model offers several advantages. First, the API
ecosystem is easily extensible by the community since there is no gatekeeper controlling
access to a centralized resource. The scope and output of the BTE system will continuously and
automatically improve as the community-maintained registry of API components continues to
grow. Second, the retrieved data are always current with the source rather than dependent on
a frequent synchronization schedule. Third, the distributed model scales better under heavy
usage because the BTE client runs on each user's own computing infrastructure, bypassing
any centralized component that could become a single point of failure.
BTE also distinguishes itself from other federated API solutions by using semantically precise
annotations of APIs as they exist, rather than mandating any changes to the API itself. These
API annotations describe API inputs and outputs in enough detail to support initiating the API
call and interpreting the result. In contrast, other federated approaches (including some within
Translator) require APIs to conform to a common API structure. Such requirements have
historically been significant barriers to widespread adoption, and thin API "wrappers"
around non-compliant APIs introduce additional complexity and technical points of failure.
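The annotation-driven approach can be illustrated with a minimal sketch, assuming an annotation is a small record that states where an existing API lives, how to phrase an input identifier as a query parameter, where the outputs sit in the JSON response, and what semantic type those outputs have. The endpoint URL, annotation field names, and identifiers below are hypothetical and do not reproduce BTE's actual annotation schema. The point is that the API itself is left untouched and all adaptation lives in the annotation.

```python
from urllib.parse import urlencode

# Hypothetical annotation of an existing, unmodified API. The field names are
# illustrative assumptions, not a published annotation schema.
ANNOTATION = {
    "server": "https://api.example.org/query",  # hypothetical endpoint
    "input_param": "q",                         # query parameter carrying the input
    "input_template": "entrezgene:{id}",        # how the API expects the input ID
    "output_path": ["hits", "pathway_ids"],     # keys leading to outputs in the JSON
    "output_type": "Pathway",                   # semantic type of those outputs
}


def build_request(annotation, input_id):
    """Build the request URL the annotation describes for one input ID."""
    value = annotation["input_template"].format(id=input_id)
    return annotation["server"] + "?" + urlencode({annotation["input_param"]: value})


def parse_response(annotation, payload):
    """Follow the annotated key path through a JSON payload and return
    (output_type, identifier) pairs."""
    node = payload
    for key in annotation["output_path"]:
        if not isinstance(node, dict) or key not in node:
            return []  # annotated field absent: no results for this input
        node = node[key]
    values = node if isinstance(node, list) else [node]
    return [(annotation["output_type"], value) for value in values]


print(build_request(ANNOTATION, "1234"))
# Hypothetical payload standing in for a real API response.
print(parse_response(ANNOTATION, {"hits": {"pathway_ids": ["pathway_A", "pathway_B"]}}))
```

Network access is deliberately left out so the sketch runs offline; a client would pass the URL from `build_request` to an HTTP library and hand the decoded JSON to `parse_response`.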
The strengths of the current BTE implementation are the ease of adding new resources and the
automation of query planning and execution. The critical challenge we address in this
proposal is developing new methods to rank and sort the query path results returned by
BTE queries.
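As a point of reference for the ranking problem, the sketch below implements one naive baseline, which is not the method this proposal develops. Each result endpoint is scored by the number of distinct query paths that reach it, and ties are broken by the length of its shortest supporting path. Paths are assumed to be lists of node identifiers ending at the result, and the node names in the example are hypothetical.

```python
from collections import defaultdict


def rank_endpoints(paths):
    """Naive baseline: rank result endpoints by number of distinct supporting
    paths (descending), then by shortest supporting path (ascending).

    Each path is a list of node identifiers whose last element is the endpoint.
    Path length is measured in nodes.
    """
    support = defaultdict(set)
    for path in paths:
        # Store paths as tuples so identical paths are counted only once.
        support[path[-1]].add(tuple(path))
    scored = [
        (endpoint, len(ps), min(len(p) for p in ps))
        for endpoint, ps in support.items()
    ]
    # Most support first, then shortest path, then name for a deterministic order.
    scored.sort(key=lambda s: (-s[1], s[2], s[0]))
    return scored


# Hypothetical query paths from a disease to candidate drugs.
example_paths = [
    ["disease_X", "gene_1", "drug_A"],
    ["disease_X", "gene_2", "drug_A"],
    ["disease_X", "gene_1", "drug_B"],
    ["disease_X", "gene_3", "pathway_P", "drug_C"],
    ["disease_X", "gene_1", "drug_A"],  # duplicate path, counted once
]

for endpoint, n_paths, shortest in rank_endpoints(example_paths):
    print(endpoint, n_paths, shortest)
```

Here `drug_A` ranks first because two distinct paths reach it, while `drug_B` and `drug_C` tie on support and are separated by shortest path length. A count-based score of this kind ignores which APIs supplied each edge and how strong the underlying evidence is.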