Project Summary/Abstract
This project proposes the development of new methods and data resources to integrate modern artificial intelligence (AI)
techniques into predictive toxicology, as well as the application of those methods and resources to generate new hypotheses linking putative toxicants to specific clinical outcomes. The recent explosion of publicly available chemical and biomedical data provides an immensely valuable resource for computational toxicologists, but existing techniques for learning
from these data perform poorly and fail to capture crucial patterns that span multiple levels of biological organization. For
example, the US FDA maintains a computational toxicology database cataloguing over 875,000 chemicals of toxicologic concern, yet only a handful of these have been characterized in terms of their downstream clinical effects.
However, informatics and machine learning (ML) provide specific tools that may address this gap. This project focuses on
two of them in particular: graph machine learning (Graph ML) and semantic data analysis. Since both of these techniques
allow for the integration of information from multiple otherwise incongruent sources, they have the capacity to outperform
simpler traditional methods for pattern discovery, while increasing both inferential capacity and statistical power.
Our central hypothesis is that inductive learning on semantic graph data provides an effective means for generating
and validating translational and mechanistic conclusions from existing public toxicology data. In Aim 1 (K99), a new
data infrastructure—driven by a large, ontology-controlled graph database aggregating public toxicology data—will
be constructed and evaluated on several important tasks in computational toxicology. Together, these resources will
be named 'ComptoxAI'. Aim 2 (K99) will develop and apply a graph machine learning strategy to predict new adverse
outcome pathways (AOPs) in the graph database. Importantly, this aim will use an automated machine learning (Auto
ML) approach to discover optimized neural network architectures for this prediction task in a data-driven manner. This
Auto ML strategy will use estimation of distribution algorithms (EDAs) to search for optimized network architectures
in a probabilistic manner. An expected side effect of the Auto ML approach is increased model interpretability over
existing applications of Graph ML. Aim 3 (R00) will use semantic data analysis via ontological inference to refine Aim 2's
model outputs into meaningful knowledge, proposing specific mechanistic explanations for the newly proposed AOPs.
Aim 4 (R00) will use the resources and outcomes of the previous Aims as a starting point to develop and disseminate
new open-source data standards, software resources, and research reporting protocols, with the goal of creating a
collaborative, cross-institutional research ecosystem for AI research in computational toxicology.
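
To illustrate the probabilistic architecture search described in Aim 2, the following is a minimal sketch of a univariate EDA over a small categorical search space. It is not the project's implementation: the search space, the option names, and the `toy_fitness` function are all hypothetical stand-ins, and a real run would replace `toy_fitness` with the validation performance of a trained graph neural network.

```python
import random

# Hypothetical search space: each key is an architecture choice, each list its options.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "hidden_dim": [16, 32, 64, 128],
    "aggregator": ["mean", "sum", "max"],
}


def toy_fitness(arch):
    """Stand-in for validation performance (assumption, not a real model score)."""
    score = -abs(arch["num_layers"] - 3) - abs(arch["hidden_dim"] - 64) / 32
    if arch["aggregator"] == "mean":
        score += 1.0
    return score


def sample(probs, rng):
    """Draw one architecture from the current per-choice categorical distributions."""
    return {
        key: rng.choices(SEARCH_SPACE[key], weights=probs[key], k=1)[0]
        for key in SEARCH_SPACE
    }


def run_eda(generations=30, pop_size=40, elite_frac=0.25, smoothing=0.05, seed=0):
    rng = random.Random(seed)
    # Start from a uniform distribution over every architecture choice.
    probs = {k: [1.0 / len(v)] * len(v) for k, v in SEARCH_SPACE.items()}
    best = None
    for _ in range(generations):
        # Sample a population, rank it, and keep the top fraction as the elite set.
        population = [sample(probs, rng) for _ in range(pop_size)]
        population.sort(key=toy_fitness, reverse=True)
        elite = population[: max(1, int(pop_size * elite_frac))]
        if best is None or toy_fitness(elite[0]) > toy_fitness(best):
            best = elite[0]
        # Re-estimate each marginal from elite frequencies; smoothing keeps
        # every option's probability strictly above zero.
        for key, options in SEARCH_SPACE.items():
            counts = [sum(a[key] == opt for a in elite) + smoothing for opt in options]
            total = sum(counts)
            probs[key] = [c / total for c in counts]
    return best, probs


if __name__ == "__main__":
    best_arch, final_probs = run_eda()
    print(best_arch)
    print(final_probs)
```

The learned marginals in `final_probs` show which options the search came to favor, which is one way such an approach could contribute to interpretability. This sketch treats each choice independently; an EDA that models dependencies between choices would require a richer distribution.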
Beyond the methodological and infrastructural contributions of this work, successful completion of the Specific Aims
will yield a library of mechanistically based hypotheses linking putative toxicants to specific clinical outcomes, addressing
a major need in predictive toxicology. In support of the goals of the open science movement, all research outcomes
from this project—including papers, software, data, and other resources—will be made available for free public reuse.