The global market for toxicology testing was $8.1 billion in 2019 and is expected to reach $27 billion
by 2025. Because toxicological testing is a prerequisite step in most product development, it adds significant time
and cost, and it poses human health hazards when key data are not captured. To minimize time to market, expense,
and animal use, researchers are drawing on advances in computational biology and machine learning (ML) to conduct
more efficient in silico simulations. These strategies are driving strong growth in demand for advanced
computational tools. More specifically, the $635 billion consumer packaged goods (CPG) market currently offers a strong
value proposition for tools that ensure safety and expedite product design by linking reproductive health hazard
profiles to chemicals, exposures, and product use cases. Such tools would allow researchers to better understand and
prioritize chemicals for use in products, minimizing the associated reproductive health hazards.

The ToxIndex-CPG platform will address this growing market need through a web-based interface that gives
CPG toxicology researchers access to customized data for early product planning and study design. The platform
will focus on continuous curation of a database that maintains known relationships from the existing literature and
data sources, as well as on advanced algorithms that predict relationships for combinations not yet characterized.
This project will target CPG products and reproductive health hazards because they represent a major market and a
significant risk to vulnerable populations. The user front end will be a web-based tool that lets toxicology
researchers query specific chemicals, CPG use cases, and health hazards. Based on the query inputs, the platform will
return a ranked list of potential adverse reproductive health outcomes, and researchers will be able to explore the
impact of specific chemicals on the ranked hazards through advanced visualization tools. Hazard relationships linking
chemicals to human factors and planned CPG product use cases will be learned through ML using quantitative
structure-activity relationship (QSAR) models. The platform will leverage existing chemical and medical data sources
to build these models and will continue to learn adaptively as the datasets grow. The platform will also prioritize
application programming interfaces (APIs) to support the growing market of cheminformatics developers.
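
The query-to-ranked-list flow described above can be pictured with a minimal sketch. It assumes each chemical is already encoded as a numeric descriptor vector and trains one logistic-regression classifier per reproductive hazard on gold-standard 0/1 labels; logistic regression is only a stand-in for whatever QSAR model the project ultimately selects. The descriptor values, hazard names (`reduced_fertility`, `developmental_toxicity`), and all function names are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical descriptor vector for one chemical (e.g. scaled physicochemical
# properties). The values used below are invented toy numbers, not real data.
Descriptors = list[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class HazardModel:
    """Logistic-regression stand-in for a QSAR model of one hazard endpoint."""
    weights: list[float]
    bias: float = 0.0

    def predict(self, x: Descriptors) -> float:
        """Predicted probability that the chemical is associated with the hazard."""
        return sigmoid(self.bias + sum(w * xi for w, xi in zip(self.weights, x)))


def train_hazard_model(X: list[Descriptors], y: list[int],
                       lr: float = 0.5, epochs: int = 2000) -> HazardModel:
    """Fit one hazard model by batch gradient descent on the mean log-loss."""
    model = HazardModel(weights=[0.0] * len(X[0]))
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(model.weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = model.predict(xi) - yi  # derivative of log-loss w.r.t. the logit
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        model.weights = [w - lr * g / n for w, g in zip(model.weights, grad_w)]
        model.bias -= lr * grad_b / n
    return model


def rank_hazards(models: dict[str, HazardModel],
                 x: Descriptors) -> list[tuple[str, float]]:
    """Score a query chemical against every hazard model, highest risk first."""
    scores = [(hazard, model.predict(x)) for hazard, model in models.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy training set: six chemicals with two descriptors each and 0/1 labels
# per hazard. Hazard names are hypothetical placeholders.
X_train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.0, 0.0], [1.0, 1.0]]
labels = {
    "reduced_fertility":      [1, 1, 0, 0, 0, 1],
    "developmental_toxicity": [0, 0, 1, 1, 0, 1],
}
models = {hazard: train_hazard_model(X_train, y) for hazard, y in labels.items()}

# Query a new (hypothetical) chemical and print the ranked hazard list.
ranking = rank_hazards(models, [0.95, 0.05])
for hazard, prob in ranking:
    print(f"{hazard}: {prob:.2f}")
```

Training one binary model per hazard keeps the ranking step simple (sort by predicted probability) and would let new hazard endpoints be added without retraining the others; a production QSAR pipeline would presumably use validated descriptors and held-out test sets rather than this toy setup.
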
Phase I will establish the feasibility of data aggregation, ML development, and a prototype interface. To increase
the likelihood of success, development will leverage an existing tool, Sysrev, for automated data extraction from
publications and data sources. Sysrev will parse these sources to extract known human factors and use-case
susceptibility factors for a given combination of chemical toxicant and reproductive health hazard. The result will
be an initial hazard database of known factors that serves as a gold standard for ML testing. Next, QSAR ML models
will be developed to associate chemicals with hazards, followed by mediation models that trace effects from chemicals
through hazards to estimate the likelihood of causality for specific human factors and use cases of those chemicals.
Finally, a prototype web app and visualizations will be developed and deployed in a usability study with toxicology
market users.
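
Phase I uses the literature-derived hazard database as a gold standard for testing the ML models. One way to picture that check is the minimal sketch below, which thresholds model probabilities into predicted (chemical, hazard) pairs and scores them with precision, recall, and F1 against the gold-standard set. The chemical names, probabilities, 0.5 threshold, and function names are hypothetical placeholders, and the choice of these particular metrics is an assumption rather than the project's stated evaluation protocol.

```python
# A (chemical, hazard) association. All names below are hypothetical placeholders.
Pair = tuple[str, str]


def pairs_above_threshold(probabilities: dict[Pair, float],
                          threshold: float = 0.5) -> set[Pair]:
    """Turn model probabilities into a set of predicted associations."""
    return {pair for pair, p in probabilities.items() if p >= threshold}


def evaluate_against_gold(predicted: set[Pair], gold: set[Pair]) -> dict[str, float]:
    """Precision, recall, and F1 of predicted associations vs. the gold standard."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Stand-in for the literature-derived gold standard (invented pairs).
gold_standard = {
    ("chemical_A", "reduced_fertility"),
    ("chemical_B", "developmental_toxicity"),
    ("chemical_C", "reduced_fertility"),
}

# Stand-in for QSAR model output: predicted probability per pair (invented).
model_probabilities = {
    ("chemical_A", "reduced_fertility"): 0.91,
    ("chemical_B", "developmental_toxicity"): 0.74,
    ("chemical_B", "reduced_fertility"): 0.58,
    ("chemical_C", "reduced_fertility"): 0.31,
}

predicted = pairs_above_threshold(model_probabilities)
metrics = evaluate_against_gold(predicted, gold_standard)
print(metrics)
```

Here two of the three predicted pairs appear in the gold standard and one gold-standard pair is missed, so precision, recall, and F1 all come out to 2/3; raising or lowering the threshold trades precision against recall.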