Abstract
The growing use of e-cigarettes and vaping devices in recent years is a concern for the
health community. While their safety has not yet been fully characterized, these devices are linked
to smoking cessation efforts, marketing campaigns targeted at adolescents, and additives,
such as fruit flavors, that promote use. Experimental data have been collected to investigate
toxicity, lethality, and cancer risk. However, gaps in these data and the difficulty of
collecting large datasets make risk assessment calculations challenging. Computational
modeling to predict chemical and toxin distribution, deposition, and dosimetry has been
successfully demonstrated; however, the computational requirements are prohibitive for large
population studies. We hypothesize that replacing expensive computational models with a
machine learning model will produce accurate risk assessments at a low computational cost and
that this process can be generalized to other environmental health data.
This project is a close collaboration between Kitware, Inc. and Applied Research
Associates, Inc. (ARA). The Kitware team has extensive experience in developing computational
physiology models for simulation; in the storage, curation, and analysis of large datasets for
medical and health-related applications; and in machine learning techniques. We have developed an
open-source platform, Girder, for creating customized workflows related to large datasets and
machine learning analysis. ARA has extensive experience in computational modeling and toxicity
analysis for the deposition and dosimetry of toxins and chemicals and the mechanisms associated
with e-cigarettes and vaping devices. In this project, we propose combining the expertise of the
teams at Kitware and ARA to develop a customized workflow for storing large datasets,
incorporating machine learning techniques, and analyzing the results. We will
demonstrate the effectiveness of the workflow using synthetic data generated by a
framework of computational models. The specific aims of the Phase I project are: (1) Generate
large datasets using high-fidelity computational modeling approaches; (2) Create an optimized
workflow for ingesting large environmental health datasets for use in machine learning to calculate
risk assessments; and (3) Develop a machine learning model to replace first-principles models and
predict risk in environmental health.