PROJECT SUMMARY/ABSTRACT
We cannot yet look at a chemical structure and predict whether the molecule will have an odor, much less what
character it will have. The goal of the proposed research is to apply machine learning to predict perceptual
characteristics from chemical features of molecules. The specific aims of the proposal will determine (1) which
molecules are odorous, and (2) what data are needed to model odor character. Building a highly predictive
model requires two key ingredients: high-quality data and a sound modeling approach. High-quality data must
be accurate (ratings are consistent and describe true odor properties) and detailed (ratings describe even
small differences in odor properties). We have collected human psychophysical data on a diverse set of
molecules and have trained a model to predict whether a molecule has an odor, but pilot data identified odorous
contaminants that limit model training and measurement of model accuracy. In Aim 1, I will apply my
background in analytical chemistry to evaluate the accuracy of the data, using gas chromatography to identify
and correct errors caused by chemical contaminants. In Aim 2, I will apply my experience in human sensory
evaluation to measure and compare the consistency and the degree of detail in ratings that can be achieved
with different sensory methods and subject training procedures. By executing my training plan, I will develop
the skills in statistical programming and machine learning needed to employ a sound modeling approach to
these problems. The model constructed in Aim 1 will enable prediction of odor classification (odor/odorless) for
any molecule and thus define which molecules are perceptually relevant. Predicting odor character is a far
more complex challenge: while a molecule can have only one of two odor classifications (odor or odorless), it
may elicit any number of diverse odor character attributes (fruity, floral, musky, sweet, etc.). Descriptive
Analysis (DA) is the gold standard method for generating accurate and detailed sensory profiles, but this
method is time-consuming. We estimate that an odor character dataset becomes large enough (“model-ready”)
to predict odor character at approximately 10,000 molecules, and that producing such a dataset using DA would
require more than 30,000 hours of human subject evaluation, or approximately 6 years for the typical trained
panel. Before we invest that time and those resources, it is prudent to evaluate the relative data quality of
more rapid sensory methods. The results of Aim 2 are expected to determine the best approach for generating
a model-ready dataset by quantifying trade-offs among degree of detail (data resolution), rating consistency, and
method speed across five candidate sensory methods. Together, these aims represent a significant step forward in
linking chemical recipe to human odor perception, an advancement that supports the NIDCD goal of
understanding normal olfactory function (how stimulus relates to percept) and has many potential applications
in foods (what composition of molecules should be present to produce a target aroma percept).
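
The Descriptive Analysis workload estimate above implies a per-molecule and a per-year evaluation rate, which can be checked with simple arithmetic. The sketch below uses only the three figures quoted in the abstract, treated as point values; the variable and function names are illustrative, and the derived rates are not stated in the proposal itself.

```python
# Figures quoted in the abstract, treated here as point estimates.
N_MOLECULES = 10_000   # approximate size of a "model-ready" dataset
TOTAL_HOURS = 30_000   # "more than 30,000 hours" of human subject evaluation
PANEL_YEARS = 6        # "approximately 6 years" for a typical trained panel


def implied_rates(n_molecules, total_hours, panel_years):
    """Return (subject-hours per molecule, subject-hours per panel-year)."""
    hours_per_molecule = total_hours / n_molecules
    hours_per_year = total_hours / panel_years
    return hours_per_molecule, hours_per_year


hours_per_molecule, hours_per_year = implied_rates(
    N_MOLECULES, TOTAL_HOURS, PANEL_YEARS
)
print(f"{hours_per_molecule:.1f} subject-hours per molecule")
print(f"{hours_per_year:.0f} subject-hours per panel-year")
```

Because the abstract gives 30,000 hours as a lower bound, both derived rates should be read as lower bounds as well.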