Novel deep learning strategy to better predict pharmacological properties of candidate drugs and focus discovery efforts - PROJECT SUMMARY
Collaborative Drug Discovery, Inc. (CDD) proposes to develop a novel approach based on deep learning neural
networks to encode molecules into chemically rich vectors. We will first apply this representation to build
more powerful computational models that can more accurately predict properties such as bioactivity, ADME/Tox,
and pharmacokinetics across libraries of molecular structures. The ultimate goal is to leverage this
representation to generate novel compounds with better combinations of properties. Both of these capabilities will
help scientists to accelerate discovery of new drugs broadly across many therapeutic areas.
Scientists engaged in drug discovery research, from academic laboratories to large pharmaceutical
companies, rely on computational quantitative structure-activity relationship (QSAR) models to predict
pharmacologically relevant properties and obviate
the need to perform expensive, time-consuming assays (many of which require animal studies) for every
molecule of interest. Some properties (e.g., logP) can now be modeled with such high confidence that the
models have replaced the assays, but many other critical properties (e.g., solubility, ADME,
PK, hERG) remain far from this goal. We expect that our proposed chemically rich vectors will significantly
advance the state of the art beyond what can be achieved with conventional descriptors and fingerprints.
Improved models will enable researchers to select lead candidate series more effectively, explore chemical
space around leads to generate novel IP more efficiently, reduce failure rates for compounds advancing
through the drug discovery pipeline, and accelerate the entire drug discovery process. These benefits will be
realized broadly across most therapeutic areas.
Our central innovation is a novel computational strategy: first develop a deep learning (DL) model
optimized to best capture the essential structural and chemical features of molecules, starting from the most
natural structural representation; then validate the DL model by applying it to improve QSAR modeling of
pharmacological properties; and finally extend it to generate previously unknown molecules that have superior
properties – the so-called “inverse QSAR” problem, which is the Holy Grail of computational medicinal
chemistry. Others have tried, unsuccessfully, to leap directly to solving the inverse QSAR problem. We propose a
more patient and methodical approach that will allow the neural network to perform self-supervised training to
learn about chemical structures and properties from readily available, extremely large datasets, then transfer
this learning to improve modeling; only after establishing this solid foundation do we intend to apply the
models to attempt inverse QSAR. Prior attempts in this area have also relied on neural network architectures
designed originally for language processing. We will design a new architecture, more akin to neural network
architectures that have proven most successful at image classification, and optimize it to directly process the
“molecular graph” that represents the relationship of atoms and bonds in molecules.
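
To make the idea of a "molecular graph" concrete, the following is a minimal sketch, not the proposed architecture. It assumes a toy molecule (the heavy atoms of ethanol), a hypothetical three-element one-hot atom encoding, and a single neighbor-summing step standing in for a learned graph layer. All names (`build_adjacency`, `aggregate`, `readout`) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy molecule: heavy atoms of ethanol (C-C-O), hydrogens omitted.
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]  # (atom index, atom index, bond order)

# Assumed element vocabulary for a one-hot atom encoding.
ELEMENTS = ["C", "N", "O"]


def build_adjacency(n_atoms, bonds):
    """Return {atom index: [(neighbor index, bond order), ...]}."""
    adj = defaultdict(list)
    for i, j, order in bonds:
        adj[i].append((j, order))
        adj[j].append((i, order))
    return {i: adj[i] for i in range(n_atoms)}


def one_hot(symbol):
    """Encode an element symbol as a one-hot vector over ELEMENTS."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def aggregate(features, adj):
    """One unlearned message-passing step: each atom adds its
    neighbors' feature vectors, weighted by bond order, to its own."""
    updated = []
    for i, feat in enumerate(features):
        total = list(feat)
        for j, order in adj[i]:
            total = [t + order * f for t, f in zip(total, features[j])]
        updated.append(total)
    return updated


def readout(features):
    """Sum atom vectors into one fixed-length molecule vector."""
    return [sum(column) for column in zip(*features)]


adjacency = build_adjacency(len(atoms), bonds)
features = aggregate([one_hot(a) for a in atoms], adjacency)
molecule_vector = readout(features)
print(molecule_vector)
```

In a trained model, the fixed summation in `aggregate` would be replaced by learned weights, and several such steps would be stacked so that each atom's vector reflects a progressively larger chemical neighborhood.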