ABSTRACT
Current approaches to small molecule drug discovery are slow, expensive, and prone to failure. The abundance
of structural biology data has led to the emergence of a variety of computer-aided drug discovery
methodologies that aim to leverage this data to predict compound affinities and thereby prioritize compounds
for synthesis in the pursuit of potency. Alchemical free energy methods, which use rigorous statistical
mechanics to predict the free energy of binding, have led the way in providing useful predictions for improving
or maintaining potency in hit-to-lead and lead optimization phases of structure-enabled programs. Numerous
offerings (such as Schrödinger FEP+, Orion NES, CCG AMBER-TI, and the Open Force Field Consortium) have
emerged that provide engineered solutions with widespread industry adoption for structure-enabled discovery,
integrating advances from academia that my lab has been fortunate to contribute to over the last 15 years.
In contrast to physical methods such as alchemical free energy calculations, the emergence of machine learning
models based on deep learning architectures has provided a complementary tool for computer-aided drug
discovery. While physical models can generalize broadly across properties of interest for many target proteins,
they lack the ability to easily learn from data generated for a discovery program. Machine learning methods, on
the other hand, can readily learn from data, but often cannot predict target-specific properties because they
need large training sets, which generally limits their utility to drug discovery properties that do not depend
on the target, such as ADMET properties.
This proposal builds on our highly successful work in alchemical free energy calculations by proposing a new
generation of hybrid physical / machine learning models that overcome the limitations of each method on its
own. By endowing alchemical free energy calculations with the ability to learn at multiple scales, we aim to
bring these tools into the next decade. Building on our extensive history of innovation in alchemical free energy
calculations for drug discovery, we will (1) significantly increase their accuracy via the integration of ML
potentials; (2) expand their domain of applicability beyond affinity to encompass conformational and target
selectivity, resistance, interactions with structurally enabled toxicity targets, and physical properties such as
membrane permeability, lipophilicity, and solubility; (3) eliminate accuracy-limiting challenges associated with
current calculations such as sampling of protonation and tautomeric states, structured waters, and ions; (4)
introduce learnability into every aspect of alchemical free energy calculations, so that predictions become
systematically more accurate as project data is collected and the cost of evaluating very large virtual
synthetic spaces with alchemical-like accuracy is greatly reduced; and (5) cast alchemical predictions in a Bayesian
framework to enable the propagation of uncertainties in the underlying models into predicted affinities,
selectivities, and other properties.