Teaching free energy calculations to learn - ABSTRACT
Current approaches to small molecule drug discovery are slow, expensive, and prone to failure. The abundance of structural biology data has led to the emergence of a variety of computer-aided drug discovery methodologies that aim to leverage this data to predict compound affinities and thereby prioritize compounds for synthesis in the pursuit of potency. Alchemical free energy methods, which use rigorous statistical mechanics to predict the free energy of binding, have led the way in providing useful predictions for improving or maintaining potency in the hit-to-lead and lead optimization phases of structure-enabled programs. Numerous offerings, such as Schrödinger FEP+, Orion NES, CCG AMBER-TI, and the Open Force Field Consortium, have emerged that provide engineered solutions with widespread industry adoption for structure-enabled discovery, integrating advances from academia that my lab has been fortunate to contribute to over the last 15 years. In contrast to physical methods such as alchemical free energy calculations, machine learning models based on deep learning architectures have emerged as a complementary tool for computer-aided drug discovery. While physical models can generalize broadly across properties of interest for many target proteins, they cannot easily learn from the data generated for a discovery program. Machine learning methods, on the other hand, can readily learn from data, but their need for large training sets often prevents them from predicting target-specific properties. This generally limits their utility to drug discovery properties that do not depend on the target, such as ADMET properties.
This proposal builds on our highly successful work in alchemical free energy calculations by proposing a new generation of hybrid physical / machine learning models that overcome the limitations of each method on its own. By endowing alchemical free energy calculations with the ability to learn at multiple scales, we aim to bring these tools into the next decade. Drawing on our extensive history of innovation in alchemical free energy calculations for drug discovery, we will (1) significantly increase their accuracy via the integration of ML potentials; (2) expand their domain of applicability beyond affinity to encompass conformational and target selectivity, resistance, interactions with structurally enabled toxicity targets, and physical properties like membrane permeability, lipophilicity, and solubility; (3) eliminate accuracy-limiting challenges associated with current calculations, such as the sampling of protonation and tautomeric states, structured waters, and ions; (4) introduce learnability into every aspect of alchemical free energy calculations, so that predictions become systematically more accurate as project data is collected and the cost of evaluating very large virtual synthetic spaces with alchemical-like accuracy is greatly reduced; and (5) cast alchemical predictions in a Bayesian framework to enable the propagation of uncertainties in the underlying models into predicted affinities, selectivities, and other properties.