Free energy-based active learning for ligand off-target and multitarget activity

Abstract

Small molecule kinase inhibitors are targeted anti-cancer therapeutics that act on specific kinases associated with a given cancer and are generally less toxic than general-purpose chemotherapies. Because the ATP-binding domain is highly conserved across kinases, these inhibitors often show off-target activity. This activity can be beneficial, for instance when the inhibitor is active against multiple drug targets, or it can be deleterious, leading to toxic side effects. Knowledge of off-target activity is therefore essential during the early stages of the drug discovery pipeline; however, because experimental screening is costly and time consuming, off-target activity is often explored only in the final stages of lead optimization. Free energy-based in silico methods such as alchemical relative binding free energy (RBFE) simulations, which are becoming prevalent in traditional single-target drug discovery efforts, afford a cost-effective solution; to date, however, no one has attempted to use them to predict off-target activity, because of the cost and complexity of simulating multiple protein-ligand complexes for many ligands. To address this shortcoming, we propose to combine novel methods that reduce the computational expense of each simulation with multi-objective active learning that limits the number of simulations required to explore a chemical space. In Aim 1, we propose a novel method for optimally stratifying the phase space in RBFE simulations, reducing computational expense while maintaining the required accuracy. Combined with our recently developed on-the-fly optimization method, this will greatly reduce the computational cost of RBFE simulations.
In Aims 2 and 3, we will combine these methods with our group’s recently developed free energy-based active learning workflow, implemented for multi-objective optimization of ligand binding affinity within a chemical space. This will be the first use of free energy-based methods to predict affinity to multiple proteins in a drug discovery setting. In Aim 2, we will use this workflow to optimize ligands for selectivity toward a kinase target and against known off-target kinases. We will test it by targeting activin receptor-like kinase 2, mutations of which are associated with the fatal pediatric brainstem tumor diffuse intrinsic pontine glioma, while penalizing activity against other activin receptor-like kinases. In Aim 3, we will use this workflow to optimize multikinase inhibitors for activity against multiple known drug targets. We will test it by optimizing the hit compound that led to the development of the multikinase inhibitor entrectinib against anaplastic lymphoma kinase, c-ros oncogene 1 kinase, and tropomyosin receptor kinases A-C. By demonstrating the feasibility of free energy-based multi-objective optimization for affinity to, and selectivity over, multiple kinases, we will greatly increase the efficiency of the early drug development pipeline for targeted therapeutics.