Identifying molecules with similar shape and characteristics to known ligands has proven useful in several
areas of drug discovery and development, from ligand-based virtual screening (LBVS) to scaffold-hopping. The
underlying assumption is that compounds that occupy a similar volume with similar chemical groups will have
similar activity at the target protein, due to formation of similar protein-ligand interactions. However, in the
aqueous in vivo environment, protein-water-ligand interactions are equally important, with water bridges, water
networks, and water displacement playing a critical role in the binding of many compounds. To our knowledge,
no 3D ligand matching methods explicitly account for these important waters as potential ligand space, which
leads to false negatives during shape matching, as molecules that appear dissimilar in vacuo may in fact
behave similarly in the binding pocket once water is accounted for. We aim to address this gap by creating the
first program that incorporates these water molecules into 3D ligand comparison and similarity scoring. To do
so, we will adapt our previously developed algorithm (WATGEN), which predicts water positions in the unbound
(“empty”) protein and in the protein-ligand complex, together with the accompanying calculation of
ligand-driven water displacement in protein-ligand complexes. To extend this work, waters relevant to shape
matching will be identified using a combination of machine learning (ML) and empirical algorithms built on
the 9,000+ solvated structures from our previous study, each with corresponding displacement
calculations. This step will calculate the “replaceability” and “displaceability” of WATGEN-predicted waters,
indicating how each water should be represented, physically and chemically, for shape matching. We will then
write code to automatically create “hybrid” ligands by adding (or removing) atoms according to the solvation
representation determined in this step. Finally, we will validate our new solvation-aware 3D shape matching methodology
by comparing it to our current waterless methodology in two settings that rely on ligand-based
similarity scoring: 1) evolution of a first-generation sulfonylurea (tolbutamide) into more advanced drugs such as
glyburide using our AI-driven Drug Design platform, and 2) LBVS using unmodified and solvation-informed
tolbutamide as reference structures for screening the WuXi GalaXi “off-the-shelf” virtual library. The
compounds most similar to each reference will be purchased and assayed for glucose-stimulated insulin secretion
activity. We predict that for each of these experiments, the new methodology will outperform our current
waterless methodology. These features will be integrated into our existing shape matching algorithm within the
ADMET Predictor platform, which is freely available to academic researchers, leading to improvements in drug
discovery and optimization.
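
As a minimal illustration of the hybrid-ligand idea described above (a sketch under stated assumptions, not the WATGEN or ADMET Predictor implementation), the following Python snippet scores shape similarity with a simplified first-order Gaussian overlap Tanimoto and shows how adding a retained water as a pseudo-atom can change the score. The width parameter ALPHA, the function names, and all coordinates are hypothetical; atoms are treated as identical spheres, and no alignment or chemical typing is performed.

```python
import math

# Hypothetical Gaussian width (1/Angstrom^2); illustrative only, not a fitted value.
ALPHA = 0.8


def overlap(a, b, alpha=ALPHA):
    """First-order (pairwise) Gaussian overlap between two sets of 3D points."""
    # Overlap integral of two equal-width spherical Gaussians separated by d:
    # (pi / (2*alpha))**1.5 * exp(-alpha * d**2 / 2)
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for p in a:
        for q in b:
            d2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
            total += prefactor * math.exp(-0.5 * alpha * d2)
    return total


def shape_tanimoto(a, b, alpha=ALPHA):
    """Shape Tanimoto: O_AB / (O_AA + O_BB - O_AB)."""
    o_ab = overlap(a, b, alpha)
    return o_ab / (overlap(a, a, alpha) + overlap(b, b, alpha) - o_ab)


def make_hybrid(ligand_atoms, retained_waters):
    """Assumed hybrid representation: retained waters appended as pseudo-atoms."""
    return list(ligand_atoms) + list(retained_waters)


# Toy coordinates (Angstrom). Ligand A places an atom where ligand B
# instead relies on a bridging water.
ligand_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand_b = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
bridging_waters_b = [(3.0, 0.0, 0.0)]

waterless_score = shape_tanimoto(ligand_a, ligand_b)
hybrid_score = shape_tanimoto(ligand_a, make_hybrid(ligand_b, bridging_waters_b))

if __name__ == "__main__":
    print("waterless:", waterless_score)
    print("hybrid:   ", hybrid_score)
```

In this toy case the hybrid of ligand B occupies the same positions as ligand A, so the hybrid score is higher than the waterless score. A displaceable water would be handled in the opposite direction, by leaving it out of (or removing it from) the representation.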