Integrating Machine Learning and Atomistic Simulations for Accurate Prediction of Drug Molecular Crystal Solubility

Project Summary / Abstract

Solubility is a crucial factor in pharmaceutical drug discovery and development, directly impacting bioavailability, efficacy, and formulation. However, experimental solubility assays are often complex and resource-intensive, while informatics-based approaches are hindered by the lack of comprehensive, high-quality datasets. Atomistic simulations, though promising, are limited by inadequate interatomic potentials, challenges in free energy estimation, and the polymorphism of molecular crystals. Our lab focuses on designing and applying atomistic simulations, statistical mechanics, and machine learning methods to predict the properties of materials and molecules. Our long-term vision is to create a computational platform that transforms drug discovery by enabling reliable predictions of key molecular properties. Over the next five years, we will develop a robust and automatable computational framework to predict the solubility of drug-related molecular crystals. This effort will involve creating efficient machine learning potentials (MLPs) based on quantum mechanical calculations, designed for broad applicability to organic systems of biological relevance. We will leverage these MLPs, along with generative models, to enhance crystal structure predictions. We will also build an automated workflow for high-throughput solubility computations across a diverse set of molecular systems under consistent conditions. Additionally, we will develop data-driven models that improve the accuracy of informatics-based solubility predictions. The computational framework will be validated against publicly available solubility datasets and further tested using blinded solubility data for newly synthesized compounds. This will demonstrate the accuracy and utility of the framework in real-world drug discovery contexts.