Myc Transcription Factor Inhibitor Design: Integrating Atomic and Mesoscale with Semi-Supervised Generative Deep Learning Models

ABSTRACT

Inhibition of master regulators such as Myc has attracted considerable interest because their removal reverses the oncogenic state. Adding to the mystique is the technical challenge of targeting a protein that possesses large regions of disorder. Though Myc is widely considered “undruggable”, the library of hits that disrupt its function continues to grow. The chemical features of a hit are difficult to deduce beyond high molecular weight, aromaticity, rigidity, and hydrophobicity, and the more specific features of a protein-protein interaction (PPI) inhibitor are harder still to define. To circumvent this question, machine learning methods have been applied to expand the library of experimentally determined hits in the hope of finding an improved inhibitor nearby in chemical space. Recently, the natural application of generative deep learning techniques to this problem has been reported. This proposal describes a protocol for a semi-supervised expansion of small molecules that inhibit various reactions in the Myc transactivation pathway. The PPI inhibitors from three publicly available databases make up the training set (n=9516), while the known Myc inhibitors form the test set (n=100). In order to surpass the effectiveness of the test set, all known Myc inhibitors are removed from the training set. A number of latent variables that suffice to recreate the training set are solved for. These variables represent the general structural properties of PPI inhibitors, which may be associated with activities at various binding sites. The efficient calculation of activities is crucial to obtaining good performance.
Therefore, a well-tempered ensemble of target configurations is pre-calculated at all-atom resolution. Additionally, to incorporate the population-level behavior of multiple Myc molecules into inhibitor design, mesoscale coarse-grained simulations are performed in various solvents that drive liquid-liquid phase separation. To identify interactions that correlate with phase response, various points in coarse-grained phase space are converted to all-atom resolution, further refined, and converted into contact maps. A new lead is evaluated with ensemble-based docking calculations, which compute an average of averages over the different poses of a ligand binding to different conformations drawn at random from the ensembles. Reinforcement learning is applied to significantly reduce the time spent docking batches of leads while maintaining confidence in the result. Once new molecules are generated, these leads are further optimized using absolute and relative binding free energy methods. Ultimately, this study will test the limits of generative models in integrating data across multiple scales and in developing potent inhibitors of intrinsically disordered proteins.
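The "average of averages" used in ensemble-based docking can be illustrated with a minimal sketch. It is not the protocol's implementation: the function names, the toy scoring function, uniform sampling with replacement, and the unweighted means are all assumptions made for illustration. A real workflow would call a docking engine in place of `toy_score` and might reweight conformations drawn from a well-tempered ensemble.

```python
import random
from statistics import mean


def toy_score(pose, conformation):
    """Hypothetical stand-in for a docking score; a real engine would go here."""
    return pose + conformation


def ensemble_docking_score(score_fn, ligand_poses, conformations, n_draws, seed=0):
    """Average of averages: mean over sampled target conformations
    of the mean score over ligand poses."""
    rng = random.Random(seed)
    # Assumption: conformations are drawn uniformly, with replacement,
    # from the pre-calculated ensemble.
    sampled = [rng.choice(conformations) for _ in range(n_draws)]
    # Inner average: mean score over all poses for one conformation.
    per_conformation = [
        mean([score_fn(pose, conf) for pose in ligand_poses])
        for conf in sampled
    ]
    # Outer average: mean over the sampled conformations.
    return mean(per_conformation)


if __name__ == "__main__":
    poses = [1.0, 3.0]            # toy pose descriptors
    ensemble = [0.0, 10.0, 20.0]  # toy conformation descriptors
    print(ensemble_docking_score(toy_score, poses, ensemble, n_draws=8))
```

Fixing the seed makes a batch of leads comparable across runs. The number of draws trades docking time against the variance of the estimate, which is the trade-off the reinforcement learning step is meant to manage.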