Project Summary
Over the past 30 years, the synthesis of small molecules has become commonplace in drug discovery and
development. Retrosynthesis reduces a structurally complex target molecule (also referred to as the product)
into progressively simpler intermediates and, ultimately, commercially available starting materials or building
blocks (referred to as reactants), so that the target molecule can be prepared through a series of logical
synthetic reactions (i.e., a multi-step synthetic route).
Retrosynthetic analysis has become the cornerstone of modern synthetic endeavors and has revolutionized
drug design. The goal of this project is to develop innovative generative AI (the type of AI that can create new
content) that generates synthetic reaction libraries with diverse and feasible reactant molecules for synthesizing
given molecules via one-step synthetic reactions, and to evaluate and validate these libraries thoroughly and
rigorously in the laboratory. To achieve this goal, we propose three Aims. Aim 1 is to generate diverse and
high-quality synthetic reaction libraries by developing innovative deep graph generative methods for
retrosynthesis prediction. Novel graph neural networks will be developed to best capture and represent
molecular structures for downstream retrosynthesis analysis. Innovative graph-based generative methods will
automate step-by-step modification of target molecules toward their reactants. Aim 2 is to generate diverse and
high-quality synthetic reaction libraries by developing innovative sequence-based methods for retrosynthesis
prediction. Novel sequence-based methods, including pre-training strategies, SMILES editing, and a
reinforcement learning framework, will be developed. Aim 3 is to evaluate the generated synthetic reactions through
domain expertise and laboratory experiments. Successful completion of this project will yield diverse and
high-quality synthetic reaction libraries for any given drug-like molecule, which is expected to
accelerate drug development (e.g., lead generation). More importantly, the project will establish new AI
capacity and infrastructure that go beyond conventional methodologies in synthetic route design. Successful
application of the new methodology is ultimately expected to facilitate rapid retrosynthetic analysis of newly
discovered or complex molecules as well as those with limited availability, enabling chemical synthesis for
subsequent biological evaluations.