In this supplement to the collaborative project initiated between the Digital Chemistry Group at the University of
Glasgow and the NCATS ASPIRE laboratory, we will deepen the integration of the χDL chemical programming
language with the Open Reaction Database (ORD) and integrate Large Language Model (LLM)-based
AI approaches into the generation of χDL procedures directly from retrosynthetic analyses of target compounds.
This work will be accomplished during the term of the original grant. Two specific aims are proposed: (1) develop
a χDL-to-ORD bridge that can be instantiated on a Chemputer-based physical synthesis platform (Coley Lab
collaboration); and (2) integrate LLMs within the Chemical Description Language (χDL) framework to generate,
develop, and interface χDL procedures for closed-loop active learning infrastructures (Chopra Lab collaboration).
These aims will be pursued in a highly integrated, collaborative manner over the term of the funding.
For Specific Aim 1, we will develop a set of converters bridging the three stages of
the experimental life cycle: planning, execution, and reporting. This will be achieved by linking the planning and
reporting stages, both of which can be fully represented by the structured data schema of the ORD, with the central
execution stage, which is fully expressible in χDL. These converters will include some level of inference,
through heuristics or other means, to fill in procedural details that might not be explicitly defined in the original plan.
They will also check whether a plan can be executed in a particular laboratory given its available hardware. We will
implement these converters as open-source software tools and test them on a Chemputer hardware platform
using a set of benchmark reactions. For Specific Aim 2, we will extend our Natural Language
Processing (NLP) approach to χDL procedure generation by using generated datasets to train an LLM-based AI system
to produce χDL instruction files directly from retrosynthetic analysis of a desired molecular structure.
This will be accomplished by building a custom set of LLM agents designed to use the χDL NLP model to
interpret and write valid χDL code based on user input. Integrating these agents with the χDL blueprints
being developed for benchmark reactions as part of the NCATS ASPIRE collaboration will allow χDL instructions
to be generated from automated retrosynthetic analysis of a given molecule or class of molecules, even
if the suggested reactions do not yet exist in the chemical literature. We will also produce specifications for a further
LLM-based AI system that interprets the data generated by automated analysis and suggests subsequent
experiments by optimizing a predefined fitness function (for example, product yield or purity), which
can be determined experimentally in the automated system.