Accessing and Expanding Natural Products Chemical Diversity by Big-data Analysis and Biosynthetic Investigation

PROJECT SUMMARY/ABSTRACT

Natural products (NPs) have historically been a critical source of bioactive molecules; NPs and their derivatives make up over 50% of FDA-approved small-molecule drugs. In recent years, however, NP-based drug discovery has faced a fundamental barrier: the repeated rediscovery of the same or similar compounds, which reflects the limited chemical diversity accessed so far. Because NPs have evolved over billions of years in trillions of vastly diverse environments, nature encodes an abundance of new bioactive NPs that may be useful as drugs. Accessing them is the problem: fewer than 10% of NP biosynthetic gene clusters (BGCs) have been connected to known NPs, so the products of the vast majority of BGCs remain unknown. The overall goal of this research program is to leverage big-data informatic analysis and biosynthetic investigation to convert the tremendous genetic potential of these “orphan BGCs” (BGCs with unknown products) into chemical reality, connecting them to their products and in turn supplying structurally diverse pools of NPs for drug discovery screening. To this end, we propose two research directions: (1) Using our established big-data correlational networking analysis, we have identified hidden proteases that are missing from the BGCs of almost all class III lanthipeptides. We previously used this method to discover, for the first time, two new families of class III lanthipeptides from Firmicutes. We will leverage these hidden proteases to further unlock the inherent chemical diversity of lanthipeptides, generating two libraries of natural and non-natural peptides through in vitro enzymatic synthesis and targeted biosynthetic engineering for drug discovery screening.
(2) In mining the untapped microbial genetic potential, with an initial emphasis on sulfur-containing NPs and unprecedented biosynthetic pathway hybridization, we have prioritized two promising orphan BGCs with highly unusual enzymology and connected them to their native products. The first features a novel S-hydroxylating flavoprotein, potentially involved in forming a new sulfur-containing functionality. The second features an unprecedented terpenoid-fatty acid-non-ribosomal peptide hybridization mediated by unusual cross-pathway enzyme combinations. We will further investigate the new biosynthetic chemistry harbored by these BGCs to produce new NPs, inform future genome mining for similar pathways, and enable pathway engineering to further increase NP chemical diversity. Our significant progress in both research directions supports the feasibility of this proposal, as well as our capacity to establish a successful and sustainable independent program in this field. We have fostered several key collaborations in bioactivity screening and protein structural biology that further strengthen our research program. In addition, this program will provide training opportunities for undergraduate students, graduate students, and postdoctoral fellows. Overall, this program is expected to uncover new biosynthesis, expand NP chemical diversity, and facilitate informatics-based NP discovery and bioengineering, providing promising new drug leads.