Project Summary/Abstract:
Recombinant proteins have a wide range of applications, including pharmaceutical products, drug
discovery, protein-based polymers for drug delivery, antibodies, enzymes, and sustainable
technologies such as textiles and vegan foods (for example, the Impossible Burger). To meet
increased demand for these products, replacing batch culture with continuous culture reduces
overhead costs and batch-to-batch variation and increases protein production. Further, to
maximize recombinant protein yield, computational techniques for gene engineering, such as
codon optimization, use synonymous codon changes to increase protein production. Although
codon optimization increases protein production in specific systems, synonymous changes to a
gene sequence can cause unexpected detrimental effects, such as protein
misfolding, decreased protein yield, and vector loss. Therefore, codon optimization may not
provide an optimal strategy for increasing protein production in batch culture and may introduce
risk when scaling to continuous culture. CFDRC has utilized state-of-the-art natural language
processing techniques to learn how synonymous codons are used by a target organism and apply
this learning to gene engineering. We demonstrated that our AI-based codon harmonization
model could predict E. coli synonymous codon usage with 73% accuracy, significantly above
prior reports. Using this AI-based approach to gene engineering will provide an optimal strategy
for increasing protein production in both batch and continuous culture, in addition to
de-risking scale-up to continuous culture.