Sequence optimization for mRNA cancer therapy - SUMMARY

mRNA technology is in the public spotlight thanks to its role in fighting the COVID-19 pandemic, and it is also rapidly transforming the development of cancer therapeutics, notably in immunotherapy. This wave of mRNA therapeutics is the result of decades of work on many fronts, including improved delivery using lipid nanoparticles and the inclusion of modified nucleosides to modulate immunogenicity. However, optimizing the primary sequence remains a difficult but sought-after goal because of its untapped potential for controlling protein expression or encoding complex pharmacokinetics. Currently, mRNA therapy design typically involves adding UTRs from highly expressed native genes, such as globin genes, to coding sequences that are either codon-optimized (that is, built from the most frequent synonymous codons) or enriched in GC content. However, there is growing evidence that these strategies may be conceptually questionable and empirically suboptimal. At present, no systematically validated model for mRNA therapeutics can accurately predict and/or generate an optimal sequence that expresses a given target gene at a desired level. There is a significant need for such an in silico platform to accelerate therapy development timelines. Finally, mRNA sequences are currently chosen largely for high expression, but future therapies will impose additional design specifications for controllable expression. For example, it might be desirable to target protein expression to specific locations such as the site of a tumor. Improved design algorithms are necessary to satisfy constraints such as cell type or tissue specificity.

Here, we propose to combine machine learning with massively parallel reporter assays (MPRAs) to build predictive models that relate mRNA sequence to stability and translation.
We will then combine these models with innovative design algorithms to generate synthetic UTR and CDS sequences that result in (1) high protein expression across cell types or (2) highly cell type-specific protein expression. We will take an iterative approach to sequence design in which we synthesize designed UTRs, test them experimentally in a panel of cell types, and use the resulting data to retrain the predictors until the design objective is met. We will validate our approach by engineering improved mRNAs for cancer immunotherapy.

In Specific Aim 1, we will develop MPRAs that interrogate 5’UTRs and coding sequences as well as 3’UTRs. In Specific Aim 2, we will develop machine learning approaches that enable us to integrate results from multiple measurements and to generalize the predictive rules learned from such assays. In Specific Aim 3, we will validate the models by engineering regulatory elements that target protein expression to specific cell types of interest or that result in a specific level of protein expression.
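
For readers unfamiliar with the conventional baseline that this summary questions, the following is a minimal sketch of "most frequent synonymous codon" selection together with a GC-content check. It is not the proposed method. The codon-usage table, its frequencies, and the function names are hypothetical placeholders chosen for illustration; a real design would use a measured, species-specific usage table covering all amino acids.

```python
# Toy codon-usage table: amino acid -> {codon: relative frequency}.
# ASSUMPTION: these frequencies are illustrative placeholders, not measured values,
# and only a handful of amino acids are covered.
TOY_CODON_USAGE = {
    "M": {"AUG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "L": {"CUG": 0.5, "CUC": 0.2, "UUA": 0.3},
    "*": {"UGA": 0.5, "UAA": 0.3, "UAG": 0.2},
}


def most_frequent_codon_cds(protein, usage=TOY_CODON_USAGE):
    """Encode a protein by taking, for each residue, its most frequent synonymous codon."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


def gc_content(seq):
    """Return the fraction of nucleotides in seq that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical four-residue input: Met-Lys-Leu followed by a stop codon.
cds = most_frequent_codon_cds("MKL*")
print(cds, round(gc_content(cds), 3))
```

Because this rule picks each codon independently, it ignores sequence context, UTR composition, and cell type, which is the kind of limitation the predictive models proposed here are meant to address.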