Abstract
We propose to develop a novel computational framework describing single-cell gene expression to aid model
building and design for synthetic biology applications. Currently, there are three major challenges to this goal.
First, there is a knowledge gap between experimental measurements and mathematical models: experiments
on synthetic circuits typically provide partial information, following the expression of a few proteins with
fluorescent tags while leaving many other molecular network components (such as promoters and protein-protein
complexes) uncharacterized. By contrast, existing theoretical models make ad hoc assumptions about the
network and its interactions (requiring more information than experiments can provide) and about noise
statistics, leading to over-parameterization. Second, over-parameterization is also a problem for circuit design,
which demands models with a minimal set of parameters to search the parameter space efficiently. Third,
experimental data are recorded as fluorescence intensities rather than protein numbers, rendering traditional
models inapplicable. The lack of models that predict single-cell behavior hinders both our basic understanding
of these circuits and our ability to manipulate cellular heterogeneity to control microbial dynamics. We propose
to bridge this gap.
An important breakthrough lies in realizing that stochastic time courses of protein expression in single cells
hold crucial information about network details that are not otherwise directly visible. We have built a novel
mathematical tool called MaxCal (Maximum Caliber), capable of harnessing information hidden in noisy protein
expression trajectories to infer underlying models of synthetic circuits. Moreover, because MaxCal works directly
with trajectories, it easily incorporates additional algorithms that convert fluorescence to protein number (FNC)
and, unlike other methods, avoids data reduction. MaxCal is a top-down approach that builds a minimal model,
avoids the over-fitting issues of traditional approaches, and yields an effective feedback parameter that
facilitates design.
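To make trajectory-based inference concrete, here is a minimal Python sketch of fitting a MaxCal-style model directly to an integer protein-number trajectory. The path weight is an illustrative assumption, not necessarily the constraint set used in the proposed work: in each time interval, l_alpha of M possible proteins are produced and l_A of the N existing proteins survive, with weight C(M, l_alpha) * C(N, l_A) * exp(h_alpha*l_alpha + h_A*l_A + K_A*l_alpha*l_A). The multipliers h_alpha and h_A set production and survival, and K_A acts as an effective feedback parameter. Only N' = l_alpha + l_A is observed, so the hidden events are summed out, and parameters are estimated by maximizing the trajectory likelihood. A crude grid search stands in for a real optimizer; all function names, M, and the grid values are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_transition_probs(n, m, h_alpha, h_a, k_a):
    """Return {n_next: log P(n_next | n)} for one time interval.

    Assumed MaxCal-style path weight: l_alpha of m possible proteins are
    produced and l_a of the n current proteins survive, with weight
    C(m, l_alpha) * C(n, l_a) * exp(h_alpha*l_alpha + h_a*l_a + k_a*l_alpha*l_a).
    Only n_next = l_alpha + l_a is observed, so hidden pairs are summed out.
    k_a > 0 favors production when more proteins survive (positive feedback).
    """
    grouped = {}
    for l_alpha in range(m + 1):
        for l_a in range(n + 1):
            log_w = (math.log(math.comb(m, l_alpha)) + math.log(math.comb(n, l_a))
                     + h_alpha * l_alpha + h_a * l_a + k_a * l_alpha * l_a)
            grouped.setdefault(l_alpha + l_a, []).append(log_w)
    log_marginal = {nxt: logsumexp(ws) for nxt, ws in grouped.items()}
    log_z = logsumexp(list(log_marginal.values()))
    return {nxt: lw - log_z for nxt, lw in log_marginal.items()}


def log_likelihood(traj, m, params):
    """Log-likelihood of an integer protein-number trajectory (Markov in n)."""
    cache = {}
    total = 0.0
    for (n, nxt), count in Counter(zip(traj, traj[1:])).items():
        if n not in cache:
            cache[n] = log_transition_probs(n, m, *params)
        total += count * cache[n].get(nxt, -math.inf)
    return total


def simulate(n0, steps, m, params, seed=0):
    """Draw a synthetic trajectory from the same model (for checking the fit)."""
    rng = random.Random(seed)
    traj = [n0]
    for _ in range(steps):
        log_p = log_transition_probs(traj[-1], m, *params)
        states = list(log_p)
        weights = [math.exp(log_p[s]) for s in states]
        traj.append(rng.choices(states, weights=weights)[0])
    return traj


def fit_grid(traj, m, h_alpha_grid, h_a_grid, k_a_grid):
    """Crude maximum-likelihood fit: exhaustive search over a parameter grid."""
    return max(itertools.product(h_alpha_grid, h_a_grid, k_a_grid),
               key=lambda p: log_likelihood(traj, m, p))


if __name__ == "__main__":
    M = 10  # hypothetical maximum number of production events per interval
    true_params = (math.log(1 / 9), math.log(9), 0.0)  # stationary mean ~10 proteins
    traj = simulate(10, 300, M, true_params, seed=1)
    best = fit_grid(traj, M,
                    [-3.0, math.log(1 / 9), -1.5],
                    [1.5, math.log(9), 3.0],
                    [-0.05, 0.0, 0.05])
    print("fitted (h_alpha, h_A, K_A):", best)
```

Working with the whole trajectory, rather than a histogram of protein levels, is what lets the coupling K_A be distinguished from the production and survival multipliers in this toy setting; with real data the fitted K_A would be read as an effective feedback strength only under the assumed path weight.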
We will use this novel integrated framework (MaxCal + FNC) to build models from the high-quality single-cell
temporal data now becoming available with new microfluidic tools. This differs from large-scale network and
transcriptome-wide measurements that pool data over many cells. We will analyze raw protein expression
trajectory data (recorded as fluorescence) from specific S. cerevisiae strains containing a synthetic
positive feedback network that behaves as a biological switch controlled by an inducer. This network is
bimodal, with cells dynamically switching between high and low expression levels of the target protein, a
bet-hedging strategy frequently used by microbes to evade antibiotic treatment. This application may have
therapeutic relevance for dealing with microbial populations that become resistant to antibiotics and other
stressors. We will also address several design questions to build new circuits for specific functions.
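The integrated framework depends on turning raw fluorescence trajectories into protein numbers. One established way to do this, not necessarily the FNC algorithm this proposal refers to, calibrates fluorescence per molecule (nu) from cell divisions: if molecules are split binomially between two daughters with intensities I1 and I2, then E[(I1 - I2)^2] = nu * (I1 + I2). The sketch below assumes background-subtracted intensities with negligible measurement noise; the function names and the simulated division data are hypothetical.

```python
import random


def estimate_fluorescence_per_molecule(division_pairs):
    """Estimate nu, fluorescence units per protein molecule.

    Assumes binomial partitioning of molecules at division and
    background-subtracted, noise-free intensities, so that
    E[(I1 - I2)^2] = nu * (I1 + I2) for daughter intensities I1, I2.
    """
    numerator = sum((i1 - i2) ** 2 for i1, i2 in division_pairs)
    denominator = sum(i1 + i2 for i1, i2 in division_pairs)
    return numerator / denominator


def to_protein_numbers(trace, nu):
    """Convert a fluorescence trace into (rounded) protein numbers."""
    return [round(intensity / nu) for intensity in trace]


def simulate_divisions(nu, n_divisions, seed=0):
    """Synthetic daughter-pair intensities under the same assumptions."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_divisions):
        total = rng.randint(50, 500)  # hypothetical mother-cell protein count
        n1 = sum(rng.random() < 0.5 for _ in range(total))  # binomial split
        pairs.append((nu * n1, nu * (total - n1)))
    return pairs


if __name__ == "__main__":
    pairs = simulate_divisions(nu=25.0, n_divisions=2000, seed=3)
    nu_hat = estimate_fluorescence_per_molecule(pairs)
    print("estimated fluorescence per molecule:", nu_hat)
    print(to_protein_numbers([250.0, 1000.0], nu_hat))
```

The ratio-of-sums estimator pools all divisions, which keeps it stable when individual cells carry few molecules; real data would additionally need a correction for measurement noise, which inflates the squared differences.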