The transcriptome-wide impact of biological perturbations

Abstract
An important goal in computational biology is to leverage data from high-throughput functional assays to infer
the biological consequences of genetic variation. This goal is frequently approached by pairing RNA sequencing
and differential expression analysis. Most differential expression methods seek to identify a small number of
genes and gene-sets that are affected by a genetic perturbation. However, some genes, such as chromatin
regulators, may impact thousands of genes across the transcriptome. These dispersed effects are not captured
by existing methods. We will address this methodological gap in the differential expression field by
developing a novel statistical tool, and will apply this tool to both normative and disease contexts. In
Aim 1, we propose the Transcriptome-wide Impact Model (TIM), a parametric likelihood-based estimator of the
overall effect that a perturbation has on the transcriptome. TIM builds on existing differential expression methods,
but estimates parameters of the distribution of differential expression effects, rather than individual per-gene
effect sizes. We will also extend the model to estimate gene-set enrichments and correlations between differential
expression signatures. In Aim 2, we will apply TIM to a recent Perturb-Seq dataset in which all expressed
genes were perturbed in vitro in a massively parallel manner, enabling us to identify which genes and gene-sets induce the
greatest transcriptomic change in human chronic myeloid leukemia cell lines when knocked down. We will also
use TIM to identify modules of genes that have similar impact on the transcriptome, and use these modules to
annotate genic function. In Aim 3, we will apply TIM to an in vivo Perturb-Seq dataset of 35 neurodevelopmental
disorder genes in developing mouse neocortex. Through this Aim, we will stratify neurodevelopmental disorder
genes by degree of transcriptome-wide impact, testing the hypothesis that neurodevelopmental-disorder-associated
gene expression regulators exert highly dispersed effects on the transcriptome in the brain. If true, this
finding would raise the intriguing question of whether small, dispersed expression effects can be pathogenic,
opening novel avenues for research into neurodevelopmental disorders, as well as many other diseases that are
associated with expression regulators (e.g., cancer). We will additionally use TIM to cluster neurodevelopmental
disorder genes by similarity of transcriptomic effects, to identify genes with putatively convergent mechanisms.
Broadly, our model will allow conceptually novel insight to be extracted from differential expression experiments,
with applicability to any biological perturbation of interest.
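
To illustrate the idea in Aim 1 of estimating a parameter of the effect-size distribution rather than per-gene effects, the following is a minimal sketch under a simple normal-normal assumption; it is not the TIM specification. It assumes each gene's estimated log fold change is its true effect plus Gaussian noise with a known standard error, that true effects are drawn from a zero-mean normal with variance sigma2, and that the marginal likelihood has a single peak in sigma2 on the searched interval. The function names (log_likelihood, estimate_effect_variance, simulate) are hypothetical.

```python
import math
import random


def log_likelihood(sigma2, beta_hat, se):
    """Marginal log-likelihood of sigma2 (variance of true effects).

    Assumed model: beta_hat_g ~ Normal(0, sigma2 + se_g^2) for each gene g.
    """
    total = 0.0
    for b, s in zip(beta_hat, se):
        v = sigma2 + s * s
        total += -0.5 * (math.log(2.0 * math.pi * v) + b * b / v)
    return total


def estimate_effect_variance(beta_hat, se, iters=60):
    """Maximize the log-likelihood over sigma2 >= 0 by golden-section search.

    Assumes the likelihood is unimodal on [0, max(beta_hat^2)].
    """
    lo, hi = 0.0, max(b * b for b in beta_hat)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - phi * (hi - lo)
        c = lo + phi * (hi - lo)
        if log_likelihood(a, beta_hat, se) < log_likelihood(c, beta_hat, se):
            lo = a  # maximum lies to the right of a
        else:
            hi = c  # maximum lies to the left of c
    return 0.5 * (lo + hi)


def simulate(n_genes, sigma2, se, seed=0):
    """Simulate per-gene estimates: true effect plus sampling noise (toy data)."""
    rng = random.Random(seed)
    beta_hat = [
        rng.gauss(0.0, math.sqrt(sigma2)) + rng.gauss(0.0, se)
        for _ in range(n_genes)
    ]
    return beta_hat, [se] * n_genes


if __name__ == "__main__":
    beta_hat, se = simulate(n_genes=5000, sigma2=0.04, se=0.1)
    print("estimated variance of true effects:", estimate_effect_variance(beta_hat, se))
```

In this toy setting, a perturbation with many small true effects yields a clearly positive variance estimate even when few individual genes would pass a per-gene significance threshold, whereas a perturbation with no true effects yields an estimate near zero.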