Project Summary/Abstract
Single-cell RNA sequencing (scRNA-seq) is currently at the forefront of biotechnological innovation. scRNA-seq
experiments measure gene expression at single-cell resolution and provide an opportunity to
characterize the molecular signatures of diverse cell types, states, and structures in tissue development and
disease progression. However, it remains a substantial challenge to construct a comprehensive view of single-cell
transcriptomes in health and disease, owing to a knowledge gap in how to properly model the high-dimensional,
sparse, and noisy scRNA-seq data. While new data science methods, including our recent work, have
facilitated the design and analysis of scRNA-seq studies that identify and annotate distinct cell
populations, there is a critical need for computational methods that can accurately evaluate biological
hypotheses about these diverse cell populations. To address this knowledge gap and critical need, and thereby to
enable a systematic understanding of transcriptional and post-transcriptional mechanisms across biological
scales (from cells to genes to RNA molecules), the objective of our MIRA research program is to develop novel
statistical methods and bioinformatics software for multiscale analysis of single-cell transcriptomes. We will
pursue three parallel but complementary research directions: (1) to develop novel statistical methods for
quantifying and comparing gene regulatory associations from single-cell gene expression data; (2) to develop
the first statistically principled methods for identifying, quantifying, and comparing alternative polyadenylation
usage from 3’-end scRNA-seq data; and (3) to develop a novel statistical model for jointly analyzing and
comparing scRNA-seq data from heterogeneous biological samples, such as multiple patients, developmental
stages, or related species. The proposed research will be built on the foundations of our recent studies in
developing interpretable statistical methods and user-friendly software for quantifying, denoising, integrating,
and comparing genomic data at various biological scales. Throughout the program, we will work closely with
experimental biologists at the Rutgers Cancer Institute of New Jersey and the Wistar Institute, and will use our proposed
methods to identify and study transcriptional mechanisms in intestinal biology, neurobiology, and cancer biology.
Together, this concerted effort will provide efficient and broadly applicable statistical and bioinformatics tools for
identifying the key cells, pathways, gene interactions, and RNA transcripts
associated with various biological contexts, including human disease. The proposed program also aligns with
my team’s long-term goal to develop a statistically principled understanding of transcriptional and post-
transcriptional regulation in single cells, thus improving our ability to define, interpret, and predict cellular
commitment and functionality in health and disease.