Project Summary/Abstract
Because they are easy to handle and inexpensive to store, Formalin-Fixed Paraffin-Embedded (FFPE) tissues are
the most widely available source of tissue material for which long-term clinical follow-up data are recorded. The
ubiquity of FFPE tissue specimens has made them an invaluable resource in biomedical research, with great
potential for predictive and prognostic biomarker discovery. However, the quality of RNA extracted from FFPE
tissues is generally poor due to chemical modifications and continued degradation over time. Consequently,
assays using microarray or quantitative polymerase chain reaction (qPCR) often have limited reproducibility and
sensitivity when measuring gene expression from such samples.
To exploit the vast collection of FFPE samples, substantial effort has been devoted to developing and
validating advanced technologies that can reliably probe their gene expression levels. For medium-throughput
profiling, NanoString nCounter is frequently used with FFPE samples, as the nCounter system can
accurately measure gene expression even when the target RNA is degraded. For high-throughput profiling, RNA
sequencing (RNA-seq) is commonly used. Recent studies have shown that, for a wide variety of human tumor tissues
(e.g., bladder, colon, prostate, and renal carcinoma), RNA-seq can measure mRNA extracted from FFPE tissues
with sufficient quality to support biologically relevant transcriptome analysis.
With these advances, the use of FFPE specimens in cancer research has grown rapidly, and analysis
of FFPE gene expression data has become increasingly important. A crucial step in analyzing this type
of data is normalization. Existing normalization methods were all designed and validated using fresh-frozen (FF)
or similar samples, because such samples have been the standard in most molecular biological analyses. FFPE
expression data, however, have distinct technology-specific characteristics that present many statistical challenges.
Together, these factors create a pressing need for novel and rigorous statistical approaches to normalization that
model the key characteristics of FFPE expression data and remove all estimable biases, in order to enhance the
power and reproducibility of transcriptome analysis and ultimately to promote the use of the large existing
collection of FFPE specimens in biomedical research. To meet this need, we propose to accomplish the following
specific aims.
In Aim 1, we will develop rigorous yet flexible methods to normalize FFPE expression data from experiments
using NanoString nCounter, the most important medium-throughput technology compatible with FFPE samples.
In Aim 2, we will develop robust and efficient methods to normalize FFPE data for high-throughput gene expression
analysis using RNA-seq. In Aim 3, we will collaborate with leading cancer researchers to apply and refine
the statistical methods, to facilitate translation from biomarker discovery to clinical practice. In Aim 4, we will
test the proposed methods using extensive simulation and multiple benchmark data sets, and develop free and
open-source software for dissemination to the scientific community.