Project Summary/Abstract
Serious brain disorders like schizophrenia, bipolar disorder, major depression, autism spectrum
disorder, and Alzheimer’s disease are debilitating illnesses that place substantial burdens on both the families of
affected individuals and public health. While they all have high degrees of heritability, the etiology
underlying these disorders in the majority of patients has been difficult to characterize. The strongest clues for
the etiological underpinnings of these disorders, particularly the neuropsychiatric and neurodevelopmental ones, come
from recent genetic studies, which have identified hundreds of common loci that each contribute a small effect
on risk, but the mechanisms guiding any individual risk locus remain largely unknown. These common variants
are therefore hypothesized to manifest at the gene pathway and network levels, but the pathways implicated
for each illness by the genetic association results have varied substantially. Many groups,
including our own, have therefore utilized postmortem human brain tissue to better understand the molecular
correlates of both the genetic and non-genetic effects of these disorders, as gene expression levels may better
illuminate mechanisms of risk.
However, in this proposal we point out a damaging and often overlooked issue: RNA quality confounds
comparisons of postmortem tissue between patients and controls. We have identified strong confounding
effects of RNA quality in the majority of published datasets, as well as in our own. We first
describe these widespread RNA quality effects, demonstrate that existing statistical approaches do not remove
this confounding, and show that these RNA quality effects drive inference in co-expression and network analyses:
using both simulated and real data, we identify hundreds of false-positive network edges while discovering only
a few true edges. In this application, to better understand the molecular etiology of these debilitating disorders,
we propose a framework to accurately model RNA quality in gene expression datasets based on molecular
degradation experiments across the human brain. This framework, called “quality surrogate variable analysis”,
will be applied to better identify molecular signatures of debilitating brain disorders at the gene and network
levels, improving the replication and interpretability of results from these large, publicly available datasets.
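
As an illustration only, the sketch below shows on simulated data how a latent RNA quality factor can induce a spurious co-expression edge between two unrelated genes, and how adjusting for a quality surrogate variable can remove it. It is a simplified stand-in and not the proposed implementation: the simulated data, the use of a single surrogate (the first principal component of hypothetical degradation-susceptible features), the effect sizes, and all function names are assumptions made for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_pc_scores(rows, n_iter=100):
    """Per-sample scores on the first principal component (power iteration)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # p x p sample covariance matrix of the features
    cov = [[sum(c[j] * c[k] for c in centered) / (n - 1) for k in range(p)]
           for j in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centered]


def residualize(y, covariate):
    """Residuals of y after simple linear regression on one covariate."""
    n = len(y)
    my, mc = sum(y) / n, sum(covariate) / n
    beta = (sum((a - my) * (c - mc) for a, c in zip(y, covariate))
            / sum((c - mc) ** 2 for c in covariate))
    return [(a - my) - beta * (c - mc) for a, c in zip(y, covariate)]


def simulate_and_adjust(seed=0, n_samples=500, n_features=20):
    """Return (raw, adjusted) correlation between two simulated genes."""
    rng = random.Random(seed)
    # Latent RNA quality per sample (unobserved in practice).
    quality = [rng.gauss(0, 1) for _ in range(n_samples)]
    # Hypothetical degradation-susceptible features that track RNA quality.
    degradation = [[q + rng.gauss(0, 0.5) for _ in range(n_features)]
                   for q in quality]
    # Two genes with no direct relationship; both depend on RNA quality.
    gene_a = [1.5 * q + rng.gauss(0, 1) for q in quality]
    gene_b = [1.5 * q + rng.gauss(0, 1) for q in quality]

    raw = pearson(gene_a, gene_b)
    # Assumed single quality surrogate variable: first PC of the features.
    qsv = first_pc_scores(degradation)
    adjusted = pearson(residualize(gene_a, qsv), residualize(gene_b, qsv))
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate_and_adjust()
    print(f"co-expression before adjustment: {raw:.2f}")
    print(f"co-expression after adjustment:  {adjusted:.2f}")
```

In this toy setting the two genes share only the quality factor, so any sizable correlation before adjustment is a false-positive edge. Real data would likely need several surrogate variables estimated from experimentally defined degradation-associated features and fitted jointly with diagnosis and other covariates.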
Gene networks resulting from our RNA quality-corrected framework will be interrogated for biological
functionality and clinical relevance using pre-defined gene sets. These results can illuminate potentially novel
biological associations underlying serious mental illness. We hypothesize that removing the biases induced by
RNA quality will result in the strongest enrichment for these gene sets at the gene and network levels.
Correctly modeling potential RNA quality effects in postmortem gene expression data will be an important tool
for gene- and network-level statistical analyses, improving concordance and biological inference
across rich datasets and potentially leading to novel therapeutic targets to treat these disorders.