Project Summary
The number of Alzheimer's disease (AD) patients increases every year, and the economic burden of
health care of AD patients, estimated at $335 billion in 2021, is predicted to triple by 2050. In the interest of public
health and the economy, understanding AD genetics and finding effective AD prevention and treatment are
important. Numerous studies have suggested that AD is a complicated genetic disorder, often involving genomic
structural changes and regulation. Thus, there is a strong need to investigate not only canonical genes, proteins,
and their regulation but also other genetic components in AD. Retrotransposons (RTEs) are DNA
sequences that copy themselves through an RNA intermediate and insert the copies into the genome. There has been some interest in
studying retrotransposons in AD research. For example, chromatin relaxation mediated by Tau
protein accumulation may aberrantly activate retrotransposons. This massive activation may provoke an innate
immune response and damage the genome, which can result in neurodegeneration. Moreover, a study showed
that antiviral drugs could suppress RTE activation in AD by inhibiting their reverse transcriptase, and that this
suppression prevents neurodegeneration. These studies suggest that investigating the
features and roles of retrotransposons in AD will provide an additional and important way to understand
AD pathogenesis. However, molecular characteristics of RTEs in AD, such as cell-type
and sex specificity, are still unknown. Characterizing RTEs requires generating large-scale RTE expression
datasets. Such data are not yet publicly available, although the AD research community has made tremendous
efforts to generate large-scale postmortem AD transcriptome data, including bulk RNA-seq of ~2,000 subjects
and single-nucleus RNA-seq of ~260,000 cells. Therefore, we propose two specific aims to perform the first
systematic study of RTEs by constructing an RTE atlas resource for AD research: Aim 1. To generate large-scale
RTE expression datasets by mining and processing public AD transcriptome datasets. We will extend our
SalmonTE algorithm to quantify RTE expression in AD transcriptome datasets at both tissue and
single-cell resolution. Aim 2. To generate AD RTE atlas resources by characterizing RTEs and AD patients using
statistical and machine learning methods. We will expand our in-house computational methods to calculate
context-specificity (e.g., brain region, cell type, and sex) of each RTE in human AD brains. We will also develop
an unsupervised graph neural network using RTE expression and multi-omics data to characterize AD patients.
Finally, we will create an atlas website to share our findings with the AD research community. Successful
completion of this project will provide 1) novel computational methods to rigorously characterize RTEs in AD, 2)
identification of context-specific RTEs in AD and characterization of AD patients using RTE expression, and 3)
a well-annotated AD RTE atlas to deepen our knowledge of the molecular basis of AD.