Project Summary / Abstract
Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disorder characterized by the progressive
loss of brain and spinal cord motor neurons (MNs). Approximately 50% of ALS patients display Frontotemporal
Dementia (“FTD”)-associated symptoms; reciprocally, approximately 40% of FTD patients display motor
neuron deficits with approximately 15% developing ALS. Given the overlap between ALS and FTD, the two
conditions are considered to represent a disease spectrum (ALS/FTD). In recent years, we and others have
successfully identified several genetic causes of ALS/FTD, which has greatly contributed to our understanding
of disease pathogenesis. Unfortunately, the underlying genetic causes of one-third of familial and ~90% of
sporadic forms of ALS/FTD remain unexplained. As such, there is a dire need to identify additional genetic
factors contributing to ALS/FTD. Recent genetic studies have increasingly shown a strong role of rare variants,
non-coding regions, and structural variants (SVs) in human diseases. However, such studies require very
large cohorts of harmonized whole-genome sequencing (WGS) data sets from cases and controls. To address this unmet need, we have
founded the ALS/FTD Compute project to combine the raw ALS/FTD WGS data from every major sequencing
effort in the United States. Further, ALS/FTD Compute has been accepted into NHGRI’s AnVIL program,
which provides an excellent platform for hosting ALS/FTD Compute along with technical support. To date, we
have obtained the raw WGS data from over 10,000 ALS/FTD cases and have identified raw WGS data from
~35,000 controls. The objective of this application is to harmonize our WGS cohort data and then perform secondary
analyses to identify novel genetic elements contributing to ALS/FTD. Toward these goals, we have designed
the following specific aims: R61 Phase (1) Harmonize the Phenotypic and WGS Cohort Data and Identify
SNVs/Indels. Raw WGS data will be subjected to re-alignment and joint genotype calling to identify and
annotate SNVs/indels on the AnVIL platform. (2) Identification of Additional Variants Within the
Harmonized Cohort. The harmonized data from Aim #1 will be used to identify and genotype SVs and repeat
expansions (REs). (3) Evaluation of Quality Control Metrics from the Data Harmonization/Variant Calling.
QC metrics from Aims #1 and #2 will be evaluated to ensure high-quality harmonization and variant calling. R33
Phase (4) Discovery of Novel SNVs/Indels Associated with ALS/FTD. The harmonized data set will be
subjected to several analyses, including GWAS, rare variant association testing, heritability estimates, and genetic
correlation tests. (5) Discovery of Novel SVs/REs Associated with ALS/FTD. Analyses of association with
common and rare SVs will be performed, including specialized tests for multi-allelic CNVs and expansions of
short tandem repeats. (6) Discovery of Novel Genetic Modifiers of ALS/FTD. Analyses will be performed to
discover variants associated with clinical disease subtypes and quantitative outcome measures, and predictive
models will be built to capture the effects of interactions between genes.