Comprehensive Harmonization and Analysis of Case/Control Whole Genome Sequencing Data from the ALS/FTD Compute Project

Project Summary / Abstract

Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disorder characterized by the progressive loss of motor neurons (MNs) in the brain and spinal cord. Approximately 50% of ALS patients display symptoms associated with Frontotemporal Dementia (FTD); reciprocally, approximately 40% of FTD patients display motor neuron deficits, with approximately 15% developing ALS. Given this overlap, the two conditions are considered to represent a disease spectrum (ALS/FTD). In recent years, we and others have identified several genetic causes of ALS/FTD, which has greatly contributed to our understanding of disease pathogenesis. Nevertheless, the underlying genetic causes of one-third of familial and ~90% of sporadic forms of ALS/FTD remain unexplained. As such, there is a pressing need to identify additional genetic factors contributing to ALS/FTD. Recent genetic studies have increasingly shown a strong role for rare variants, non-coding regions, and structural variants (SVs) in human diseases. However, such studies require very large cohorts of harmonized whole genome sequencing (WGS) data sets from cases and controls. To address this unmet need, we have founded the ALS/FTD Compute project to combine the raw ALS/FTD WGS data from every major sequencing effort in the United States. Further, ALS/FTD Compute has been accepted into NHGRI’s AnVIL program, which provides an excellent platform for hosting the project as well as technical support. To date, we have obtained raw WGS data from ~10,000 ALS/FTD cases and have identified raw WGS data from ~35,000 controls. The objective of this application is to harmonize the data in our WGS cohort and then perform secondary analyses to identify novel genetic elements contributing to ALS/FTD.
Toward these goals, we have designed the following specific aims:

R61 Phase
(1) Harmonize the Phenotypic and WGS Cohort Data and Identify SNVs/Indels. Raw WGS data will be re-aligned and jointly genotyped on the AnVIL platform to identify and annotate single nucleotide variants (SNVs) and indels.
(2) Identify Additional Variants Within the Harmonized Cohort. The harmonized data from Aim 1 will be used to identify and genotype SVs and repeat expansions (REs).
(3) Evaluate Quality Control Metrics from the Data Harmonization and Variant Calling. QC metrics from Aims 1 and 2 will be evaluated to ensure high-quality harmonization and variant calling.

R33 Phase
(4) Discover Novel SNVs/Indels Associated with ALS/FTD. The harmonized data set will be subjected to several analyses, including GWAS, rare variant association testing, heritability estimation, and genetic correlation tests.
(5) Discover Novel SVs/REs Associated with ALS/FTD. Association analyses of common and rare SVs will be performed, including specialized tests for multi-allelic CNVs and expansions of short tandem repeats.
(6) Discover Novel Genetic Modifiers of ALS/FTD. Analyses will be performed to discover variants associated with clinical disease subtypes and quantitative outcome measures, and predictive models will be built to capture the effects of interactions between genes.