Computational and Statistical Methods for Genetic Association Studies of Disease Course Over Time

PROJECT SUMMARY ABSTRACT

Genome-wide association studies (GWAS) have been remarkably successful at mapping genomic loci for complex human diseases. However, most studies use case-control designs based on disease occurrence at a specific time point, such as diagnosis, and so overlook genetic factors that influence the course of disease over time, including initiation, onset, progression, severity, and therapeutic response. Time-to-event (TTE) phenotypes capture both the occurrence and the timing of disease activity. GWAS of TTE phenotypes can identify genetic variants associated with disease onset and progression, providing insights for early prevention and for therapies aimed at halting disease progression. Large biobanks such as UK Biobank, All of Us, and FinnGen, which combine longitudinal electronic health records (EHR) with genomic data, offer unprecedented opportunities to study the genetics of TTE phenotypes. However, current computational and statistical tools remain limited, especially for rare variants and admixed populations. Additionally, biobank heterogeneity (differences in sampling strategies, follow-up times, and baseline hazards) poses challenges for meta-analysis of GWAS on TTE phenotypes. This project aims to overcome these barriers by developing innovative computational tools and statistical methods for GWAS of TTE phenotypes in large biobanks and cohorts. Aim 1 focuses on creating scalable methods for rare variant association tests on TTE endpoints, accounting for sample relatedness, population substructure, and high censoring. Aim 2 develops novel approaches to include admixed individuals in GWAS of disease onset and progression over time, enhancing inclusivity and reducing disparities in genetic discovery.
Aim 3 improves GWAS and meta-analysis methodologies for TTE endpoints by addressing biases such as left censoring, sampling bias, and collider bias, enabling robust integration of data across biobanks. The proposed methods will be evaluated through extensive simulation studies and applied to multiple biobanks. Successful completion of this project will provide critical new tools to advance our understanding of the genetic basis of complex diseases, reduce health disparities, and fully harness the potential of biobank resources for uncovering genetic factors influencing disease onset, progression, and treatment response. These tools and results will be made available as open-source software and public datasets to ensure broad accessibility for the research community. Additionally, we will continue to develop, distribute, and support open-source software packages for the proposed methods.