Computational Methods to Characterize Alternative Splicing and Genetic Determinants from Heterogeneous Sequence Data - SUMMARY Our project will develop powerful and user-friendly computational methods to characterize alternative splicing variation in human physiology and disease, at the level of both individuals and populations. Alternative splicing (AS) is a complex and widespread gene regulatory mechanism in eukaryotic species, with important implications for development and disease. As efforts to determine the role of AS in disease accelerate, the genetic determinants of splicing and of splicing-related disease are starting to be broadly investigated. Two emerging developments drive our research goals: increasingly large patient cohorts and population studies that generate complex sequencing and clinical data, and the growing use of third-generation sequencing technologies, including Oxford Nanopore and PacBio long reads, and of technologies for single-cell transcriptomics. Research on these fronts demands new specialized models and tools, but such tools are currently scarce. We will develop a general framework and specialized tools to detect and characterize splicing variation from RNA sequencing data produced by diverse sequencing technologies, to model AS and its regulatory ‘code’ from heterogeneous omics data, and to predict genetic determinants of splicing in disease and in the population. Our first broad research effort will develop a statistical framework, along with novel technology-specific models and tools, for differential splicing analysis from short-read, long-read, and single-cell RNA sequencing data, accounting for confounding factors such as age, sex, and clinical and metabolic measurements. Our second research effort will develop complex and accurate deep learning models of AS regulation, in the context of RNA processing pathways, by combining RNA sequencing with other types of omics data.
Across both efforts, we will equip our methods to detect the effects of genetic variation on AS and disease. We will take an innovative approach that focuses on introns rather than on full-length or locally reconstructed transcripts. Introns capture alternative splicing extensively, allow discovery of novel variants, and drastically reduce ‘noise’ from biological and sequencing artifacts and from assembly errors. We will validate our methods in silico and, with the aid of collaborators, in the lab through minigene experiments. We will disseminate them to the community at large through the third-party repositories GitHub and Anaconda, and popularize them through a new online Coursera course on practical transcriptomic methods. Our tools will allow biomedical investigators to characterize changes in AS associated with disease, along with the effects of genotype, and to identify potential biomarkers or treatment targets. They will thus help advance research to elucidate the role of AS in human health and disease and its translation to precision medicine.