Novel bioinformatics methods for integrative detection of structural variants from long-read sequencing - Structural variants (SVs) are the largest source of variation in the human genome and are frequently associated with disease phenotypes. Comprehensive characterization of SVs is therefore essential for understanding human genome structure and function, and the role of SVs in disease. Although long-read sequencing technologies improve the alignments to the human reference genome that are used to identify SVs, widely used SV calling methods remain limited by their reliance on alignment evidence, which is less reliable for larger SVs and in highly repetitive regions of the genome. The goal of this proposal is to develop an improved computational method that integrates additional information, including copy number predictions based on coverage and single-nucleotide variant (SNV) allele frequencies, to improve sensitivity across a wider range of SVs. I also aim to improve SV detection by leveraging complementary genomics technologies such as optical mapping, and to support the latest human pangenome reference representations, which should improve alignments and enable more comprehensive SV characterization. This proposal has important implications for identifying structural variants with disease relevance that have been understudied due to limitations in current approaches. I will also develop a machine-learning-based model that assigns each SV a confidence score based on alignment and genomic context evidence. These confidence scores will be used to filter likely false positives and improve precision.
Finally, I will incorporate support for pangenome graph alignments. Unlike conventional linear human reference genomes such as GRCh38, a pangenome can represent multiple complete haplotypes simultaneously in a single graph, which enables identification of structural variants in regions of the human genome that may be missing or incomplete in GRCh38. In summary, in this project I will develop a novel SV calling method that integrates evidence beyond alignments, drawn from multiple genomics technologies, to identify SVs in the human genome that current methods often miss. This method will have important implications for the identification and characterization of SVs with clinical relevance. Through this research training plan, I will 1) acquire advanced expertise in bioinformatics and human genomics, 2) refine my oral and written communication skills, and 3) establish a foundation for a scientific career dedicated to the study and dissemination of knowledge on human genome variation.