New methods for studying thousands of complete human and vertebrate genomes

PROJECT SUMMARY / ABSTRACT

We propose to advance comparative genomics through three interconnected aims. In Aim 1, we will scale genome alignment and analysis capabilities to accommodate the rapidly growing number of available vertebrate genomes, which is projected to reach ~10,000 species within four years. We will develop a highly scalable, automated pipeline for species-tree construction from raw genome assemblies, rework the genome-based HAL format, and introduce a new column-based alignment format (TAF) to enable efficient representation and analysis of large-scale alignments. To demonstrate these improvements, we will create and share the deepest vertebrate alignments yet produced, with a focus on predicting evolutionary selection. In Aim 2, we will address the challenge of aligning telomere-to-telomere (T2T) human genome assemblies. We will develop novel repeat-aware alignment algorithms and graph models that enable more complete alignments, starting with human centromeres and extending to other heterochromatic sequence. We will integrate these methods into our pangenome construction process to support T2T alignment within the human pangenome. In Aim 3, we will disentangle the genetics of the 1q21.1 region, which is associated with various neurodevelopmental disorders, as an exemplar of a difficult segmental duplication. We will analyze T2T ape and human genomes to build a comprehensive 1q21 pangenome, develop efficient methods for high-resolution genome reconstruction of patient-derived cell lines carrying 1q21.1 copy number variations, and test the functional consequences of the identified genetic alterations in hiPSC-derived cerebral cortex organoids.
By combining advanced genomic analysis, efficient sequencing protocols, and functional studies, we aim to substantially advance our understanding of genome evolution, improve our ability to analyze complex genomic regions, and potentially contribute to improved diagnostics and therapeutic strategies for neurodevelopmental disorders associated with the 1q21.1 region. Together, these advances will enable the community to conduct more comprehensive comparative genomic analyses, leading to a deeper understanding of genome evolution across vertebrates and humans and of its implications for human health.