Computational toolkit for robust detection of genomic variation in human leukocyte antigen genes - Genetic variations in human leukocyte antigen (HLA) genes are associated with over 200 diseases, and large-scale genomic sequencing projects are now generating data on HLA genes from millions of individuals. Despite the immense clinical relevance of these genes, computational inference of short variants (SNPs or insertions/deletions) in HLA genes from next-generation sequencing data is difficult because of their highly polymorphic nature, inter-HLA gene similarity, and strong linkage disequilibrium. Existing tools for HLA variant detection are error-prone, are not designed for scalability, are not interoperable across sequencing formats, and lack formal mechanisms for developer support after publication. The objective of this application is to develop highly accurate, robust, scalable, and deployment-ready pipelines for identifying germline and somatic variants in HLA genes through integration and enhancement of our previously developed tools. To achieve this goal, we aim to (1) develop tools for detecting short germline and somatic HLA variants by enhancing our Polysolver tool for allele inference across all HLA genes and by further developing the Mutect3 pipeline for mutation detection; (2) establish a reference dataset for benchmarking the performance of HLA variant detection tools; and (3) use the widely adopted GATK4 framework and Workflow Description Language (WDL) to create and disseminate robust, scalable, and well-supported HLA variant detection pipelines. This will be the first such comprehensive HLA analysis toolkit, which we expect will be widely used by both individual researchers and sequencing consortia in multiple disease communities. Mutect3, which internally employs a “deep sets” architecture, will be the first mutation detection tool capable of jointly calling germline and somatic short variants and of handling multiple references at a genomic locus.
If successful, this project will unlock the hitherto untapped potential of rapidly growing sequencing datasets by enabling discovery of new HLA alleles, variations in known HLA alleles, and novel HLA-disease associations that can be directly harnessed for personalized preventive and therapeutic applications.