Secure Outsourcing of Genotype Imputation for Privacy-aware Genomic Analysis (RO1HE21) - Project Summary/Abstract
Population-scale genome sequencing projects such as the 1000 Genomes Project, TOPMed, and the All of Us Research Program will generate genotype data for millions of individuals. This number increases substantially once the recreational use of genetic data through genealogy companies, such as 23andMe, is accounted for. Sharing and analyzing these data create monumental challenges for the privacy of participants. Recently, hackers have begun targeting genealogy databases, as in the 2020 breach of GEDmatch. Because genomic data are large in scale and high-dimensional, analysis workflows require large computational resources. This incentivizes companies, hospitals, and research labs to outsource the analysis and interpretation of genomic data to third parties, so that the data end up stored on untrusted third-party servers. In this proposal, we focus on the secure outsourcing of genotype imputation, a computationally intensive and central task in large-scale genotype analysis. Genotype imputation is the prediction of missing or low-quality variant genotypes from a small set of variant genotypes measured using, for example, genotyping arrays, low-coverage sequencing, or targeted sequencing. It is a vital step in the analysis of raw genomic data, supporting quality control, prediction of missing genotypes, variant phasing, and fine mapping of associations to identify causal variants. When combined with sparse arrays, imputation can greatly reduce the cost of population-scale and family-based genotyping. For example, the All of Us Research Program will rely on a custom genotyping array, the Infinium Global Diversity Panel, to decrease the cost of genotyping millions of individuals. Imputation methods will be of vital importance for this task. To perform these enormous tasks, imputation methods require large computational resources and are often outsourced to third-party “imputation servers”.
These servers will soon process thousands, if not millions, of genomes and store sensitive genomic data. Unfortunately, these services are secure neither against unauthorized hackers nor against curious users who have authorized access to the servers. There is an urgent need for privacy-aware imputation methods that can be deployed even on untrusted third-party services, such as high-performance cloud platforms, so that outsourcing can be performed safely at population scale. Our proposed methods use state-of-the-art homomorphic encryption, which keeps genomic data encrypted while in transit, at rest, and even while imputation is being performed. We design new and efficient “encryption-amenable” methods and frameworks for protecting study participants and their families, and for protecting population panels, such as panels from underrepresented populations. Our benchmarks show that the secure methods achieve high imputation accuracy even on commodity hardware, with running times comparable to those of state-of-the-art non-secure methods. The proposed methods can provide practical population-scale genomic privacy and security for imputation and association studies.