The Architecture of Missing and Archaic Variation in Human Population Genomic Data

Project Summary

Modern human genomes are mosaics of variation inherited from numerous archaic hominin populations, including unsampled “ghost” populations known only from their genetic traces. However, our understanding of the evolutionary history of “ghost” variation is still developing. In particular, computational methods that account for missing “ghost” variation remain nascent, and ignoring the presence of “ghosts” often leads to erroneous inference. Here I propose a series of computational developments for inferring evolutionary history from modern human genomes while accounting for gene flow from archaic “ghosts”. In AIM 1, I propose to develop a parallelized, maximum-likelihood statistical framework for estimating population genetic structure from multi-allelic, multi-locus genomic data. The framework will explicitly incorporate sequencing and imputation errors, as well as data that appear missing because of gene flow from archaic “ghost” populations. This method will be implemented in a computationally efficient, multi-threaded program called p-MULTICLUST, which extends the popular “admixture” model used in tools such as STRUCTURE and ADMIXTURE to handle missing, multi-allelic human genomic data. AIM 2 will take a two-pronged approach to estimating evolutionary history and population structure in the presence of gene flow from an archaic “ghost” under the Isolation with Migration (IM) model. We will (a) extend the IMa3/IMa2p suite of tools to jointly estimate population structure and demographic history from genomic data, and (b) train undergraduate students to develop simulation models for the stdpopsim consortium under two important models of human history: (1) archaic “ghost” gene flow into African populations, and (2) multiple epochs of admixture into Asian and Oceanian populations.
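The “admixture” model that AIM 1 proposes to extend can be made concrete through its standard likelihood. The Python sketch below is not p-MULTICLUST and does not model sequencing or imputation error. Assuming that missing calls are missing at random, it shows how the classic STRUCTURE/ADMIXTURE-style model scores unphased, multi-allelic diploid genotypes: each gene copy of individual i at locus l carries allele a with probability equal to the sum over clusters k of q[i][k] * p[k][l][a]. The function name and the genotype encoding (a pair of allele indices per locus, with None for a missing call) are hypothetical illustrations.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: genotypes[i][l] is a pair of allele indices (a1, a2)
# for individual i at locus l, or None when the call is missing.
Genotype = Optional[Tuple[int, int]]


def admixture_log_likelihood(
    genotypes: Sequence[Sequence[Genotype]],
    q: Sequence[Sequence[float]],            # q[i][k]: ancestry proportion of individual i from cluster k
    p: Sequence[Sequence[Sequence[float]]],  # p[k][l][a]: frequency of allele a at locus l in cluster k
) -> float:
    """Log-likelihood of unphased diploid multi-allelic genotypes under the
    admixture model. Missing calls are skipped, i.e. assumed missing at random."""
    n_clusters = len(p)
    ll = 0.0
    for i, row in enumerate(genotypes):
        for l, g in enumerate(row):
            if g is None:
                continue  # a missing call contributes nothing to the likelihood
            a1, a2 = g
            # Each gene copy is drawn independently from the individual's
            # ancestry-weighted allele frequencies: sum_k q[i][k] * p[k][l][a].
            f1 = sum(q[i][k] * p[k][l][a1] for k in range(n_clusters))
            f2 = sum(q[i][k] * p[k][l][a2] for k in range(n_clusters))
            # An unordered heterozygote can arise in two orders.
            ll += math.log(f1 * f2 * (2.0 if a1 != a2 else 1.0))
    return ll
```

For example, with two clusters, one locus with three alleles, and an individual whose ancestry is split evenly between the clusters, `admixture_log_likelihood([[(1, 2)]], [[0.5, 0.5]], [[[0.7, 0.2, 0.1]], [[0.1, 0.3, 0.6]]])` equals log(2 × 0.25 × 0.35) = log(0.175). Tools such as ADMIXTURE maximize a likelihood of this form over q and p. Skipping missing calls is exactly the simplification that becomes questionable when missingness is driven by “ghost” gene flow, which is the case AIM 1 targets.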
In AIM 3, I propose to quantify the selection landscape of “ghost” variation introduced into diverse human genomes by ancestral gene flow from now-extinct “ghost” populations. In this aim, we will focus on (a) improving the MigSelect program to quantify the linked-selection effects of gene flow from “ghost” populations under the IM model, and (b) a broader study of functional genomic variation across diverse human populations, including high-quality genomes from Africa, supplemented with more complete Neanderthal and other archaic hominin genomes. Together, these analyses will help us delineate patterns of human evolutionary history and understand the functional consequences of archaic gene flow. These discoveries also have direct implications for understanding modern human ancestry and the evolution of disease alleles. Importantly, this R15 will train numerous undergraduate and graduate students from underrepresented groups in genomics and bioinformatics, preparing them for careers in the biomedical and data sciences.