Project Summary
The revolution in genome sequencing technologies over the past 15 years has created an explosion of
population genomic data but has left in its wake a gap in our ability to make sense of data at this scale. This is
in part because traditional population genetic models do not reflect the genomic and geographic processes
producing the tremendously diverse data now being collected. To capitalize on this flood of information, we
need new methods and modes of analysis.
In recent years our group has made great strides in using supervised machine learning for population
genomic analysis (reviewed in part in Schrider and Kern 2018). In particular, we have pioneered the
development and application of deep learning techniques for a wide variety of tasks including detecting
selection (Kern and Schrider 2018; Xue et al. 2021), localizing introgression tracts in the genome (Schrider et
al. 2018), characterizing the landscape of recombination (Adrion et al. 2020), predicting geographic origin
(Battey et al. 2020), as well as visualizing population genetic data (Battey et al. 2021). A particular focus of our
efforts during the last funding period has been understanding the evolution of Anopheles gambiae in response to
vector control efforts underway in sub-Saharan Africa; we have therefore developed, and continue to develop,
statistical methods with these important data in mind.
Our work on Anopheles has reinforced for us the importance of studying spatial variation, particularly in
the context of adaptation. In this proposal we build upon ideas we have been developing during the previous
funding period and discuss three facets of our ongoing research program. The proposal has three sections. 1)
We will continue our work on spatial population genetics, developing methods for inferring
dispersal parameters directly from population genomic data and improving our understanding of the
ways in which spatial structure can impact GWAS and related techniques. 2) We will develop methods to further
characterize the population genomics of adaptation. In this section we are particularly interested in developing
deep learning methods that discover beneficial alleles by accounting for the geographic spread of an allele
relative to its surrounding pattern of genomic variation. In addition, we will develop methods for detecting
selection that build upon recent improvements in our ability to infer population-scale genealogies (Kelleher et
al. 2019; Speidel et al. 2019). 3) Finally, we propose new avenues of development for a community resource project
that our group has been leading, the stdpopsim project, which aims to provide an open-source, highly
reproducible, and accessible method for population genetic simulation in a number of common study
systems. We outline plans toward more realistic simulation of genomes under selection, with a particular
focus on implementing previously published estimates of selective parameters. Moreover, we will use the
stdpopsim library to benchmark commonly used methods for demographic and selective inference.