Project Summary/Abstract
Understanding spatial genetic variation is tremendously valuable for medical genetics, understanding human
population history, identifying the geographic origin of samples, and managing disease vectors. However,
while organisms in natural populations disperse a limited distance from their birth location, most population
genetic models do not account for such genetic isolation over space, and instead treat populations as
composed of discrete demes. Moreover, dispersal rate and population density often vary across the landscape
due to heterogeneous environmental conditions, population structure, or simply geography. Violations of these
modeling assumptions have real-world implications, for example when correcting genome-wide association studies for
hidden population structure (Berg et al., 2019; Sohail et al., 2019; Battey et al., 2020a; Zaidi and Mathieson, 2020). In
this proposal we aim to develop a genomics toolset for inferring spatial population genetic parameters through
the use of deep learning.
One strategy for dealing with geo-referenced genomic data is to train a deep neural network (DNN) to
identify useful information in the data in an automated fashion. DNNs can be trained on simulated data, which
bypasses the need to obtain empirical data for training. In this proposal we present the first use of DNNs for
inference of spatial population genetic parameters. The proposal has three Specific Aims: 1) we will develop a
method that uses DNNs to estimate spatially varying dispersal rates from geo-referenced DNA samples, 2) we
will modify our deep learning tool to infer the additional demographic parameter of population density across
space, and 3) we will apply our method to infer dispersal rate and density in two important empirical
systems for which geo-referenced genomic data are available: Anopheles gambiae and humans. Our
approach for inferring spatial demographic processes will directly inform empirical applications such as
genome-wide association studies and disease vector control, as well as lay the groundwork for other spatial
population genetic inquiries.
The postdoctoral fellow will receive rigorous training in cutting-edge computational techniques relevant
to deep learning and statistical and quantitative methods in spatial population genetics. The sponsoring labs
have abundant computational resources to support the proposed research. In addition, the University of
Oregon houses an NSF-supported center for machine learning, which will provide valuable opportunities for
the fellow. Additional training will include preparation of grant proposals and first-author manuscripts,
presenting at conferences, advising undergraduate thesis projects, and mentored classroom teaching.