Inference and application of graphs for genomic data - Project Summary/Abstract
The genealogical structure of whole genomes can be described through Ancestral Recombination Graphs (ARGs). ARGs are summaries that contain all of the information in genomic sequencing data about processes such as demographic history, selection, and recombination. The primary objective of this research is to develop a suite of computational tools that use posterior sampling of ARGs to test hypotheses about the distribution and evolution of genomic variation and, more generally, to improve the quantification of mutation, recombination, selection, and demographic history. These will be full-likelihood Bayesian methods that can take advantage of the rich population genetic information in whole-genome sequencing data. We expect the methods to scale to posterior sampling of ARGs from a coalescent prior for many hundreds, or perhaps thousands, of genomes. We will make an open-source, user-friendly, flexible, and integrated program available to other researchers so that they can test a wide range of demographic and evolutionary hypotheses on their own data. We will also develop associated methods for inferring past migration and the geographic locations of an individual's ancestors. Additionally, we will develop improved methods for quantifying spatiotemporal patterns of natural selection affecting the genome. We will apply the methods to modern and ancient DNA to test hypotheses about the relative contributions of demographic processes and natural selection in shaping the landscape of phenotypic variation in Europe, including disease susceptibility. We will also use the methods to revisit an ongoing controversy over the relative importance of changing mutation patterns and changing generation times in shaping the pattern of human mutation variation.
Finally, we will use the methods to develop more accurate human recombination maps and to test hypotheses about recombination rate variation. In addition, we will develop new Bayesian Markov chain Monte Carlo methods for estimating Developmental Lineage Trees (DLTs) using mitochondrial heteroplasmies and single-cell DNA sequencing. We will also develop methods that jointly analyze single-cell RNA sequencing and DNA sequencing data to build joint models of DLTs with associated transitions in expression state. Such explicit temporal models of cell differentiation will be central to the translational aspects of cell-specific analyses, in particular for predicting the effects of various forms of medical intervention.