Learning about the evolution of structural variations from genomic and transcriptomic data - PROJECT SUMMARY
Structural variations are key drivers of both evolutionary adaptation and human disease. My group develops and applies computational and statistical approaches for understanding the evolution of structural variations from patterns in genomic and transcriptomic data. During the past few years, our studies have focused primarily on gene duplication, which represents the most common type of structural variation observed in nature. In particular, we investigated the origins of evolutionary innovation after gene duplication, a problem of long-standing interest in the evolutionary genomics community. To address this problem, we designed the first method for classifying evolutionary outcomes of duplicate genes from phylogenetic comparisons of their gene expression profiles. By applying this decision tree method to multi-tissue gene expression data, we classified evolutionary outcomes of duplicate genes in Drosophila, mammals, and grasses. These studies revealed frequent tissue-specific expression divergence after duplication, as well as sequence and expression differences within and among taxa that are consistent with natural selection. In a follow-up population-genomic analysis, we demonstrated that natural selection indeed plays an important role in the evolutionary outcomes of young duplicate genes in Drosophila. Later, we developed analogous decision tree classifiers for two additional types of structural variation: gene deletion and translocation. Applications of these methods to sequence and expression data from multiple tissues and developmental stages in Drosophila uncovered rapid divergence consistent with adaptation, suggesting that natural selection also shapes the evolutionary trajectories of structural variations generated by deletion and translocation.
However, our recent analyses revealed that these decision tree methods have several limitations, including sensitivity to gene expression stochasticity, lack of statistical support, and inability to predict the parameters driving the evolution of structural variations. Thus, during the next five years, my group will develop a suite of tailored model-based statistical and machine learning approaches for classifying the evolutionary outcomes and predicting the evolutionary parameters of structural variations arising from duplication, deletion, inversion, and translocation events. Our preliminary studies indicate that these techniques will be much more powerful and accurate than previous approaches, and will therefore constitute major advances in evolutionary investigations of structural variations. In addition to implementing our methods in open source software packages, we will apply them to assess the evolutionary implications of different types of structural variations in humans and several other animal and plant taxa. We will compare results across types of structural variation, evolutionary outcomes, and taxonomic groups. The major goal of these studies will be to ascertain the general rules by which different types of structural variation contribute to evolutionary innovation. Together, these studies will shed light on how gene duplication, deletion, inversion, and translocation work in concert to generate a diversity of complex adaptations across the tree of life.