PROJECT SUMMARY
Structural variations are key drivers of both evolutionary adaptation and human disease. My group develops and
applies computational and statistical approaches for understanding the evolution of structural variations from
patterns in genomic and transcriptomic data. During the past few years, our studies have focused primarily
on gene duplication, which represents the most common type of structural variation observed in nature. In
particular, we investigated the origins of evolutionary innovation after gene duplication, a problem of
long-standing interest in the evolutionary genomics community. To address this problem, we designed the first method
for classifying evolutionary outcomes of duplicate genes from phylogenetic comparisons of their gene expression
profiles. By applying this decision tree method to multi-tissue gene expression data, we were able to classify
evolutionary outcomes of duplicate genes in Drosophila, mammals, and grasses. These studies revealed
frequent tissue-specific expression divergence after duplication, as well as sequence and expression differences
within and among taxa that are consistent with natural selection. In a follow-up population-genomic analysis, we
demonstrated that natural selection indeed plays an important role in the evolutionary outcomes of young
duplicate genes in Drosophila. Later, we developed analogous decision tree classifiers for two additional types
of structural variations: gene deletion and translocation. Applications of our methods to sequence and expression
data from multiple tissues and developmental stages in Drosophila uncovered rapid divergence concordant with
adaptation, suggesting that natural selection shapes the evolutionary trajectories of structural variations
generated by deletion and translocation as well. However, our recent analyses revealed several limitations
of these decision tree methods, including sensitivity to gene expression stochasticity, lack of statistical
support, and inability to predict parameters driving the evolution of structural variations. Thus, during the next
five years, my group will develop a suite of tailored model-based statistical and machine learning approaches for
classifying the evolutionary outcomes and predicting the evolutionary parameters of structural variations arising
from duplication, deletion, inversion, and translocation events. Our preliminary studies indicate that these
techniques will be much more powerful and accurate than previous approaches, and will therefore constitute
major advances in evolutionary investigations of structural variations. In addition to implementing our
methods in open-source software packages, we will apply them to assess the evolutionary implications of different
types of structural variations in humans and several other animal and plant taxa. Comparisons will be made
among different types of structural variations, their evolutionary outcomes, and taxonomic groups. The major
goal of these studies will be to ascertain the general rules by which different types of structural variation
contribute to evolutionary innovation. Together, these studies will shed light on how gene duplication, deletion,
inversion, and translocation work in concert to generate a diversity of complex adaptations across the tree of life.