Abstract: Most pathogenic microbes evolved from free-living, benign ancestral
organisms that, through gene acquisition and innovation, gained new gene functions and, with them, the ability to
colonize new habitats (e.g., a human host). The specific roles of these new genes are often known, whether
in antigenic variation or cell invasion, but the properties of the genetic sequences that enable such functions
are poorly understood. Many of the remaining questions can be investigated with an evolutionary
approach that can provide perspective on (i) where pathogenicity-related genes come from, (ii) whether and
how signatures of pathogenicity can be identified within sequences, and (iii) whether the emergence of
pathogenicity can be predicted from gene evolution. These questions can be addressed with computational
approaches that investigate genome complexity within the genus Plasmodium, which includes the agents of malaria.
Plasmodia are known for their high frequency of low complexity regions (LCRs), which are segments of the genome
with lower-than-expected nucleotide and amino acid diversity. LCRs are also known for their high rate of
change, which makes them excellent candidates for sources of genetic innovation and new functions. The
first aim of the proposed project is to develop new computational measures to identify
sequences involved in genome complexity. The second aim is a comparative analysis of protein-coding genes
with and without LCRs to determine the primary forces driving their evolution. The reconstruction of ancestral
states in these genes will allow us to identify the evolutionary mechanisms and selective pressures at the origin of
LCRs and their potential connection to the evolution of pathogenic lifestyles. The third aim is a functional
analysis of genes with and without LCRs to determine computationally whether these regions are essential for the
proper formation of a functional product. Compositional biases, length variation, and evolutionary histories
across species will be used to determine conservation of these regions through time and correlate this
information with gene function. The proposed research will provide new insights into the evolution of
pathogenesis and its signatures within genomes of the genus Plasmodium.
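
As a rough illustration of what "lower-than-expected diversity" can mean in practice, the sketch below scores sliding windows of a sequence by Shannon entropy and merges low-scoring windows into candidate LCRs. This is not the measure proposed in the first aim; the function names, the 12-residue window, and the 2.2-bit threshold are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_low_complexity_regions(seq, window=12, threshold=2.2):
    """Return merged (start, end) half-open intervals of low-entropy windows.

    window and threshold are illustrative assumptions, not values
    taken from the proposal.
    """
    regions = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            end = start + window
            # Merge with the previous region when the windows overlap or touch.
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Hypothetical protein fragment with an asparagine-rich run in the middle.
    example = "MKTLVAGSPQ" + "N" * 15 + "DEFGHIKLMR"
    for start, end in find_low_complexity_regions(example):
        print(start, end, example[start:end])
```

A composition-based score of this kind captures only one facet of complexity. It ignores residue order, so periodic repeats and homopolymer runs of similar composition are not distinguished, which is one reason additional measures would be needed.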