DESCRIPTION (provided by applicant): Orphan genes in pathogenic bacteria
PI: Yanbin Yin, Northern Illinois University

Almost 3,000 completely finished prokaryotic genomes are available in the GenBank database, ~22,000 more are in draft assembly status, and even more are in the sequencing pipeline. New computational tools are increasingly needed to handle this growing volume of genomes and to extract new biological insight from it. One particularly interesting observation from the analysis of microbial genomes is that every sequenced genome contains a significant number of orphan genes (or ORFans), which have no homologs in other genomes. However, no computational tools are currently available for automated genome-wide identification, classification, annotation and presentation of ORFans in bacterial genomes. Recent pan-genome studies comparing pathogenic strains with their non-pathogenic relatives in many bacterial species suggest that ORFans play a major role in pathogenesis. We believe that undiscovered associations exist between ORFans and well-known agents of pathogenesis, such as pathogenicity islands (PAIs), phages, plasmids and other mobile genetic elements, and that such associations could be revealed by a comparative study of ORFans in pathogenic and non-pathogenic genomes of the same species. Previously, we developed a mathematical function to quantitatively score the uniqueness of genes in a genome and applied it to the identification of ORFans in 277 prokaryotic genomes and 1,456 viral genomes. We found that every genome studied contained a significant number of ORFans, although the percentage of ORFans varies considerably among species. Overall, ~14% of prokaryotic genes and ~30% of viral genes are ORFans. In this grant proposal, we will take into account the fact that new genes (ORFans) have been arising continuously during evolution.
We will develop a computer program implementing a new algorithm that not only identifies ORFans automatically but also classifies them into groups of different ages. We will then apply this program to 8,431 genomes from 195 human pathogenic bacterial species.
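
The idea of grouping ORFans by age can be illustrated with a minimal sketch. The proposed algorithm is not specified here, so everything below is an assumption made for illustration: a gene is assigned to the broadest taxonomic level at which a homolog is detected, genes with no detected homolog are treated as the youngest (strain-specific) group, and the level names, labels, and function names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical taxonomic levels, ordered from narrowest (youngest) to
# broadest (oldest). These names are illustrative assumptions.
LEVELS = ["species", "genus", "family"]


def classify_orfan_age(homolog_levels: Set[str]) -> str:
    """Assign an age group from the levels at which homologs were detected.

    "species" means a homolog exists in another strain of the same species,
    "genus" in another species of the same genus, and so on. A gene with no
    detected homolog is labelled strain-specific (the youngest group).
    """
    age = "strain-specific"
    for level in LEVELS:  # walk from narrowest to broadest level
        if level in homolog_levels:
            age = f"{level}-specific"
    return age


def classify_genome(hits: Dict[str, Set[str]]) -> Dict[str, str]:
    """Classify every gene in a genome given its homolog detection levels."""
    return {gene: classify_orfan_age(levels) for gene, levels in hits.items()}


# Toy input: gene identifier -> taxonomic levels with a detected homolog.
example_hits = {
    "geneA": set(),                 # no homolog anywhere
    "geneB": {"species"},           # homologs only in sibling strains
    "geneC": {"species", "genus"},  # homologs in other species of the genus
}
ages = classify_genome(example_hits)
```

In this toy input, `geneA` falls into the youngest group and `geneC` into the oldest. A real pipeline would derive the detection levels from homology searches against reference genomes, which this sketch leaves out.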