PROJECT SUMMARY
Short peptides (10-100 aa) are important regulators of physiology, development and metabolism; however, their
detection is difficult because of their small size and low abundance. A stunning 30% of annotated human small open reading frame (smORF) genes include
disease-associated variants mapped within exons, compared to 15% of human genes in general. Further,
many smORFs are conserved across the entire metazoan phylogeny, from invertebrates to vertebrates,
including humans. We call these ultra-conserved functional smORF genes the Conserved smORF Catalog
(CSC). These genes have been conserved across more than 500 million years of evolution, yet we know almost
nothing about their functions. Owing to a century of genetic analysis, the genome of the model organism
Drosophila melanogaster has the most complete functional annotation among metazoans. Functional
annotations derived from Drosophila have been instrumental in hypothesis-based drug development for more
than thirty years, and more recently have made possible the biological interpretation of hundreds of SNPs
detected in genome-wide association studies (GWAS). Hence, functional annotations derived in fly for
conserved genes are transferable to humans and are of direct clinical relevance. Remarkably, fewer than 10% of
smORFs in Drosophila have been studied functionally, or experimentally verified as generating peptides. A
combination of genome engineering, computational, molecular, and functional studies will be used to
systematically and comprehensively characterize the CSC. This will be the first genome-scale
characterization of smORFs in any organism and will provide a wealth of information on the biological functions of
this poorly studied class of proteins. In total, we will characterize and functionally annotate ~400 conserved
smORFs using CRISPR knockout followed by phenotyping and rescue assays. We will assess the phenotypes
of the mutants, measuring viability, morphology, fecundity and fertility, lifespan, metabolism (sugar and lipid
levels), and a number of behavioral phenotypes. For smORFs with robust phenotypes, we will then attempt to
rescue a subset of these mutants in three ways: first, by inserting the whole deleted RNA; second, with a
version of the RNA in which the smORF(s) are disrupted by the addition of a stop codon; and third, with a micro-construct
containing only the smORF and the endogenous promoter. We will generate direct evidence for
translation using tagged expression analysis and targeted tandem mass spectrometry (MS/MS) to scan for predicted polypeptides in
whole-embryo and dissected-tissue samples. In addition to validating the existence of the predicted molecules,
this dataset will provide a foundational gold standard for further development of tools for the computational
prediction of functional micropeptides. These studies are directed toward the understanding of basic life
processes and lay the foundation for promoting better human health.