Identification of microproteins associated with hematopoiesis and related diseases - PROJECT SUMMARY
This innovative, interdisciplinary project aims to discover novel microproteins, encoded by small open reading frames (smORFs), that play crucial roles in normal hematopoiesis and hematopoietic diseases. Defined as proteins with fewer than 100 amino acids, microproteins have traditionally been largely overlooked because their short length makes them difficult to detect by standard biochemical methods and makes their homology scores hard to distinguish from those of non-functional random sequences. Recent technological advances, such as ribosome profiling and the corresponding bioinformatics analysis methods, have created new opportunities for identifying microproteins. Since then, an increasing number of microproteins have been found to play key roles in various biological processes. Examples include the pTUNAR microprotein, which regulates calcium homeostasis and whose deficiency affects the differentiation potential of embryonic stem cells, and the Dleu2-17aa microprotein, which controls regulatory T-cell function. Yet a comprehensive catalog of microproteins expressed in hematopoietic cell lineages is still lacking. The difficulties include background noise in ribosome profiling data, insufficient data for smORFs with a low expression level or translation rate, and experimental conditions that affect translation efficiency. It is therefore advantageous to obtain independent support for smORF/microprotein expression from multiple types of data. The large amount of transcriptomic (RNA-seq) data produced from sorted human hematopoietic cell types is a valuable resource that has been underutilized for this purpose. In this project, we will perform secondary analysis of existing transcription and translation data and integrate the results with evolutionary information to produce the first comprehensive catalog of microproteins in hematopoietic cell types.
We will produce annotation files that detail all types of evidence for the transcription and translation of smORFs in individual cell types, and we will make these files freely available to the public to facilitate further studies. In addition, to explore the functional roles of these microproteins, we will perform a variety of data analyses, including i) differential expression analysis to find cell type-specific microproteins that may be involved in defining cell type identity, ii) co-expression analysis with canonical protein-coding genes to transfer their functional annotations to the microproteins, iii) differential expression analysis between normal and disease states to identify microproteins that are up- or down-regulated in diseases, and iv) GWAS lookup to identify disease-associated loci located near the smORFs we identify. Finally, we will integrate all of this information to prioritize smORF candidates for our future studies. This project builds on the expertise of the two MPIs and their years of collaboration: bioinformaticist Dr. Kevin Yip, who has many years of experience in integrating and analyzing omic data through his work with the ENCODE, modENCODE, and IHEC consortia and other projects, and biologist Dr. Ani Deshpande, who has many years of experience in research on hematopoietic transformation and related diseases.