Abstract
Annotations of coding genes in the human genome have been tremendously useful in understanding the etiology of
genetic disorders and in basic biology research. Although coding genes are the most accurately and comprehensively
annotated set of genomic features, emerging evidence indicates that an increasing number of translated regions
are missing from the current annotation. These overlooked genomic regions, formally termed translated open reading
frames (tORFs), represent important biology missing from the current literature. For example, myoregulin, a
conserved 46-amino-acid micro-peptide, was discovered in a “non-coding” region and was later demonstrated to
function in regulating skeletal muscle in mice. These potentially functional novel tORFs are often small and are
therefore overlooked by most coding-gene annotation programs. To overcome this challenge, efforts leveraging
functional genomics datasets to identify novel coding regions across the human genome have begun to reveal
this previously underappreciated class of genomic features. In particular, the applicants previously developed a
computational method, riboHMM, which leverages patterns specific to translated regions in functional
genomics data, such as ribo-seq data, to identify tORFs genome-wide. Systematic annotation of tORFs in human
lymphoblastoid cell lines with riboHMM identified 7,273 novel tORFs, in addition to the tORFs of
known coding genes. These novel tORFs were found in regions of the transcriptome previously annotated as
non-coding (e.g., untranslated regions and lincRNAs). Although newly developed methods, such as riboHMM,
can now systematically identify thousands of previously overlooked tORFs, the biological relevance of these
translation events remains unclear. The objective of the current proposal is to evaluate the functional relevance of
these newly discovered tORFs. Three major aspects of biological importance will be evaluated. First, loss-of-function
impact. The effects of tORF deletion on cell viability, and its synthetic fitness impact in combination with
well-characterized coding genes, will be evaluated using pooled CRISPR dropout screens (Aim 1). Second, ability to
encode a protein or peptide. The ability of tORFs to produce stable proteins or peptides will be evaluated in mass
spectrometry studies designed to detect translation products of small ORFs (Aim 2). Third, evolutionary
conservation. The strength of purifying selection on these loci will be evaluated using new alignments
built from independently annotated novel tORFs in chimpanzee and rhesus macaque (Aim 3). The completion
of the proposed aims will provide the first systematic evaluation of biological relevance for novel tORFs. Impacts
of these new functional annotations could range from providing new interpretations for GWAS hits to reevaluating
“non-coding RNA” function. Results from the proposed study will guide future research directions on this group
of previously overlooked genomic features. Given the sheer number of unexplored tORFs and the prior examples
of overlooked tORFs that turned out to play critical roles in important biological pathways, the findings here will
have far-reaching implications for both basic and translational biomedical research.