Specialized Tools and Auto-updatable Scalable Interactive Databases to Study isomiRs, tRFs and rRFs in Human and Mouse - This project focuses on three categories of small RNAs: the isoforms of microRNAs (miRNAs) that are known as isomiRs; the fragments that are derived from transfer RNAs (tRNAs) and are known as tRFs; and the fragments that are derived from ribosomal RNAs (rRNAs) and are known as rRFs. IsomiRs, tRFs, and rRFs have several important properties that warrant their detailed study: (1) They account for ~80% of all small RNAs in a cell. (2) They regulate the abundance of messenger RNAs (mRNAs) and proteins. (3) Their expression patterns depend on cellular “context” (e.g., tissue type, disease type). (4) In humans, their expression patterns additionally depend on “personal attributes” (e.g., sex, genetic ancestry, age). To correctly mine isomiRs, tRFs, and rRFs from RNA-seq data, the ideal tools must address several complicating factors. First, the same short sequence (e.g., a tRF) can arise from different parental RNAs. These parental RNAs can belong to the same sub-type (e.g., different tRNA isodecoders of the same tRNA isoacceptor) or to different sub-types (e.g., isodecoders from different tRNA isoacceptors). Second, the sequences of many isomiRs, tRFs, and rRFs can also be found in unrelated regions of the genome. Third, paralogues and/or incomplete copies of miRNAs, tRNAs, and rRNAs riddle the nuclear genomes of many organisms, including human and mouse. The details of these complicating factors are specific to the RNA type and to the genome. Consequently, the ideal tools must be target-genome-specific. The complicating factors and the need for genome specificity appeared in the literature only recently. As a result, most tools available to date have been general-purpose and do not account for these complications. Not surprisingly, most available databases were built using general-purpose tools. 
Without realizing the underlying shortcomings, many researchers relied on the information provided by these tools and databases to design experiments and analyze their data. In turn, this has led to many published articles that unintentionally describe findings of unclear value about molecules that are not always isomiRs, tRFs, or rRFs. We will address these gaps as follows. In Aim 1, we will build specialized tools that address the peculiarities of each RNA type and accurately mine isomiRs, tRFs, and rRFs from human and mouse RNA-seq data. The tools will be robust, self-contained, and user-friendly. In Aim 2, we will build specialized databases to organize and provide easy access to information about isomiRs, tRFs, and rRFs that we have already compiled by mining 50,000 public datasets. In Aim 3, we will build a system that auto-identifies datasets newly added to NIH’s SRA, profiles and annotates each dataset’s isomiRs, tRFs, and rRFs, and updates the databases with the new information each month. In Aim 4, we will create educational material describing best practices to help researchers benefit maximally from this framework, and build a system that allows them to interact with one another and submit their feedback. Lastly, we will experimentally validate select small RNAs implicated in breast cancer metastasis.