PROJECT SUMMARY
This R03 proposal describes a two-year research plan to investigate the contribution of previously unannotated
miniproteins to the genetic etiology of Autism Spectrum Disorder (ASD). Rare inherited and de novo genetic
variants are important causes of ASD, but explain the etiology in only ~20% of families. Furthermore, rare
ASD-associated variants have been identified in noncoding regions of the genome, but their functional
significance is largely unknown, representing a critical knowledge gap in ASD genetics. We
recently completed a study to profile the mRNA translational landscape (the “translatome”) of the human brain,
identifying thousands of small open reading frames (sORFs) encoding putative miniproteins of fewer than 100 amino acids,
many of which are translated from regions of the genome currently annotated as noncoding. We hypothesize that
miniproteins represent an unappreciated cache of hidden genes whose role in disease is almost entirely
unexplored. To address the potential role of miniproteins in ASD, we will leverage the largest whole-genome
sequencing (WGS) dataset in ASD, as well as our newly created atlas of human brain miniproteins, to discover
miniprotein genes associated with ASD risk based on rare inherited and de novo sequence-level and structural
variants (Aim 1). Additionally, our preliminary data suggest that many miniproteins are intrinsically disordered
(lacking a stable three-dimensional structure) and are rich in sequence motifs that bind RNA. Traditional sequence-based
analysis of proteins is expected to perform poorly on such short, highly disordered miniproteins. Therefore, in Aim 2, we will
build physical feature-based analysis paradigms to predict the molecular function of ASD-associated
miniproteins. This work combines expertise in developmental neuroscience, genomics, and computational
protein biology. The knowledge gained from this R03, including the identification of new ASD-associated
genes, will form the basis of future studies to characterize the function, cell type-specificity, and developmental
regulation of miniproteins in the human brain, as well as their contribution to ASD. Our approach to
incorporating previously unannotated miniproteins into genetic analyses can be applied to other neurologic and
non-neurologic conditions, thereby expanding our understanding of the genetic architecture of human disease.