PROJECT SUMMARY
Alternative splicing (AS) is a fundamental cellular process that regulates 95% of multi-exon genes to diversify
protein output and define cell-type-specific functions. Both constitutive splicing and AS are controlled by
combinations of cis-acting pre-messenger RNA (pre-mRNA) sequences and trans-acting RNA-binding proteins
(RBPs). Therefore, defects in splicing-regulatory RNA sequences or RBPs can be highly disruptive to basic cellular
activities and often lead to disease, especially neurological and muscular disorders and cancer. While the
constitutive splicing code is well established, the AS code is more complicated and, thus, poorly understood.
This proposal integrates multiple cutting-edge approaches to take an RBP-centric view of AS to study both cis
genetic variants and trans RBP expression effects on RBP binding and splicing outcome. While thousands of
non-coding genetic variants are associated with splicing variation, and are thus termed putative splicing
quantitative trait loci (sQTLs), the causal variants and their molecular effects, such as RBP binding, are largely
unclear. Aim 1 will address this gap by integrating RBP-focused experiments, allele-specific genomics, and state-
of-the-art machine learning predictive models to characterize an important category of functional, cis non-coding
variants that alter RBP binding. Importantly, I will take a unique approach by including these allele-specific RBP
binding data as additional training data for our convolutional neural network model. The resulting model is expected
to predict the functional RBP binding effects of even a single-nucleotide change far more accurately,
enabling improved interpretation of sQTLs. In addition to genetic variant effects, changes in RBP expression can
have amplified downstream effects on RNA splicing. Interestingly, ~86% of RBP genes can be expressed as
more than one splice isoform, but most studies to date have ignored RBP isoform-specific abundance and
function. Aim 2 will provide foundational experiments to understand differential RBP isoform effects by using a
novel approach to knock down RBP isoforms by targeting Cas13 to unique exon junctions. Data from downstream
assays that assess changes in RBP binding, splicing, and RNA localization will be integrated to construct the
most comprehensive RBP regulatory networks to date. Results from both aims are essential to mechanistically
link RNA sequence and RBP binding to splicing outcome and, ultimately, to phenotype and disease.
My long-term goal is to become a principal investigator and to continue leveraging molecular biology,
machine learning, and statistical genetics to answer unique questions about RNA-mediated associations
between non-coding sequence and cellular and disease phenotypes. The research and training plans proposed
here are strategically tailored to provide ample opportunities to learn and apply machine learning and statistical
genetics methods that complement my prior PhD training in molecular biology and genomics. My sponsor,
co-sponsor, and collaborators at the NYGC are committed to providing the scientific expertise, computational
training, and career development mentoring to ensure the successful achievement of my goals.