PROJECT SUMMARY / ABSTRACT
Allele-specific expression quantitative trait locus (eQTL) mapping has become increasingly popular because it
enhances traditional eQTL mapping by revealing in significantly more detail the gene regulatory mechanisms
underlying the genetic architecture of diseases. Allele-specific eQTL mapping identifies cis-acting and trans-acting
eQTLs, which point to cis-regulatory elements and trans-acting factors, respectively, by leveraging the fact that, unlike
trans-acting eQTLs, cis-acting eQTLs affect the expression of transcripts from the same haplotype as the variant
itself, causing allelic imbalance in expression. However, allele-specific eQTL mapping requires reliable long-range
phasing of genome sequences and accurate allele-specific expression quantification from RNA-seq data
that is consistent with the genome phasing. Most existing work has treated allele-specific expression quantification
and phasing as independent tasks, even though each can enhance the accuracy of the other. In this proposed
research, we will modify and pair two widely used tools, SHAPEIT for genome phasing and Salmon for
RNA-seq quantification, to obtain accurate phasing and allele-specific expression quantification that are consistent
with each other for allele-specific eQTL mapping. The combined tool will inherit or enhance the accuracy and
efficiency of the two original methods. If phased sequences are known from experimental or trio data, we will
replace the EM algorithm of Salmon with an accelerated EM algorithm that addresses the extreme multi-mapped read
problem efficiently. If phased sequences are not available, as in unrelated individuals, we will modify
SHAPEIT to jointly phase the variants and allele-specific read abundances, embedding allele-specific expression
quantification within SHAPEIT and using Salmon to obtain transcript quantification and allele-specific read
abundances. As a testbed, we will use genotype and RNA-seq data from a 50-generation intercross between
two inbred mouse strains. Because these data are derived from two fully sequenced inbred founders, the
correct phase is known. Although we use mice as a testbed, our approach is applicable to data from any disease,
tissue, or organism, including GTEx data.