Recently, studies detecting single nucleotide variants (SNVs) from single-cell RNA sequencing (scRNA-seq)
experiments have started to emerge. These studies have demonstrated the utility of scRNA-seq SNV
assessments to characterize intra-tumoral heterogeneity, define mutation-associated expression signatures,
identify tumor cells displaying lineage infidelity, and evaluate the tumor differentiation state. However,
most SNV data from scRNA-seq cancer datasets (10x Genomics) are currently not obtained at the cell level and
therefore lack information on SNV-associated cell phenotypes.
SNV assessments from scRNA-seq data can complement DNA-based SNV studies and maximize the
potential of scRNA-seq datasets. Importantly, they can provide crucial information on SNV functionality
through the study of variant allele-specific dynamics and their correlation with phenotype. Given this wide
range of applications, knowledge of cell-level SNV expression and dynamics can be instrumental for
any cancer scRNA-seq study.
In the last year, we have developed tools for the assessment of Single Cell-specific Expressed SNVs
(sceSNVs). SCExecute executes a user-provided command on individual cell alignments that are
stratified by barcode and extracted on the fly. We apply SCExecute in conjunction with variant callers to detect sceSNVs. To
estimate allele-specific sceSNV expression, we apply SCReadCounts, which generates cell-SNV matrices
with the cell-level expressed variant allele frequency (VAF_RNA). These cell-SNV matrices can be used as inputs for
our other tools, scReQTL, scRsQTL, and scSNPair, to correlate variant expression with gene expression,
splicing, and the expression of other SNVs, respectively. The expression of sceSNVs of interest can be
visualized across all cells of a sample in a two-dimensional projection using scSNVis.
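
As an illustration of what a cell-SNV matrix holds, the following minimal sketch computes a per-cell expressed variant allele frequency from reference and variant read counts. It is not the SCReadCounts implementation: it assumes that the frequency is the number of variant reads divided by the total number of reads covering the locus in that cell, and the function name `vaf_rna_matrix`, the input tuple layout, the `min_reads` threshold, and the example barcodes and locus are all hypothetical.

```python
from collections import defaultdict


def vaf_rna_matrix(read_counts, min_reads=3):
    """Build a nested dict {snv_id: {barcode: VAF}} from per-cell allele counts.

    read_counts: iterable of (snv_id, barcode, ref_reads, alt_reads) tuples.
    min_reads:   hypothetical coverage threshold; cells with fewer total
                 reads at a locus are left out rather than given a noisy value.
    """
    matrix = defaultdict(dict)
    for snv_id, barcode, ref_reads, alt_reads in read_counts:
        total = ref_reads + alt_reads
        if total < min_reads:
            continue  # too few reads to estimate a frequency for this cell
        # Assumed definition: variant reads over all reads at the locus.
        matrix[snv_id][barcode] = alt_reads / total
    return dict(matrix)


# Made-up counts for one locus in three cells.
counts = [
    ("chr1:12345:A>G", "AAACCTGA-1", 6, 2),
    ("chr1:12345:A>G", "AAACGGGT-1", 0, 5),
    ("chr1:12345:A>G", "AAAGATGC-1", 1, 0),  # dropped: only one read
]
vaf = vaf_rna_matrix(counts)
```

A sparse, dictionary-based layout is used here only because most cells carry no reads at most loci; a real matrix would more likely be stored in a sparse array format.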
Here, we propose to apply the above-described approaches to cancer scRNA-seq datasets
with the aim of assessing sceSNVs and initiating a public Pan-Cancer scVariome catalogue (Aim 1). We
will integrate new and existing (SCReadCounts, scReQTL, scRsQTL, scSNPair, and scSNVis) tools for
the discovery and analysis of sceSNVs from scRNA-seq data into an end-to-end, integrated, containerized,
publicly available pipeline. New tools will incorporate velocity and pseudotime inference analyses to
study sceSNV associations with cell dynamics. Using this pipeline, we will supply functional sceSNV
annotations to the catalogue (Aim 2). Furthermore, we will develop and incorporate a new tool that
cross-references sceSNVs observed locally by different teams, indexing them by locus, study, sample, and cell
barcode, and provides the additional context of cell type, study metadata, and other annotation and
summary results from the catalogue, with the overarching goal of facilitating community annotation of
sceSNVs in scRNA-seq data (Aim 3).
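
The cross-referencing idea in Aim 3 can be sketched as a simple locus-keyed index. This is only an illustration under stated assumptions, not the design of the proposed tool: the `Observation` record, its fields, the function names `build_locus_index` and `cross_reference`, and the example studies, samples, and barcodes are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    """One sceSNV seen in one cell (hypothetical record layout)."""
    locus: str
    study: str
    sample: str
    barcode: str
    cell_type: str


def build_locus_index(observations):
    """Group observations by locus so a query touches only matching records."""
    index = defaultdict(list)
    for obs in observations:
        index[obs.locus].append(obs)
    return index


def cross_reference(index, locus):
    """Summarize where a locus has been observed across studies and cell types."""
    hits = index.get(locus, [])
    cell_types = {}
    for obs in hits:
        cell_types[obs.cell_type] = cell_types.get(obs.cell_type, 0) + 1
    return {
        "n_cells": len(hits),
        "studies": sorted({obs.study for obs in hits}),
        "cell_types": cell_types,
    }


# Made-up catalogue entries from two studies.
catalogue = build_locus_index([
    Observation("chr1:12345:A>G", "study_A", "s1", "AAACCTGA-1", "T cell"),
    Observation("chr1:12345:A>G", "study_B", "s7", "AAACGGGT-1", "T cell"),
    Observation("chr1:12345:A>G", "study_B", "s7", "AAAGATGC-1", "B cell"),
    Observation("chr2:67890:C>T", "study_A", "s1", "AAACCTGA-1", "T cell"),
])
summary = cross_reference(catalogue, "chr1:12345:A>G")
```

A team that observes a variant locally could query such an index to see in which studies and cell types the same locus has already been reported.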