Learn, transfer, generate: Developing novel deep learning models for enhancing robustness and accuracy of small-scale single-cell RNA sequencing studies - Project Summary. Single-cell RNA-sequencing (scRNAseq) technologies measure transcriptome-wide gene expression at the single-cell level. In contrast to bulk RNA-sequencing, scRNAseq can elucidate dynamic expression patterns between different cellular populations. A key problem in scRNAseq studies is the inability to transfer knowledge directly between independent sequencing studies. As a result, researchers have had to spend significant time and resources generating massive datasets to enable meaningful analyses, a process that is costly and often not reproducible. Another transformative technology is spatial transcriptomics (ST), which provides gene expression profiles of cells while retaining the positional information of each sequenced cell. ST has the potential to expand our understanding of cellular heterogeneity, interactions, and pathology; however, ST is still an emerging technology and is not yet available to many studies. This proposal will fulfill the unmet need for scalable algorithms that transfer knowledge from existing datasets to new studies, leveraging learned representations to reconstruct the sequenced tissue's spatial information. I propose to achieve these goals through the following aims: (1) transfer knowledge from existing public single-cell data to new experimental data using a deep neural-attention network, and (2) develop the first spatially informed model for generating realistic scRNAseq data. In Aim 1, I will use attention mechanisms, which have revolutionized many fields in computer science, to learn complex gene dependencies and important biological features (e.g., marker genes) in a fully self-supervised manner, providing much-needed biological interpretability.
Such a model can be applied to many tasks and to datasets with relatively few samples. The knowledge learned in Aim 1 will be used directly in Aim 2. In Aim 2, I will build upon our state-of-the-art generative model to generate synthetic data that contains the spatial information (coordinates) of sequenced cells, even when no atlas is available. This model will allow researchers to produce synthetic data with spatial information and to augment sparse and noisy datasets for more robust and accurate analyses, all without additional costly experiments. This proposal will support my dissertation research, which will be the foundational body of work for my career as a researcher in computational genomics. During the tenure of this award, I will receive specialized training in the underlying mathematics and biology needed to develop frameworks for scRNAseq analysis. I will contribute to the existing literature by developing novel methodology and creating open-source software, making our tools and models easily accessible to the broader scientific community. Achieving the proposed aims will make scRNAseq pipelines and analyses more robust and accurate. It will also facilitate the study of smaller datasets, potentially reducing the number of patients and animals needed in initial studies.