Novel Computational and Statistical Methods for Single-cell Omics Data - Project Summary

My lab develops artificial intelligence (AI) and machine learning (ML) algorithms and statistical methods to analyze diverse genomic data under different experimental designs. With multidisciplinary training in computer science, statistics, and biology, my research program focuses on developing the informatics of tomorrow in the context of today's pressing biomedical problems, in collaboration with colleagues in the biomedical field. All methods developed in our lab are implemented in user-friendly, publicly available software packages to maximize their impact. In the past five years, we have focused on single-cell genomics, transcriptomics, and epigenomics. Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity and complex biological systems. Despite advances in computational methods, including our lab's contributions, the full potential of these datasets remains untapped because powerful tools for integrating and analyzing vast amounts of single-cell omics data are lacking. Additionally, emerging biotechnologies such as cellular barcoding, when coupled with single-cell sequencing, require new computational methods to realize their potential. Therefore, our goals for the next five years are: (1) developing and optimizing large-scale foundation models for single-cell omics data, (2) creating computational methods for barcoding single-cell omics data, and (3) quantifying cell-type annotation uncertainty in scRNA-seq studies. We will develop innovative AI/ML techniques to address computational challenges in single-cell large-scale foundation models (scLFMs), integrate biological domain knowledge, incorporate other modalities and cross-species data, and develop metrics to evaluate scLFM embedding quality.
We will also create methods that leverage barcode information and biological knowledge for clustering, cell-cell communication analysis, and cell trajectory inference, as well as statistical methods for detecting clones with longitudinal changes and identifying the genes that drive these changes. Lastly, we will use conformal prediction to quantify cell-type annotation uncertainty in scRNA-seq studies. We will develop a basic testing procedure that produces a statistically valid prediction set for each cell and a tree-based testing procedure that accounts for the hierarchical structure of cell types. The proposed research builds on recent progress in the PI’s lab in developing deep learning methods for single-cell, epigenomic, and genetic data analysis, as well as statistical methods for transcriptomic data analysis. We will implement the proposed methods in user-friendly, open-source software tools to benefit the biomedical community. The overall vision of the research program is to advance computational methods for single-cell omics data analysis, ultimately accelerating biological discovery and clinical applications.
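
As an illustration of the conformal prediction aim above, the following is a minimal sketch of how per-cell prediction sets could be formed. The summary does not specify the testing procedure, so a standard split conformal construction is assumed here: a held-out calibration set of annotated cells, a classifier that outputs per-cell-type probabilities, and one minus the probability of the true label as the nonconformity score. The function names `conformal_threshold` and `prediction_sets` and all numeric values are hypothetical and are not taken from the proposal.

```python
import math
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from a calibration set (split conformal, assumed setup).

    cal_probs:  (n, K) array of predicted cell-type probabilities.
    cal_labels: (n,) integer array of true cell-type indices.
    alpha:      target miscoverage rate, e.g. 0.1 for 90% coverage.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability given to the true label.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration cells for this alpha: keep every label.
        return math.inf
    return scores[k - 1]

def prediction_sets(test_probs, threshold):
    """Boolean (m, K) mask; entry [i, k] is True if cell type k is kept for cell i."""
    return (1.0 - test_probs) <= threshold

# Made-up example: 7 calibration cells, 2 cell types, true label always type 0.
p_true = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3])
cal_probs = np.column_stack([p_true, 1.0 - p_true])
cal_labels = np.zeros(7, dtype=int)

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
test_probs = np.array([[0.95, 0.05], [0.50, 0.50]])
sets = prediction_sets(test_probs, threshold)
print(threshold)
print(sets)
```

Under this assumed construction, a confidently classified cell receives a single-label set, while an ambiguous cell receives several candidate cell types, which is one way annotation uncertainty could be reported per cell. The tree-based procedure mentioned above would additionally use the cell-type hierarchy and is not sketched here.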