Statistical and Deep Generative Modeling for Enhanced CyTOF Data Interpretation and Discovery - Project Summary

Background: Proteomics and genomics researchers have developed two distinct single-cell profiling technologies: mass cytometry (CyTOF) and single-cell RNA sequencing (scRNA-seq). These technologies hold great promise for unraveling cellular heterogeneity, developmental trajectories, cell-cell interactions, and disease physiology. Because CyTOF measures proteins directly, it provides more immediate insight into cellular function and disease physiology than scRNA-seq. However, analysis methods for CyTOF lag significantly behind those for scRNA-seq. This proposal addresses three critical gaps in developing innovative, user-friendly, and well-validated CyTOF analytical tools. (1) Current CyTOF methods lack a coherent framework to handle multiple tasks seamlessly. Sequential pipelines built from diverse task-specific models may hinder consistency and interpretability: they neglect the interdependence among tasks and do not propagate uncertainty from earlier steps, so errors accumulate. (2) While integrating scRNA-seq and CyTOF data offers significant advantages, an integrated approach that uses prior biological knowledge from multiple sources for cross-modality learning is urgently needed to perform essential single-cell analysis tasks simultaneously. (3) The lack of a reliable Imaging Mass Cytometry (IMC) data simulator, which is crucial for thorough testing and objective benchmarking, has impeded the development of robust IMC methods.

Overall methodology: We will leverage statistical and deep generative modeling to develop computational methods for integrated and integrative CyTOF analyses and IMC data simulation. We have assembled a team of experts in Bayesian modeling & computing, scRNA-seq/CyTOF, optimization & structure learning, imaging analysis, and environmental health.
Our recent publications in Nature Methods, Nature Communications, Cancer Discovery, Annals of Applied Statistics, Biostatistics, and Bioinformatics demonstrate our strong expertise and resources in fields relevant to this study. Aim 1 introduces cytoONE, a scalable, interpretable, and integrated approach that employs both Bayesian hierarchical modeling and deep generative modeling to perform common CyTOF analysis tasks simultaneously. Aim 2 develops CySCI, a semi-supervised diagonal integration method that enables joint analysis of CyTOF and scRNA-seq data. CySCI introduces a novel graph attention network with added regularization to incorporate biological information from different sources, facilitating cross-modality learning for dimension reduction and cell typing while performing differential analysis of cell abundance via an integrated Bayesian approach. Aim 3 devises IC-Sim, the first IMC data simulator, which is based on a statistical model using Markov Random Fields. Software tools for cytoONE, CySCI, and IC-Sim will be developed and applied to data from our collaborators, generating findings for their ongoing studies.

Expected impact: The proposed project is expected to significantly enrich the toolset available to CyTOF researchers, unlocking the full potential of CyTOF and revolutionizing how researchers perceive and understand CyTOF data.