Scalable software engineering and methods for large-scale subcellular spatial omics
Project Summary / Abstract
Emerging spatial subcellular transcriptomics technologies represent a major advance in biological research, making it possible to study the relationship between gene expression and tissue architecture at unprecedented resolution and scale. These technologies promise to transform fields across biomedical research by providing detailed insights into cellular behavior and tissue architecture in health and disease. However, despite this transformative potential, the ability to fully leverage these technologies is currently limited by three critical barriers. First, no scalable computational tools exist that support interactive and efficient analysis of the massive datasets generated by these platforms, making it difficult for researchers to explore the data and extract meaningful insights. Second, current analytical approaches focus largely on single-cell resolution and fail to use the rich transcript-level information encoded in the data, limiting our ability to understand complex tissue organization and the molecular underpinnings of biological processes and morphology. Finally, there is a pressing need for systematic methods that integrate and analyze spatial data across multiple samples, which would enable robust statistical analyses and link spatial gene expression patterns to clinical or experimental outcomes. This project will directly address these challenges by developing a comprehensive suite of computational tools designed to unlock the full potential of spatial subcellular data. 1) We will create a scalable and extensible database infrastructure that supports interactive analysis of large datasets on standard computing systems, removing the need for specialized high-performance computing resources.
Additionally, 2) we will develop novel methods that fully exploit the high-resolution spatial distribution of individual transcripts to improve the identification of cells, cell boundaries, and spatial domains within tissues. These methods will also extend to 3D datasets, allowing more accurate modeling of complex tissue structures. Finally, 3) we will build a robust statistical framework that integrates spatial data from multiple samples, enabling researchers to associate detailed spatial gene expression patterns with clinical features or experimental outcomes. These tools will be available both as standalone software and as part of the widely used Giotto Suite, thereby democratizing access to advanced spatial data analysis and facilitating novel discoveries in both basic and clinical research.