Unsupervised Statistical Methods for Data-driven Analyses in Spatially Resolved Transcriptomics Data - Project Summary/Abstract
Recently developed spatially resolved transcriptomics (ST) technologies measure transcriptome-wide gene expression at near-single-cell, single-cell, or subcellular resolution in intact tissue, preserving the spatial organization of complex tissues. These technologies build upon widely adopted single-cell RNA sequencing (scRNA-seq) technologies by adding spatial coordinates to the transcriptome-wide gene expression measurements, thus enabling an understanding of how the spatial organization of cells in complex tissues influences function, disease initiation, progression, and therapeutic response in human health and disease. However, these technologies also present new statistical and computational challenges, which must be addressed to interpret these complex data accurately. Initial studies applying ST technologies have reused data analysis methods and data storage techniques designed for scRNA-seq, but these approaches largely ignore spatial information. Furthermore, existing methodologies for ST data rely on external information such as marker genes or reference cell types, potentially leading to systematic errors and biased results during preprocessing, feature selection, classification of spatially resolved cell types, and differential discovery. Robust and accurate preprocessing and unsupervised statistical methodologies for investigating ST data in a data-driven manner do not yet exist. This K99/R00 Pathway to Independence Award proposal requests support to address this fundamental gap in statistical methodology by developing spatially aware (1) methods for preprocessing, (2) unsupervised methods for spatially resolved clustering and differential discovery between conditions, and (3) data infrastructure and benchmarking resources to standardize the storage and access of ST data.
These proposed methods will lead to an improved understanding of health and disease mechanisms. This proposal will provide the training, mentoring, and professional development I need to accomplish my research goals and transition to a tenure-track faculty position at a research institution with independent extramural funding. As the demand for ST technologies grows, particularly now that they have been highlighted as the Nature Methods 2020 Method of the Year, the urgently needed statistical methods and open-source software proposed in this project will enable ST technologies to transform precision medicine through novel biological insights into the spatial properties of cell populations and gene expression in healthy and diseased tissues. At the completion of this award, I will be part of a new generation of researchers proficient in spatial statistics, machine learning, and spatial transcriptomics technologies, which will enable me to work closely with biomedical researchers who spatially profile the transcriptomes of complex tissues.