Functional regionalization of the brain revealed by multi-modal neural and genomics data integration - Project Summary/Abstract

In the past decade, genome-wide characterization of gene expression in cells dissociated from biological tissues has transformed the understanding of the cell types that build organs in a variety of organisms. However, defining the precise arrangement of cell types within a tissue or organ requires analysis of large-scale spatial transcriptomics data and integration with other spatial datasets, such as neural connectivity patterns. To achieve these integrative analyses of multiple high-dimensional datasets (e.g., hundreds of genes at cellular resolution, measured across the whole organ), all datasets need to be brought into the same common coordinate system, and new computational algorithms that can handle the complexity of the data need to be developed. Studies from the Allen Institute for Brain Science and the Broad Institute are now producing the first whole-brain spatially resolved transcriptomics datasets, offering an integrative view of the cells that make up the brain and their spatial locations. One opportunity that these new datasets provide is to define a completely data-driven anatomical parcellation/atlas of the mouse brain. Such a parcellation will be an enormous resource for the systems and molecular neuroscience communities, both in formulating new hypotheses for the mechanisms of brain function and in investigating existing results. However, current methods are unable to accommodate the scale, complexity, and inherently multimodal nature of integrating these spatial cellular and molecular taxonomies with the wealth of other data (such as connectomic, proteomic, and functional data).
In this proposal, we aim to use new developments in machine learning to address this need by developing an unsupervised computational algorithm that can synthesize large, disparate datasets of the mouse brain into the next generation of reference anatomical parcellation/atlas. Specifically, we propose a novel deep learning framework called DeepGene to predict spatial cell-type clusters in whole-brain spatial transcriptomics datasets (Aim 1). Our proposed method takes advantage of the unique flexibility of transformer neural network architectures to scalably model groups of observations with minimal structure. We will then train a more comprehensive version of DeepGene using a combination of eight whole-brain MERFISH datasets and use this model to develop a novel parcellation/atlas of the adult mouse brain shared across the eight mouse brains. We hypothesize (and provide evidence through preliminary results) that our model delineates cellularly distinct subregions in the mouse brain with differentially abundant cell types (Aim 2). Finally, we will establish a two-stage sequential version of DeepGene to integrate spatial transcriptomics datasets with axonal projection datasets and discover finer brain subregions (Aim 3). The proposed computational frameworks are generalizable to spatial transcriptomics datasets from any tissue, organ, or organism.