Project Summary
A comprehensive 3D molecular map of the human body would provide information that is
critical for studying human biological processes and systems such as development, aging, and
disease. Toward this goal, multidisciplinary consortia such as the Human Cell
Atlas (HCA) and the Human BioMolecular Atlas Program (HuBMAP) have developed technologies for profiling
the transcriptome and proteome in single cells. Of these technologies, methods for single-cell spatial
proteomics have only very recently been developed; for example, recent advances in multiplexed imaging have
enabled the profiling of tens to hundreds of proteins per cell. While the generation of single-cell spatial
proteomics data promises to revolutionize our ability to study cell-cell interactions, it also raises several
computational and modeling challenges. Cell segmentation remains a long-standing problem that usually
requires tailored solutions for each bioimaging experiment. Even after cells are segmented, using expression
values to infer cell type and organization is challenging. There are currently no standardized methods
that jointly incorporate spatial and molecular information to analyze the complex biological
interactions from rich spatial proteomics datasets.
This project proposes to develop computational methods to provide a comprehensive solution for the
use of spatial proteomics data for building 3D molecular maps of the human body. We hypothesize that jointly
profiling spatial and molecular relationships from spatial proteomics datasets captures biological patterns that
would otherwise be missed. In Aim 1, a method will be developed for RAnking Markers for CEll Segmentation
(RAMCES) in order to choose the optimal protein markers to use for cell segmentation. In Aim 2, a unified
learning framework that incorporates both protein expression and cell neighborhood information will be
constructed in order to assign cells to phenotypes and reveal spatial patterns. In Aim 3, methods will be
developed to infer cell-cell and protein-protein interactions in spatial proteomics data. The methods developed
in this project will be integrated into the HuBMAP processing pipeline to analyze spatial proteomics datasets.
We will also apply and validate these methods using data from pancreatic lymph nodes of individuals
with and without Type 1 diabetes to analyze changes associated with the disease at an unprecedented scale.
Together, completing the proposed aims will enable the HuBMAP project to uncover new biological interactions
in cells and tissues and expand our understanding of molecular interactions at a single-cell level.
This proposal outlines a training plan that comprises mentored research training, coursework, and
professional development. The knowledge and skillset developed during the training period will be necessary
for the applicant's long-term goal of becoming a successful independent scientist working at the interface of
machine learning, computer science, and biology.