Cell type harmonization of single cell data in HuBMAP and GTEx - Project Summary/Abstract

The NIH Common Fund has supported the generation, management, and sharing of single cell genomic data from millions of cells through several large international consortia, with the goal of building a comprehensive reference of healthy cells across multiple organs in the human body. We will use single cell/nucleus RNA-sequencing (scRNA-seq) data from the Common Fund-supported Human BioMolecular Atlas Program (HuBMAP) and Genotype-Tissue Expression (GTEx) consortia to prototype a cell type harmonization protocol for constructing a cross-consortia cell census meta-atlas. The HuBMAP consortium provides organ-specific cell atlases for multiple organs, while GTEx provides an integrated cross-organ single cell atlas. Our group has developed and extensively validated two computational algorithms, NS-Forest and FR-Match, for biomarker identification and robust cell type matching using scRNA-seq data. These algorithms use Random Forest machine learning and minimum spanning tree graphical modeling, which provide superior classification performance while maintaining high explainability and interpretability for biological applications. In Specific Aim 1, we will apply rigorous data quality control approaches for dataset selection and preparation. We will then use the NS-Forest algorithm to identify optimal biomarker combinations that characterize organ-specific cell types in the individual organs in HuBMAP and cross-organ cell types in GTEx. In Specific Aim 2, we will focus on human lung as an exemplar organ to prototype the assembly of a cross-consortia meta-atlas. We will develop a robust cell type harmonization approach using our validated and benchmarked FR-Match algorithm, applied to HuBMAP-Lung, the GTEx lung subset, and other publicly available Human Lung Cell Atlas (HLCA) datasets.
We will compare and benchmark FR-Match against two other popular cell type matching methods, Azimuth and CellTypist, and validate the matching results using all methods. We will also form a domain expert panel to review and validate the cell type harmonization results against domain knowledge and the literature, with the aim of community approval. We will develop a strategy for capturing sample metadata, anatomic structure information, cell type nomenclature, and biomarker-based definitions in an ontological representation for the meta-atlas, and we will add this content to the Provisional Cell Ontology. In Specific Aim 3, we will disseminate our results to key stakeholder communities, including the HuBMAP Anatomical Structures, Cell Types and Biomarkers (ASCT+B) Working Group and the GTEx Multi-Gene Single Cell Query platform. We will present the project at the Common Fund Data Ecosystem Spring Meeting to engage the community and solicit feedback. Beyond the pilot phase, the cell type harmonization framework established in this project should be generally applicable to integrating single cell-based cell type datasets across Common Fund and other data resources.