Tuesday, April 30, 2024

Fast, powerful, scalable, usable, and distributable methods for multi-modal single cell analyses

Award Number: R01HG013317
ORGANIZATION: NATIONAL HUMAN GENOME RESEARCH INSTITUTE
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)

SUMMARY While single-cell methods for analyzing gene expression are becoming a standard tool for unpacking cellular heterogeneity and understanding complex tissues in health and disease, methods that measure other molecular features, especially open chromatin landscapes via ATAC-seq, but also surface protein abundance and the presence of CRISPR guides, are rapidly expanding in their application. Indeed, commercial platforms for generating diverse single-cell data sets have led to an immense increase in the scale of these data, and methods for split-and-pool based assays and decreasing sequencing costs all presage an exponentially increasing corpus of future large-scale data sets. We developed ArchR, an analysis infrastructure specifically designed for large-scale single-cell (sc) ATAC-seq data sets that enables a diverse suite of complex analyses (including QC, doublet removal, iterative TF-IDF clustering, approximation methods for large-scale data sets, trajectory analysis, RNA-seq integration, track visualization, marker peak identification, etc.), all with minimal computing hardware requirements. We estimate that ArchR has thousands of active users and is rapidly becoming the “go-to” analysis software for large scATAC-seq data sets.

To further extend the utility of ArchR for analyzing multi-omic data sets, we will first engineer substantial improvements to the computational efficiency of the underlying single-cell computational infrastructure. To do this, we will (1) encode our fundamental matrix operations in C++ to enable streaming data matrix access, thus reducing memory requirements and effectively “lifting the cap” on the number of cells that can be analyzed through rapid on-the-fly calculation of diverse operations, and (2) implement and benchmark efficient on-disk storage using bitpacking algorithms. These data structures and atomic operation libraries will be shared with the genomics community (and are being integrated into the popular Seurat package), allowing these performance improvements to be repurposed.

Second, we will develop, implement, and benchmark powerful analytical tools for the analysis of large, diverse, and/or multi-omic data sets. We will enable the handling of diverse data types, whether acquired independently or simultaneously (multi-omic), including RNA-seq, ATAC-seq, ADT (CITE-seq), and CRISPR-based perturbation methods. We will develop accurate methods for cross-manifold data linkage for distinct data sets, forced-projection and regression analysis, multi-modality cell clustering, joint analysis of single-cell molecular data sets with CRISPR-based perturbations, single-cell inference of enhancer function via correlation and the “ABC” model, and identification of continuous differentiation trajectories and chromatin “potential.”

Finally, we will develop plug-and-play cell type-specific deep learning models for prediction of the regulatory effects of noncoding sequence changes. These models will learn single-cell chromatin accessibility profiles from DNA sequence to predict the cell type-specific effects of noncoding sequence changes. We will create a user-friendly system for training, deployment, and sharing of sequence-based models of cell type-specific chromatin accessibility, bringing cutting-edge machine learning for functional genomics to wide use.


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2024 ( Subtotal = $644,697 )
2024	2024	THE LELAND STANFORD JUNIOR UNIVERSITY	450 JANE STANFORD WAY	STANFORD	CA	94305	SANTA CLARA	USA	Human Genome Research	000	1	1/30/2024	NEW	$644,697
														Subtotal = $644,697

Grand Total All Awards = $644,697
