Monday, December 8, 2025

A knowledge-guided analysis approach to recovering rare signals from single-cell transcriptomic data

Award Number: R21GM159319
ORGANIZATION: NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 07/01/2025
PERIOD OF PERFORMANCE END DATE: 06/30/2027

PROJECT SUMMARY

This project aims to develop computational methods that overcome a major limitation of existing single-cell transcriptomic data analysis methods. Single-cell transcriptomics enables profiling of gene expression in individual cells. This is particularly useful for studying rare cells, whose signals are largely invisible when they are mixed with other cells in a bulk sample. Biologically important rare cells include tissue stem cells, senescent cells, endothelial progenitors, and tumor-initiating cells.

Ironically, existing analysis methods for single-cell transcriptomic data often ignore rare cells. Dimensionality reduction, a standard step in all mainstream analysis pipelines, can easily discard rare signals in order to preserve the most prominent ones. As a result, rare cells are usually not clustered together in the reduced data, which makes them difficult to identify and study.

To tackle this problem, we propose the novel concept of knowledge-guided single-cell data analysis. Taking marker genes of the cells of interest as externally supplied knowledge, our algorithm will be instructed to retain information about these genes during dimensionality reduction. As a result, the rare cells are much more likely to be clustered together in the reduced data. Another important application of our methods is separating highly similar cell sub-populations: when the differentially expressed genes (DEGs) between them are supplied as knowledge input, the sub-populations will become more separated in the reduced data.

In Aim 1, we will design and implement the computational methods. We will use the autoencoder artificial neural network framework, which has proven useful for single-cell data, and introduce novel components that take in and use the external knowledge.
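One way to picture how marker-gene knowledge could enter an autoencoder is through its reconstruction loss. The sketch below is only an illustration under stated assumptions: the summary does not describe the actual mechanism, and the function `knowledge_weighted_loss`, the weight `lam`, and the toy matrices are all hypothetical. It adds an extra penalty on reconstruction error at the marker genes to the usual error over all genes.

```python
import numpy as np

def knowledge_weighted_loss(x, x_hat, marker_idx, lam=1.0):
    """Hypothetical loss: mean squared error over all genes, plus
    lam times the mean squared error restricted to marker genes."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    sq_err = (x - x_hat) ** 2
    data_term = sq_err.mean()                       # all genes
    knowledge_term = sq_err[:, marker_idx].mean()   # marker genes only
    return data_term + lam * knowledge_term

# Toy data (assumed): 4 cells x 5 genes, where gene 4 is a marker
# expressed only in the single "rare" cell 0.
x = np.zeros((4, 5))
x[0, 4] = 2.0

# A reconstruction that drops the rare signal entirely.
x_hat = np.zeros((4, 5))

plain = knowledge_weighted_loss(x, x_hat, [4], lam=0.0)
guided = knowledge_weighted_loss(x, x_hat, [4], lam=1.0)
print(plain, guided)
```

With `lam=0.0` the lost marker signal is averaged over every entry of the matrix, so it contributes little to the loss. With `lam=1.0` the same omission is penalized much more heavily, which is the intuition behind instructing the model to retain information about the supplied genes.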
A key aspect of our methods is that both the external knowledge and the data itself will be respected: the dimensionality reduction process will pay attention to the marker genes/DEGs only if the most prominent signals in the data can be preserved at the same time.

In Aim 2, we will systematically test the effectiveness of our methods in identifying rare cells and separating highly similar cell sub-populations using published data sets. We will use independent data, such as cell surface protein measurements in CITE-seq, to define cell populations, and use these definitions to quantitatively assess how well our methods cluster rare cells and separate different cell sub-populations. We will benchmark against state-of-the-art single-cell data analysis methods.

In Aim 3, we will assess the effects of noisy and incomplete knowledge inputs. Noisy inputs contain genes that are not specifically expressed in a rare cell type or not differentially expressed between cell sub-populations; incomplete inputs omit genes that are specifically or differentially expressed. We will artificially include noisy genes and exclude informative genes to study the tradeoffs between comprehensive yet noisy and precise yet incomplete knowledge inputs.

Overall, this project will produce computational methods and open-source software that will propel the study of important rare signals in single-cell transcriptomic data.
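The Aim 3 perturbation, artificially including noisy genes and excluding informative ones, can be sketched as a small helper. This is a minimal illustration, not the project's procedure: the function `perturb_knowledge`, its parameters, and the placeholder gene names `g0` to `g9` are assumptions introduced here.

```python
import random

def perturb_knowledge(markers, all_genes, n_drop=0, n_noise=0, seed=0):
    """Hypothetical helper: drop n_drop informative genes (incomplete
    knowledge) and draw n_noise non-marker genes (noisy knowledge)."""
    rng = random.Random(seed)
    marker_list = sorted(set(markers))
    non_markers = sorted(set(all_genes) - set(marker_list))
    if not 0 <= n_drop <= len(marker_list):
        raise ValueError("n_drop must be between 0 and the number of markers")
    if not 0 <= n_noise <= len(non_markers):
        raise ValueError("n_noise must be between 0 and the number of non-markers")
    kept = rng.sample(marker_list, len(marker_list) - n_drop)
    noise = rng.sample(non_markers, n_noise)
    return sorted(kept), sorted(noise)

# Placeholder gene names (assumed); g0-g3 stand in for true marker genes.
all_genes = [f"g{i}" for i in range(10)]
true_markers = ["g0", "g1", "g2", "g3"]

kept, noise = perturb_knowledge(true_markers, all_genes, n_drop=1, n_noise=2)
knowledge_input = kept + noise  # the perturbed list supplied to the method
print(knowledge_input)
```

Sweeping `n_drop` and `n_noise` over a grid would give a range of knowledge inputs from precise yet incomplete to comprehensive yet noisy, each of which could then be evaluated in the same way.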


Issue Date FY: 2025
Funding FY: 2025
Legal Entity Name: SANFORD BURNHAM PREBYS MEDICAL DISCOVERY INSTITUTE
Legal Entity Address: 10901 N TORREY PINES RD
Legal Entity City: LA JOLLA
Legal Entity State: CA
Legal Entity Zip Code: 92037
Legal Entity County: SAN DIEGO
Legal Entity Country: USA
Assistance Listing: Biomedical Research and Research Training
Award Code: 000
Budget Year: 1
Action Date: 6/18/2025
Action Type: NEW
Action Amount: $536,250

Issue Date FY 2025 Subtotal: $536,250
Grand Total All Awards: $536,250
