Wednesday, September 17, 2025

Efficient algorithms for searching large mass spectral databases

Award Number: R35GM158059
ORGANIZATION: NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 09/01/2025
PERIOD OF PERFORMANCE END DATE: 08/31/2030

Efficient algorithms for searching large mass spectral databases - A crucial problem in many areas of the life sciences is determining which known small molecules are present in or absent from a given sample. For example, physicians may want to characterize the molecules in oral, urine, or fecal samples collected from a patient. Ecologists want to characterize the molecules produced by microbes in environmental and host-oriented microbial communities. Natural product scientists searching for novel antimicrobial or antitumor molecules want to identify all the known molecules in their samples so that they can focus their effort on the novel ones. Over the next five years, the goal of Mohimanilab is to develop efficient algorithms for identifying known and novel small molecules in complex samples.

Efficiently searching a spectrum against a large collection of measured or predicted mass spectra is a fundamental task that arises in several problems, including mass spectral library search and identification of small molecules by database search of mass spectra in metabolomics. Given a query spectrum and a database of millions to billions of reference spectra, our goal is to find the reference (measured or predicted) spectrum that best explains the query spectrum. Modern search engines usually build a probabilistic model relating spectra to reference molecules and then compute a probabilistic score between each query spectrum and each reference molecule. The main problem with this approach is that the search runtime grows linearly with the size of the database. For example, searching a single query spectrum against all the reference spectra available in the Global Natural Products Social Molecular Networking (GNPS) infrastructure (~1 billion spectra) takes more than two weeks on a single CPU. In an unrestricted search, which allows the query spectrum to differ from the reference by a modification, the runtime increases to multiple years per query spectrum.

Faster approaches are therefore needed to search mass spectra against large reference databases. In this proposal, we will develop indexing strategies to speed up these queries, along with new algorithmic approaches for identifying known and novel small molecules in complex samples from mass spectrometry data. Tools developed under this proposal will be made available to the scientific community through the GNPS web server.
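The award description does not say which indexing strategy will be used. As a purely illustrative sketch of why an index can avoid the linear scan described above, the following Python compares a brute-force search with a simple inverted index over binned peaks. The bin width, the cosine score, the function names, and the toy spectra are all assumptions made for this example and are not taken from the project.

```python
from collections import defaultdict
from math import sqrt

BIN_WIDTH = 0.1  # assumed m/z bin width in Da, chosen only for illustration


def binned(spectrum):
    """Turn a list of (m/z, intensity) peaks into an L2-normalised {bin: intensity} map."""
    vec = defaultdict(float)
    for mz, intensity in spectrum:
        vec[round(mz / BIN_WIDTH)] += intensity
    norm = sqrt(sum(v * v for v in vec.values())) or 1.0  # guard against empty spectra
    return {b: v / norm for b, v in vec.items()}


def cosine(a, b):
    """Dot product of two normalised sparse vectors (cosine similarity)."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def linear_search(query, references):
    """Baseline: score the query against every reference, so cost grows with database size."""
    q = binned(query)
    scores = {name: cosine(q, binned(spec)) for name, spec in references.items()}
    return max(scores, key=scores.get)


def build_index(references):
    """Inverted index: m/z bin -> list of (reference name, normalised intensity)."""
    index = defaultdict(list)
    for name, spec in references.items():
        for b, v in binned(spec).items():
            index[b].append((name, v))
    return index


def indexed_search(query, index):
    """Accumulate scores only for references that share at least one bin with the query."""
    scores = defaultdict(float)
    for b, qv in binned(query).items():
        for name, rv in index.get(b, ()):
            scores[name] += qv * rv
    return max(scores, key=scores.get) if scores else None


# Hypothetical toy data: three reference spectra and one query.
references = {
    "ref_A": [(100.01, 50.0), (150.02, 100.0), (200.03, 25.0)],
    "ref_B": [(100.01, 10.0), (175.52, 100.0), (250.04, 40.0)],
    "ref_C": [(300.01, 100.0), (320.02, 60.0)],
}
query = [(100.02, 45.0), (150.01, 95.0), (200.02, 30.0)]

index = build_index(references)
best_linear = linear_search(query, references)
best_indexed = indexed_search(query, index)
```

In this toy case the indexed search never scores `ref_C`, because it shares no bin with the query, while the linear search scores all three references. Both return the same best match. A real system would also need mass tolerances, handling of modified spectra, and a probabilistic score, none of which this sketch attempts.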


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity COUNTY	Legal Entity COUNTRY	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = $433,125 )
2025	2025	UNIVERSITY OF CALIFORNIA, LOS ANGELES	10889 WILSHIRE BLVD STE 700	LOS ANGELES	CA	90024	LOS ANGELES	USA	Biomedical Research and Research Training	000	1	8/21/2025	NEW	$433,125
														Subtotal = $433,125

Grand Total All Awards = $433,125
