Efficient algorithms for searching large mass spectral databases - A crucial problem in many areas of life science is determining which known small molecules are present or absent in a given sample. For example, physicians may want to characterize the molecules in oral, urine, or fecal samples collected from a patient. Ecologists are interested in characterizing the molecules produced by microbes in environmental and host-oriented microbial communities. Natural product scientists searching for novel antimicrobial or antitumor molecules need to identify all the known molecules in their samples so that they can focus their effort on the novel ones. Over the next five years, the goal of Mohimanilab is to develop efficient algorithms for identifying known and novel small molecules in complex samples. Efficiently searching a query spectrum against a large collection of measured or predicted mass spectra is a fundamental task in several problems, including mass spectral library search and database search for small-molecule identification in metabolomics. Given a database of millions to billions of reference spectra and a query spectrum, our goal is to find the reference (or predicted) spectrum most likely to have generated the query spectrum. Modern search engines usually build a probabilistic model relating spectra to reference molecules and then compute a probabilistic score between each query spectrum and each reference molecule. The main problem with this approach is that the search runtime grows linearly with the size of the database. For example, searching a single query spectrum against all the reference spectra available in the Global Natural Products Social Molecular Networking infrastructure (~1 billion spectra) takes more than two weeks on a single CPU.
In an unrestricted search, which allows the query spectrum to carry a modification relative to the reference, the runtime increases to multiple years per query spectrum. Faster approaches are therefore needed to search mass spectra against large reference databases. In this proposal, we will develop indexing strategies to speed up these queries, together with new algorithmic approaches for identifying known and novel small molecules in complex samples from mass spectrometry data. Tools developed under this proposal will be made available to the scientific community through the Global Natural Products Social Molecular Networking web server.
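To illustrate why indexing can help, the following minimal Python sketch contrasts a linear scan with a simple inverted index over binned peaks. It is an illustration under stated assumptions, not the method proposed here: the cosine-style score stands in for the probabilistic score, the bin width `BIN_WIDTH` and all function names are hypothetical, and modifications (the unrestricted case) are not handled.

```python
from collections import defaultdict
from math import sqrt

BIN_WIDTH = 0.01  # hypothetical m/z bin width in Da (an assumption)


def binned(spectrum, bin_width=BIN_WIDTH):
    """Turn [(mz, intensity), ...] into a unit-norm {bin: intensity} dict."""
    vec = defaultdict(float)
    for mz, intensity in spectrum:
        vec[int(round(mz / bin_width))] += intensity
    norm = sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm > 0 else {}


def linear_search(query, references):
    """Baseline: score the query against every reference spectrum."""
    q = binned(query)
    best_id, best_score = None, 0.0
    for ref_id, ref in references.items():
        r = binned(ref)
        score = sum(w * r.get(b, 0.0) for b, w in q.items())
        if score > best_score:
            best_id, best_score = ref_id, score
    return best_id, best_score


def build_index(references):
    """Inverted index: bin -> list of (reference id, normalized intensity)."""
    index = defaultdict(list)
    for ref_id, ref in references.items():
        for b, w in binned(ref).items():
            index[b].append((ref_id, w))
    return index


def indexed_search(query, index):
    """Score only the references that share at least one bin with the query."""
    scores = defaultdict(float)
    for b, w in binned(query).items():
        for ref_id, ref_w in index.get(b, ()):
            scores[ref_id] += w * ref_w
    if not scores:
        return None, 0.0
    best_id = max(scores, key=scores.get)
    return best_id, scores[best_id]


# Toy data (made up for illustration only).
references = {
    "A": [(100.0, 10.0), (150.0, 5.0), (200.0, 1.0)],
    "B": [(110.0, 8.0), (160.0, 4.0)],
    "C": [(100.0, 1.0), (300.0, 9.0)],
}
query = [(100.0, 9.0), (150.0, 6.0)]
index = build_index(references)
print(linear_search(query, references)[0], indexed_search(query, index)[0])
```

Both functions return the same best match on the toy data, but the indexed version only visits references that share a peak bin with the query, so its cost depends on the length of the posting lists rather than on the total number of reference spectra. The index is built once and reused across queries.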