Sunday, November 16, 2025

Tackling Big Data problems in biomedical sciences with extended similarity methods

Award Number: R35GM150620
ORGANIZATION: NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
OPDIV: NIH
AWARD CLASS: DISCRETIONARY
AWARD ACTIVITY TYPE: SCIENTIFIC/HEALTH RESEARCH (INCLUDES SURVEYS)
PERIOD OF PERFORMANCE START DATE: 09/21/2023
PERIOD OF PERFORMANCE END DATE: 07/31/2028

PROJECT SUMMARY/ABSTRACT

The overall goal of our research program is to develop new multi-purpose similarity-based tools to extract and analyze information from very large datasets in the biomedical sciences. A central aspect of our work will be the determination of the distance (or similarity) between different objects, a fundamental notion that pervades many aspects of modern data science. Similarity searches are at the core of high-throughput virtual screening, an essential task in medicinal chemistry and drug design. Comparisons also play a key role in rationalizing the results of Molecular Dynamics (MD) simulations by helping us to identify the most important conformations of a system and how they contribute to its dynamic behavior. Similarity-based techniques are also essential in spectral studies, being the foundation of the post-processing machinery in Imaging Mass Spectrometry (IMS).

However, these applications are currently based on metrics that can only compare two objects at a time, so comparing N objects scales quadratically. This makes them fundamentally ill-equipped to handle the amount of data generated by state-of-the-art simulations and experiments. We recently generalized the pairwise comparisons, proposing extended similarity indices that allow us to compare an arbitrary number of objects simultaneously. Our indices offer unprecedented efficiency, while also outperforming their binary counterparts in diversity picking, feature selection, and clustering. We will leverage these advantages in three main research directions.

(1) We will develop protocols to improve the drug design process via careful exploration of the chemical space. The extended indices will allow us to study the relations among various very large molecular libraries, which will be key in polypharmacology and drug repurposing. They will also lead to better measures of chemical diversity and a deeper understanding of structure-activity relations. This will serve as a guide in generative molecular models, resulting in more robust identification of new drug leads.

(2) We will present new workflows to efficiently analyze biological ensembles. Our medoid algorithm will identify conformations close to the folded state of a protein, while our clustering will classify the structures corresponding to other metastable states. Alternatively, we will implement sampling techniques that will allow us to analyze very long MD simulations. These tools can then be combined to gain a deeper understanding of various dynamical processes, including the detailed exploration of protein folding landscapes.

(3) We will develop new post-processing techniques to aid with the interpretation of IMS data. Our similarity indices can be used to identify spatially and molecularly correlated domains in tissues, without the unphysical artifacts present in other techniques. This will allow us to track the spatial heterogeneity of metabolic processes, which is critical to the validation of IMS data and to establishing new diagnostic tools. The application of our framework to the study of lipid expression in pancreatic tissue will lead to a better understanding of type 1 diabetes metabolism and pathophysiology.
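The scaling argument in the abstract can be illustrated with a minimal sketch. It does not implement the extended similarity indices, which the abstract does not define. It only shows, for one standard pairwise index (Russell-Rao: shared "on" bits divided by fingerprint length), how an all-pairs average that costs O(N^2) comparisons can be recovered from per-column counts in a single O(N) pass. The function names and the toy binary fingerprints are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def mean_pairwise_russell_rao(fingerprints):
    """O(N^2 * M): average the Russell-Rao similarity over every pair."""
    n_bits = len(fingerprints[0])
    pairs = list(combinations(fingerprints, 2))
    total = 0.0
    for x, y in pairs:
        shared_on = sum(a & b for a, b in zip(x, y))
        total += shared_on / n_bits
    return total / len(pairs)


def mean_russell_rao_from_column_sums(fingerprints):
    """O(N * M): the same average, obtained from per-column counts of on bits."""
    n_objects = len(fingerprints)
    n_bits = len(fingerprints[0])
    column_sums = [sum(column) for column in zip(*fingerprints)]
    # A column with k on bits contributes one shared on bit to C(k, 2) pairs.
    shared_on_total = sum(comb(k, 2) for k in column_sums)
    return shared_on_total / (comb(n_objects, 2) * n_bits)


# Hypothetical 6-bit fingerprints for four objects (illustrative data only).
toy_library = [
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
]

slow = mean_pairwise_russell_rao(toy_library)
fast = mean_russell_rao_from_column_sums(toy_library)
print(slow, fast)  # the two values agree up to floating-point rounding
```

The column-sum version never forms a pair explicitly, so doubling the number of objects doubles its work, whereas the explicit loop roughly quadruples.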


Issue Date FY	Funding FY	Legal Entity Name	Legal Entity Address	Legal Entity City	Legal Entity State	Legal Entity Zip Code	Legal Entity County	Legal Entity Country	Assistance Listing	Award Code	Budget Year	Action Date	Action Type	Action Amount

Issue Date FY: 2025 ( Subtotal = $345,420 )
2025	2025	UNIVERSITY OF FLORIDA	1523 UNION RD RM 207	GAINESVILLE	FL	32611	ALACHUA	USA	Biomedical Research and Research Training	000	3	9/5/2025	NON-COMPETING CONTINUATION	$345,420
														Subtotal = $345,420

Issue Date FY: 2024 ( Subtotal = $341,126 )
2024	2024	UNIVERSITY OF FLORIDA	1523 UNION RD RM 207	GAINESVILLE	FL	32611	ALACHUA	USA	Biomedical Research and Research Training	000	2	8/30/2024	NON-COMPETING CONTINUATION	$341,126
														Subtotal = $341,126

Issue Date FY: 2023 ( Subtotal = $336,787 )
2023	2023	UNIVERSITY OF FLORIDA	1523 UNION RD RM 207	GAINESVILLE	FL	32611	ALACHUA	USA	Biomedical Research and Research Training	000	1	9/19/2023	NEW	$336,787
														Subtotal = $336,787

Grand Total All Awards = $1,023,333
