Correcting biases in deep learning models - Project Summary/Abstract

Deep learning (DL) has been widely applied across the life sciences to construct predictive models. However, it relies on the assumption that training samples are independent and identically distributed. This assumption is frequently violated in the life sciences, where data are “grouped” by measurements from the same sample (patient, cell, tissue), by the same observer, or at the same site. This grouping creates clusters of correlated data (random effects), and when models are fit to such data, the model parameters can be severely biased, leading to type I and type II errors. Properly accounting for such dependencies in DL models remains an unsolved problem. The objective of this proposal is to develop DL modifications that separately model global fixed effects and cluster-specific random effects, increasing model interpretability and performance for precise, unbiased predictions related to human disease.

Our proposal is based on a novel, model-agnostic framework that transforms conventional DL models into proper mixed effects DL (MEDL) models. This framework affords the capabilities of statistical linear mixed effects models, including the separation of cluster-invariant fixed effects from cluster-specific random effects, while preserving the ability of DL to learn data-driven nonlinear associations. The core premise is that proper MEDL models 1) are more resilient to confounding effects and more attentive to true predictive features, 2) can capture, quantify, and visualize random effects to enhance interpretability, and 3) generalize better to new clusters. We propose to incorporate MEDL into three of the most widely used DL model types: dense feed-forward neural networks (DFNNs), convolutional neural networks (CNNs), and autoencoders. Our preliminary results demonstrate multiple advantages of MEDL over conventional DL in both accuracy and interpretability. MEDL outperforms previous approaches to clustered data, including domain-adversarial models, meta-learning, and the inclusion of cluster membership as an input covariate. We developed an ME-DFNN to predict conversion from mild cognitive impairment to Alzheimer’s disease (AD) from tabular data, an ME-CNN to diagnose AD from MRI, and an ME-autoencoder to compress and classify live-cell images. Across these test cases, MEDL models best discriminated between known confounded features and true predictive features; they quantified or visualized the random effects and outperformed other models on clusters both seen and unseen during training.

This proposal further develops the methods to handle complex architectures and hierarchical effects, with external validation, through these aims: 1) Develop ME-DFNNs for classification and regression. 2) Develop 3D ME-CNNs and multimodal 3D ME-CNNs for medical image classification. 3) Develop convolutional and vector ME-autoencoders for image and omics data. We describe the innovative incorporation of an adversarial classifier that constrains the base model to learn fixed effects, a Bayesian random effects subnetwork, and an approach to apply random effects to clusters unseen during training. All of these solutions will be released as open-source software that improves existing DL models, ultimately supporting precision biomedicine for the study and treatment of human disease.
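
To make the architectural idea concrete, the sketch below shows one way an ME-DFNN of the kind described above could be assembled in PyTorch: a fixed-effects subnetwork whose features are pushed toward cluster invariance by an adversarial cluster classifier (here via gradient reversal, one standard mechanism and an assumption on our part), combined with a cluster-specific random-effects adjustment. This is a minimal illustration under those assumptions, not the proposed implementation; the names (MixedEffectsDFNN, hidden, lambd) are hypothetical, and a simple per-cluster embedding stands in for the Bayesian random effects subnetwork named in the proposal.

    import torch
    import torch.nn as nn


    class GradReverse(torch.autograd.Function):
        # Identity on the forward pass; reverses and scales gradients on the backward
        # pass, so the upstream feature extractor is trained to *remove* cluster signal.
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None


    class MixedEffectsDFNN(nn.Module):
        # Hypothetical ME-DFNN sketch: fixed-effects subnetwork + adversarial cluster
        # classifier + cluster-specific random-effects adjustment.
        def __init__(self, n_features, n_clusters, hidden=64, lambd=1.0):
            super().__init__()
            self.lambd = lambd
            # Fixed-effects subnetwork: intended to learn cluster-invariant features.
            self.fixed = nn.Sequential(
                nn.Linear(n_features, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # Adversarial classifier tries to recover cluster identity from the fixed
            # features; gradient reversal constrains the fixed subnetwork to hide it.
            self.adversary = nn.Linear(hidden, n_clusters)
            # Random-effects subnetwork: a per-cluster embedding and linear head give a
            # cluster-specific offset (a point-estimate stand-in for a Bayesian subnetwork).
            self.cluster_embed = nn.Embedding(n_clusters, hidden)
            self.random_head = nn.Linear(hidden, 1)
            # Outcome head applied to the cluster-invariant features.
            self.fixed_head = nn.Linear(hidden, 1)

        def forward(self, x, cluster_id):
            z = self.fixed(x)                                   # fixed-effects features
            adv_logits = self.adversary(GradReverse.apply(z, self.lambd))
            y_fixed = self.fixed_head(z)                        # cluster-invariant prediction
            y_random = self.random_head(self.cluster_embed(cluster_id))  # cluster offset
            return y_fixed + y_random, adv_logits


    # Example training step: the task loss fits the mixed-effects prediction, while the
    # adversarial loss (through gradient reversal) penalizes cluster-identifiable features.
    model = MixedEffectsDFNN(n_features=20, n_clusters=5)
    x = torch.randn(8, 20)
    cluster_id = torch.randint(0, 5, (8,))
    y = torch.randn(8, 1)
    pred, adv_logits = model(x, cluster_id)
    loss = nn.functional.mse_loss(pred, y) + nn.functional.cross_entropy(adv_logits, cluster_id)
    loss.backward()

In this sketch, predictions for a cluster seen in training combine the fixed-effects output with that cluster's learned offset; applying the model to unseen clusters (as the proposal describes) would require a strategy for estimating or marginalizing the random effect, which is not shown here.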