Abstract: Age-related macular degeneration (AMD) is one of the leading causes of severe vision loss. Early
detection, prompt intervention, and reliable assessment of treatment outcomes are essential to prevent
irreversible vision loss from advanced AMD stages such as geographic atrophy (GA). Patients at higher risk of
progression to GA would benefit from more frequent follow-up visits, low-vision referrals, and administration of
therapeutic interventions. Deep learning (DL) techniques have recently been applied to diagnose, classify, and
understand the progression trends of GA. However, a major limitation of DL is its need for large, well-curated
datasets drawn from diverse sub-populations to achieve robust diagnostic or prognostic performance. Models that
overfit their training data tend to perform poorly on external data, that is, they generalize poorly.
Moreover, efforts towards large public centralized datasets for DL research are hindered by significant barriers
to data sharing, privacy concerns, costs of image de-identification, and controls over how data would be used.
In this project, we aim to demonstrate the utility of novel federated DL approaches, which allow institutions to
gain insights collaboratively, e.g., in the form of a consensus model, without moving patient data beyond the
firewalls of the institutions in which the data reside. In this paradigm, DL model training is distributed across
institutions rather than patient data being pooled, and only the model parameters are shared with a central
server. We specifically seek to build robust risk models for predicting the occurrence and growth of GA.
Four data cohorts, from Stanford University, University of Illinois Chicago (UIC), Wake Forest University
(Wake Forest), and National Taiwan University (NTU), will be used to test the hypothesis that GA risk models
built with a federated approach have more robust prognostic accuracy than models built on single-institution
datasets. Our first aim is to establish a federated learning (FL) framework for GA prediction using longitudinal
multi-modal imaging and patient metadata from four independent institutions (training and independent testing
datasets from Stanford, UIC, Wake Forest, and NTU). The key success criterion for Aim 1 is to demonstrate
a robust and secure FL framework for GA risk model training within the multi-institutional environment. The
second aim is to integrate a novel adversarial domain alignment (ADA) technique into the FL framework to tackle
domain shift caused by heterogeneous data distributions across institutions. To improve representation
learning, as well as model transferability and generalizability across sub-population data, novel self-supervised
contrastive learning (CL) methods will be employed within the FL framework. The key success criterion for
Aim 2 is to establish protocols for integrating domain alignment into the FL framework and to evaluate the
FL-trained GA prediction models deployed on new and previously unseen clinical data. Clinical deployment of such
AI prediction tools will facilitate identification of high-risk AMD patients as candidates for more frequent screening
and earlier treatment, leading to better clinical outcomes.
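
The parameter-sharing loop described in this abstract can be illustrated with a minimal sketch of federated averaging (FedAvg). The abstract does not name the aggregation rule the project will use, so FedAvg is an assumption here, as are the logistic-regression stand-in for the GA risk model, the synthetic four-site data, and every function name. Each site trains locally on its own data and returns only its weights; the server averages those weights in proportion to site sample counts.

```python
import math
import random


def local_update(weights, features, labels, lr=0.1, epochs=5):
    """Site step: a few epochs of logistic-regression gradient descent on local data."""
    w = list(weights)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def federated_average(site_weights, site_sizes):
    """Server step: average parameters, weighted by each site's sample count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def make_site(n, shift, rng):
    """Hypothetical synthetic data standing in for one institution's private cohort."""
    features, labels = [], []
    for _ in range(n):
        x1 = rng.gauss(shift, 1.0)  # the shift mimics a site-specific distribution
        features.append([1.0, x1])  # bias term plus one feature
        labels.append(1 if x1 + rng.gauss(0.0, 0.5) > 0 else 0)
    return features, labels


def run_federated(rounds=20, seed=0):
    """Run several communication rounds; raw data never leaves make_site's output."""
    rng = random.Random(seed)
    sites = [make_site(n, s, rng) for n, s in [(80, -0.5), (120, 0.0), (60, 0.5), (100, 0.3)]]
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        # Only weight vectors travel between the sites and the server.
        local_ws = [local_update(global_w, X, y) for X, y in sites]
        global_w = federated_average(local_ws, [len(X) for X, _ in sites])
    return global_w
```

Calling `run_federated()` returns the consensus weight vector after 20 rounds. A real deployment would add secure communication and, for Aim 2, the domain alignment and contrastive learning components, none of which are sketched here.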