Abstract: This project aims to establish distributed federated learning (FL) approaches for training robust,
clinically deployable machine learning (ML) models for (i) multi-class classification of diabetic retinopathy (DR) and
(ii) prediction of proliferative DR (PDR) progression using optical coherence tomography (OCT) angiography (OCTA). DR is one of
the leading causes of severe vision loss. Early detection, prompt intervention, and reliable assessment of
treatment outcomes are essential to prevent irreversible vision loss from DR. Quantitative OCTA analysis and
OCTA-ML models have recently been applied to diagnose, classify, and understand the progression trends of
DR. Despite promising results, the clinical utility of OCTA-based diagnostic algorithms is not yet fully determined,
due to small OCTA data cohorts in clinical institutions and the lack of widespread validation. More specifically,
a major limitation of OCTA-ML models is the need for large, well-curated datasets from diverse
subpopulations for robust performance. Moreover, efforts towards large, centralized datasets for ML research are
hindered by privacy concerns and significant barriers to data sharing. In this project, we aim to establish novel
federated ML approaches in which model training is distributed across institutions and only the model parameters,
rather than patient data, are shared with a central server. This approach allows insights to be gained
collaboratively, e.g., in the form of a consensus model, without moving patient data beyond the firewalls of the
institutions. Three data cohorts from Stanford University, the University of Illinois Chicago (UIC), and National
Taiwan University (NTU) will be used to test the hypothesis that OCTA-ML models trained with a
federated approach achieve more robust accuracy than models built on single-institution datasets. Our first aim is to establish
an FL framework with adaptive domain alignment and enhanced data representation learning capability. The key
success criterion for Aim 1 is the successful integration of the pilot institutions into the FL framework for distributed
training of multi-class DR classification models backed by comprehensive OCTA features (textural, geometric, and
differential artery-vein (AV)). The second aim is to validate the FL-trained OCTA-ML models and differential AV
complexity features for PDR progression on new longitudinal data from UIC and NTU. The key success criterion for
Aim 2 is to validate OCTA-ML model performance and identify AV features that provide sensitive biomarkers to
predict PDR in patients with DR. As an alternative approach, we propose a vision transformer deep learning
model for PDR prediction. The attention mechanism of a transformer model can identify DR features that may
provide new information on PDR progression and its specific onset. Further investigation of the relationship
between the new features learned through the transformer model and clinical biomarkers will allow us to optimize
the design for better DR diagnosis and prognosis. Success of this project will establish distributed ML model training
approaches and pave the way towards using quantitative OCTA features for early DR detection and for objective
prediction and assessment of treatment outcomes.
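
As a minimal illustration of the parameter-sharing scheme described above, the following Python sketch shows FedAvg-style weighted averaging on a toy linear model: each site trains locally on its own data, and only the resulting parameters are sent to the server for aggregation. It is not the project's framework. The function names, the least-squares model, the learning rate, and the number of rounds are all assumptions chosen for illustration, and the adaptive domain alignment proposed in the first aim is not represented.

```python
from typing import Dict, List, Tuple

Params = List[float]
Sample = Tuple[List[float], float]  # (feature vector, target value)


def local_update(params: Params, data: List[Sample],
                 lr: float = 0.1, epochs: int = 5) -> Params:
    """Hypothetical local training step: a few epochs of gradient descent
    on a least-squares linear model, using one site's private data only."""
    w = list(params)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(site_params: List[Params],
                      site_sizes: List[int]) -> Params:
    """Server-side aggregation: average the site parameters, weighting
    each site by its number of local samples."""
    total = sum(site_sizes)
    dim = len(site_params[0])
    return [
        sum(p[j] * n for p, n in zip(site_params, site_sizes)) / total
        for j in range(dim)
    ]


def run_rounds(sites: Dict[str, List[Sample]], dim: int,
               rounds: int = 20) -> Params:
    """Alternate local training and aggregation. Only parameter vectors
    are passed to federated_average; raw samples stay with each site."""
    global_params = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_params, d) for d in sites.values()]
        sizes = [len(d) for d in sites.values()]
        global_params = federated_average(updates, sizes)
    return global_params
```

In this sketch the consensus model is simply the sample-weighted mean of the site models after each round. A real deployment would replace the toy linear model with the OCTA-ML network and add secure communication between the institutions and the server.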