Federated learning methods for heterogeneous and distributed Medicaid data - Project Summary
The broad objective of this project is to develop federated learning approaches that can efficiently reduce
uncertainty and improve generalizability when assessing treatment effects based on multiple data sources. The
proposal is motivated by a study of the Medicaid Outcomes Distributed Research Network (MODRN) of eleven
states assessing the quality of and access to medications for opioid use disorder (OUD). The collection of
Medicaid claims data accounts for 40% of the OUD population in the US and covers a wide array of treatment
choices, making it an ideal data source for understanding subgroup-specific treatment effects and developing
precision health strategies. We leverage this large-scale distributed research network (DRN) to investigate the
heterogeneous treatment effect (HTE) of buprenorphine, an opioid-based medication, on overdose mortality.
However, the additional heterogeneity across states arising from variation in state policy environments, which
is largely unobserved, presents a great challenge in the assessment of HTE. Existing approaches such as
meta-analysis are inadequate and underpowered to address the translational research needs in understanding
the complex interactions among treatments, clinical characteristics, and social determinants of health, especially
under the heavy influence of unexplainable heterogeneity across states. A suite of novel approaches will be
developed to address a wide range of analytical requests that support data-driven precision health research
under the framework of federated learning, where states collaboratively build analytical models under the
orchestration of a coordinating state without pooling individual-participant data. With the central goal of modeling
different levels of heterogeneity in DRNs, this project focuses on the following aims: 1. To develop and
evaluate a high-precision HTE estimator for buprenorphine for Pennsylvania by incorporating modeling
information from ten other states; 2. To develop and evaluate a generalizable treatment recommendation
system that protects vulnerable populations and is robust to policy variation across states. The methods will be
rigorously tested and delivered as user-friendly statistical software. The proposed methods extend well beyond
MODRN and readily find applications in other common DRNs, such as hospital data networks and mobile data
networks.