Iron-CLAD: securely advancing AoU participant characterization with proven platforms and collaborations - ABSTRACT Precision medicine aims to accurately classify patients to improve diagnosis, intervention selection, and prognosis. The All of Us Research Program (AoURP) collects a diverse array of data types from participants, including surveys, electronic health records (EHRs), physical measurements, wearable device data, and biosamples, offering valuable insights into health trajectories. However, certain aspects of a patient’s life remain unrepresented in the collected data, which can limit the accuracy of research and care. To address this gap, we propose the creation of the All of Us Center for Linkage and Acquisition of Data (CLAD) to supplement existing data sources using passive data streams and to deploy integration strategies that put the patient back together again. Our team brings together collective experience leading large initiatives involving data acquisition, linkage, harmonization, quality assurance, pipelines and platforms, governance, and security. We will design and implement a data collection, linkage, and integration strategy that lays a foundation for a variety of AoURP data linkages, supporting both identified and de-identified data integration. These include person-level linkages, such as with mortality, residential history, and administrative claims data, and geocoded data pipelines that enable linkages with the Environmental Justice Index. The CLAD will acquire and process new data linkages and geocoded data in a cloud-based Data Linkage Platform (DLP), guided by our experience formulating researcher-ready datasets with scientific utility. Our CLAD team will perform data quality assurance, repair, and standardization checks to support the accuracy and robustness of data-driven research.
This effort will align data with interoperability standards and clinical terminologies, extending them where necessary, and will provide a data quality dashboard that tracks every data change, along with data health check (Data Quality) reports for each source and site. We will also explore new methods of clinical data acquisition from health information networks (HINs) to mitigate data missingness, with a focus on underrepresented populations, by comparing AoURP participant-linked ambulatory EHR data from OCHIN, which includes Medicaid and uninsured patients, with EHR data from health systems served by Datavant. Diverse CLAD sources and novel analytical methods, such as probabilistic models, will be used to reveal patterns of care and potential interventions for communities underrepresented in biomedical research.