Summary
Type 2 diabetes (T2D) is a rapidly growing chronic disease in the US that disproportionately affects ethnic minorities,
particularly Puerto Ricans. The Boston Puerto Rican Health Study (BPRHS) and Puerto Rico
Observational Study of Psychosocial, Environmental, and Chronic Disease Trends (PROSPECT) are
complementary cohorts designed to explore T2D-related disparities in different social and environmental
contexts. The parent proposal aims to examine the relationships among DNA methylation, T2D prevalence and
incidence, and social stressors, protective factors, and behaviors in these cohorts.
The supplement aims to extend the parent proposal through three main objectives. First, it will improve
the artificial intelligence/machine learning (AI/ML) readiness of the BPRHS and PROSPECT datasets by applying
the FAIR (Findable, Accessible, Interoperable, Reusable) guidelines and by identifying and documenting
potential biases and imbalances in the data. This process will help ensure that the datasets are
suitable for advanced analysis and knowledge discovery with AI/ML techniques.
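One simple way to begin documenting imbalances is to tabulate, for each subgroup, its size, its share of the
sample, and its outcome prevalence. The Python sketch below is illustrative only: the function
`imbalance_report`, the field names `sex` and `t2d`, and the `min_share` cutoff are hypothetical assumptions,
not BPRHS or PROSPECT variables or criteria adopted by the study, and the records are made-up toy data.

```python
from collections import Counter, defaultdict


def imbalance_report(records, group_key, outcome_key, min_share=0.10):
    """Tabulate subgroup size, share of the sample, and outcome prevalence.

    records: list of dicts, one per participant (hypothetical layout)
    group_key: field defining subgroups, e.g. "sex" (hypothetical name)
    outcome_key: boolean outcome field, e.g. "t2d" (hypothetical name)
    min_share: illustrative cutoff below which a subgroup is flagged
    """
    if not records:
        return {}
    total = len(records)
    # Number of participants in each subgroup.
    sizes = Counter(r[group_key] for r in records)
    # Number of outcome-positive participants in each subgroup.
    cases = defaultdict(int)
    for r in records:
        if r[outcome_key]:
            cases[r[group_key]] += 1
    report = {}
    for group, n in sorted(sizes.items()):
        share = n / total
        report[group] = {
            "n": n,
            "share": round(share, 3),
            "prevalence": round(cases[group] / n, 3),
            "underrepresented": share < min_share,
        }
    return report


# Toy, made-up records purely for illustration (not study data).
toy = (
    [{"sex": "F", "t2d": True}] * 3
    + [{"sex": "F", "t2d": False}] * 5
    + [{"sex": "M", "t2d": True}] * 1
    + [{"sex": "M", "t2d": False}] * 1
)
for group, row in imbalance_report(toy, "sex", "t2d", min_share=0.25).items():
    print(group, row)
```

In this toy example the `M` subgroup (2 of 10 records) falls below the illustrative 25% cutoff and is flagged;
in practice, the grouping variables and thresholds would be chosen by the study team.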
Second, the supplement will identify and structure social determinants of health (SDOH) information for AI/ML
applications. By harmonizing these data with the SDOH Common Data Elements (CDEs) in the NIMHD PhenX Toolkit,
the research team will be able to investigate the relationships between social determinants and T2D risk and
outcomes more comprehensively. This will also improve the AI/ML readiness of the SDOH-related data,
allowing a more sophisticated analysis of the complex interactions between social factors and T2D.
Third, the supplement will develop a proof-of-concept machine learning model for T2D classification
and risk factor identification. By applying AI/ML techniques to the transformed BPRHS and PROSPECT datasets,
the research team aims to uncover new insights into T2D risk and management that are difficult to obtain
with traditional analytical methods. This model will serve as a practical example of how AI/ML can
be used to advance biomedical research and improve health outcomes in T2D. To support
transparency and reproducibility, the supplement will also document and share the pipeline for data
processing and model training and evaluation, enabling other researchers to build upon this work and further
promoting the application of AI/ML techniques to NIH-funded biomedical data.
By integrating AI/ML techniques into the original research plan, the supplement will contribute to a deeper
understanding of the complex interplay among epigenetics, genetics, environment, and social factors in T2D.
This additional layer of analysis will not only support the development of innovative public health strategies to
prevent T2D and related health disparities in vulnerable populations but also demonstrate the benefits of
interdisciplinary collaboration in harnessing AI/ML for scientific discovery and improved health outcomes.