Racial/ethnic disparities in preterm birth (PTB) persist in the U.S., with a higher prevalence of PTB in
non-Hispanic (N-H) Black women than in their N-H White counterparts. However, the underlying mechanism of
such Black-White differences is not well understood. Even extensive biomedical, behavioral, and
sociodemographic risk factors can explain only about half of PTB incidence. Chronic stress has received significant
attention as a robust predictor of PTB, particularly among racial/ethnic minority groups. Nevertheless, the literature
shows inconsistent evidence on the relationships among race/ethnicity, chronic stress, and PTB, mainly
because of the complexity of assessing women’s chronic stress exposures. Accurate chronic stress
measures should capture the nature of stressors: cumulative, interactive, and population-specific. In this
regard, conventional statistical models (e.g., linear regression) have limited ability to model chronic stress
exposures with high precision. Thus, this study will adopt machine learning (ML), a state-of-the-art modeling
technique, to capture non-linear and synergistic relationships among chronic stressors, detect previously unrecognized
patterns, and reflect subtle differences in chronic stressors between N-H White and N-H Black women for more
accurate prediction of their PTB risk. I will develop simple, accurate, and explainable ML algorithms of chronic
stress exposures by building a hybrid algorithm specific to N-H White and N-H Black women and computing
SHAP (SHapley Additive exPlanations) values. Specifically, the hybrid algorithm will combine Multivariate
Adaptive Regression Splines (MARS) and Deep Neural Network (DNN) algorithms: MARS will select
only the “important” chronic stressor variables for each racial/ethnic group to serve as the DNN’s input features for PTB risk
prediction. Additionally, a SHAP value for each chronic stressor in the final algorithm will quantify its degree of
contribution to the predicted PTB risk. The ML algorithms will be trained and tested on a large national
database, the Pregnancy Risk Assessment Monitoring System (2012-2017), collected by 37 U.S. states. The
study’s specific aims are to 1) compare the accuracy of logistic regression (LR) and two ML algorithms
(DNN and hybrid) in predicting PTB risk from chronic stress exposures, using the area under the receiver operating
characteristic curve (AUC); 2) compare the accuracy of race/ethnicity-combined and race/ethnicity-specific
models within the LR, DNN, and hybrid algorithms; and 3) determine the relative importance of
chronic stressors to the predicted PTB risk in the best-performing algorithm using regression coefficients (for
LR) or SHAP values (for the ML algorithms). Career development goals are to 1) develop expertise in stress
measurement in the context of maternal and child health, 2) acquire knowledge and skills in ML and the
analysis of large-scale data, and 3) cultivate health informatics-focused manuscript and grant preparation skills
for independence. Results from this study will contribute to preventing PTB among vulnerable pregnant women
via early screening with more accurate, data-informed tools to assess these patients’ chronic stress.
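
As a rough illustration of how a SHAP value apportions predicted risk among stressors, the sketch below computes exact Shapley values for a toy risk function that includes one synergistic term. The stressor names, the coefficients, and the `toy_risk` and `shapley_values` helpers are hypothetical and chosen only for illustration; they do not come from the study's data or models. In practice, a library such as SHAP would typically rely on model-specific or sampling-based estimators rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary stressor indicators; names are illustrative only.
STRESSORS = ["financial", "partner", "traumatic"]


def toy_risk(exposed):
    """Toy risk score: additive effects plus one synergistic (interaction) term.

    All coefficients are made up for illustration.
    """
    risk = 0.08  # baseline risk with no stressors present
    if "financial" in exposed:
        risk += 0.02
    if "partner" in exposed:
        risk += 0.01
    if "traumatic" in exposed:
        risk += 0.03
    if "financial" in exposed and "partner" in exposed:
        risk += 0.04  # synergy: joint exposure adds more than the sum of parts
    return risk


def shapley_values(value_fn, players):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when added to coalition s
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi


phi = shapley_values(toy_risk, STRESSORS)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
```

In this toy case the 0.04 interaction effect is split evenly between the two stressors that jointly produce it, and the three values sum to the difference between the fully exposed and baseline risk scores.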