Suicide death rates in the United States have increased 35% since 1999. In 2018, there were over 48,000
suicide deaths, and an estimated 1.4 million adults attempted suicide. In response, health systems are
adopting suicide risk prediction models to guide delivery of suicide prevention interventions.
Suicide prediction models estimated from health care records may perpetuate current disparities in health care
access, quality, and outcomes. Suicide prediction models may not accurately identify high-risk patients from all
racial and ethnic groups. Suicide rates vary by race and ethnicity, and both the highest and lowest rates are
seen in traditionally underserved populations. Suicide rates are highest among American Indians and Alaska
Natives (22.1 per 100,000 people) and lowest in Asian and Pacific Islander, Black, and Hispanic populations
(7.0-7.4 per 100,000 people), compared with 18.0 per 100,000 people for White non-Hispanics.
Differences in performance of suicide risk prediction models across racial and ethnic subgroups have three
possible sources. First, predictors of suicide risk may be measured with error, and this error may differ
across racial and ethnic subgroups. Second, suicide attempts and deaths may be misclassified, and
misclassification rates may differ by race and ethnicity. Third, the association between predictors and
outcomes may vary by race and ethnicity, i.e., risk modification.
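As a rough illustration of the second source, the sketch below assumes a binary outcome that is recorded with some sensitivity and specificity. It shows how two subgroups with the same true event rate can have different recorded rates when sensitivity differs between them. The function name and every numeric value are hypothetical and are not drawn from the data described in this proposal.

```python
def observed_rate(true_rate, sensitivity, specificity):
    """Expected recorded event rate when the outcome is misclassified.

    Recorded events are true events that were captured plus
    non-events that were wrongly recorded as events.
    """
    captured_events = sensitivity * true_rate
    false_events = (1.0 - specificity) * (1.0 - true_rate)
    return captured_events + false_events


# Hypothetical values: both subgroups share the same true event rate,
# but outcomes are captured less completely in subgroup B.
true_rate = 0.01
rate_a = observed_rate(true_rate, sensitivity=0.90, specificity=1.0)
rate_b = observed_rate(true_rate, sensitivity=0.60, specificity=1.0)

print(f"recorded rate, subgroup A: {rate_a:.4f}")
print(f"recorded rate, subgroup B: {rate_b:.4f}")
```

Under these assumed values, a model trained and evaluated on recorded outcomes would see a lower event rate in subgroup B even though true risk is identical, which is one way differential misclassification could distort both estimation and evaluation.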
Existing methods for estimating prediction models are not designed to address racial and ethnic disparities in
performance. Estimation procedures focus on optimizing performance across the entire population, not within
subgroups, and performance in less prevalent subgroups has little impact on overall accuracy. Although machine
learning methods such as random forests can explore interactions between predictors and race or ethnicity, suicide
attempt and death are rare events, which limits the information available to identify race- and ethnicity-specific
risk factors. There is also insufficient guidance on sample size calculations for prediction studies.
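The limited influence of small subgroups on overall accuracy can be seen with simple arithmetic: population accuracy is the share-weighted average of subgroup accuracies. The sketch below uses hypothetical shares and accuracies (a 95%/5% split) chosen only for illustration; the function name is also an assumption.

```python
def overall_accuracy(shares, accuracies):
    """Population accuracy as the share-weighted mean of subgroup accuracies."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("subgroup shares must sum to 1")
    return sum(s * a for s, a in zip(shares, accuracies))


# Hypothetical population: a majority subgroup (95%) and a minority subgroup (5%).
shares = [0.95, 0.05]

# Minority-subgroup accuracy rises from 0.60 to 0.90; majority accuracy is unchanged.
before = overall_accuracy(shares, [0.90, 0.60])
after = overall_accuracy(shares, [0.90, 0.90])
gain = after - before

print(f"overall accuracy before: {before:.3f}")
print(f"overall accuracy after:  {after:.3f}")
print(f"overall gain:            {gain:.3f}")
```

With these assumed numbers, a 30-point improvement in the smaller subgroup moves overall accuracy by only 1.5 points, so a procedure that optimizes the overall figure has little incentive to find that improvement.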
We will develop novel statistical methods for random forest models that reduce racial and ethnic disparities in
performance of suicide prediction models by addressing gaps in current methods. Aim 1 will develop new
procedures for prediction model estimation that maximize predictive performance within racial and ethnic
subgroups, rather than maximizing average performance across the entire population. Aim 2 will integrate
methods to adjust for differential outcome misclassification in prediction model estimation and evaluation. Aim
3 will design sample size calculations to determine whether a study can accurately predict outcomes within
racial and ethnic subgroups. We will use existing data on suicide risk factors and outcomes for 15 million
outpatient mental health, 10 million primary care, and 2 million emergency department visits from the NIMH-
funded Mental Health Research Network to implement our methods and estimate suicide prediction models for
each setting that accurately identify patients at highest risk of suicide across all races and ethnicities.
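To make the contrast in Aim 1 concrete, the sketch below compares two ways of choosing between candidate models: by pooled AUC across everyone, or by the lowest AUC in any subgroup. This is only a toy model-selection illustration under assumed data, not the random forest estimation procedure proposed here; the scores, labels, group codes, and function names are all hypothetical.

```python
def auc(scores, labels):
    """Chance that a random event outscores a random non-event (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def worst_group_auc(scores, labels, groups):
    """Lowest within-subgroup AUC across all subgroups."""
    values = []
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        values.append(auc([scores[i] for i in idx], [labels[i] for i in idx]))
    return min(values)


# Hypothetical toy data: six people in subgroup A, four in subgroup B.
labels = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
groups = ["A"] * 6 + ["B"] * 4
models = {
    "model_1": [0.9, 0.8, 0.1, 0.2, 0.3, 0.1, 0.2, 0.6, 0.5, 0.3],
    "model_2": [0.7, 0.4, 0.1, 0.5, 0.3, 0.2, 0.35, 0.25, 0.15, 0.05],
}

pooled = {name: auc(s, labels) for name, s in models.items()}
worst = {name: worst_group_auc(s, labels, groups) for name, s in models.items()}

best_pooled = max(pooled, key=pooled.get)  # choice when optimizing pooled AUC
best_worst = max(worst, key=worst.get)     # choice when optimizing the weakest subgroup

print("pooled AUC:", pooled, "->", best_pooled)
print("worst-group AUC:", worst, "->", best_worst)
```

In this toy setup the two criteria pick different models, which is the kind of disagreement a subgroup-focused estimation procedure is meant to surface. A worst-subgroup criterion is one of several possible choices and is used here only because it is simple to state.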