PROJECT SUMMARY/ABSTRACT
The economic toll of substance use and abuse is estimated at more than $740 billion annually, arising from
accidents, health care, homelessness, unemployment, and criminal activity. Adolescence is a critical time for
intervention, as 90% of adults who meet the criteria for addiction initiate use of alcohol or drugs in
adolescence. Yet, prevention efforts have been hindered by minimal substance use screening by primary care
providers as well as low rates of disclosure by adolescents in medical settings. These challenges necessitate
new approaches to detect key risk factors and enhance screening methods. Importantly, adolescents with
experiences of child maltreatment are more susceptible to early substance use and more likely to progress
from experimentation to addiction than non-maltreated youth. Accumulating evidence and our preliminary
data suggest that the top predictors of early substance use are not relevant for child welfare (CW) youth,
requiring new studies of the relevant risk factors for this vulnerable population. The proposed study addresses
these gaps by using cutting-edge machine learning models to provide vital new evidence regarding risk
factors specific to the CW population as well as risks for early substance use that may be common to both CW
and non-CW youth. We will use two unique data sources to accomplish this. Our primary data will come from
electronic health records (EHR) of Kaiser Permanente Southern California (KPSC) members (estimated
sample size of 3.4 million children, 2007-2020). We will use maltreatment diagnosis codes to identify the
CW sample, on the reasonable assumption that such a diagnosis prompts a referral to child welfare. Risk factors will be obtained from diagnosis
codes and abstractable progress notes in the EHR of children and parents as well as county crime and
geographic income data. Second, to address the limited ability of EHR data to capture detailed psychosocial
information, we will use an existing longitudinal dataset of 454 youth, 303 referred from child welfare and 151 in a
comparison group (YAP study). Participants were assessed at mean ages of 11, 13, 15, and 18 years and are
racially/ethnically diverse. Collected data include measures of child-level, parent-level, family-level, and
neighborhood-level risk factors and CW case records. These two data sources will allow us to: 1) produce critical
new knowledge regarding the relevant predictors of early substance use for CW versus non-CW youth and 2)
use intensive survey data (YAP) to identify risk factors that are not currently collected in EHR data (KPSC)
and that may inform the development of new screening questions. Lastly, our predictive model has translational
potential to advance screening methods for adolescent substance use risk in pediatric primary care through the
use of risk scores integrated into clinical decision support tools. These findings, if implemented in clinical care
settings, would allow medical providers to more accurately identify those at risk and trigger stratification into
different treatment pathways to prevent substance abuse.