ABSTRACT
Substance misuse (SM) puts persons at risk for HIV. Systems of care for detecting and treating risks
for HIV acquisition or transmission among SM populations are siloed. The acute-care hospital setting offers
unique opportunities for screening, testing, and treatment of HIV risks among patients with SM. Adjacent to
communities with the highest number of heroin overdoses in Chicago, Rush University Medical Center
launched the Substance Use Intervention Team (SUIT) in 2017. The SUIT service attempts to screen all
hospitalized patients for SM and intervenes using a harm reduction model stratified by low, medium, and high risk;
however, the busy setting and the acuity and severity of patients’ illnesses limit universal screening rates and
can introduce implicit bias into decisions about which patients to screen. Automated clinical decision
support tools trained with supervised machine learning (ML) can relieve these screening burdens. A machine
learning health system approach leverages electronic health record (EHR) data, including clinical, social, and
behavioral determinants captured both in structured data fields and in clinical notes, which are unstructured data
typically unavailable for predictive analytics. An ML HIV risk classifier can identify patients with SM and HIV risk
and alert providers to evaluate whether medication and care to prevent or treat HIV are appropriate. To date, no screening tool has been
developed and validated to assess HIV risk among persons with SM. This pilot’s goal is to develop, train,
and test an interoperable ML classifier to identify risk for HIV transmission or acquisition among patients with
SM and to assess its real-time performance. Aim 1 is to develop, train, and test an ML classifier with high
sensitivity (≥0.8) and specificity (≥0.8) to identify risk for HIV acquisition or transmission among patients with
SM. Within the source cohort of encounter-level data from patients with SM between 2017 and 2019 (N=23,817),
we will use a rule-based method and the Centers for Disease Control and Prevention (CDC) HIV risk guidelines to
identify as cases (6%, n=1,300) those patient encounters with diagnoses associated with HIV transmission, such as
chlamydia. Using propensity score matching, we will match two non-cases to each case (1:2) and conduct manual
chart annotation to verify or reclassify cases and non-cases, establishing the reference dataset (n=3,900). With
labeled cases and non-cases, we will partition the reference dataset to train and test three
supervised ML models. We will select the best-performing model based on standard metrics, such as the
C-statistic. Aim 2 is to integrate the best-performing model from Aim 1 into the Rush EHR infrastructure to test
its predictive validity prospectively, in real time. Because we expect the ML classifier to identify 50% more HIV
risk cases than our rule-based method (9-10% of encounters vs. 6%), we will evaluate its effect in an interrupted
time series, measuring the number of risk cases identified at 12 one-month time points. This ML classifier is the first
step toward an appropriate, scalable, and interoperable learning health system intervention that integrates HIV
prevention and treatment into care for hospitalized patients with SM.
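
The Aim 1 case-identification and 1:2 matching steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the diagnosis-code prefixes in HYPOTHETICAL_RISK_PREFIXES are stand-ins for the CDC-guideline rules, propensity scores are assumed to have been estimated beforehand (for example, by logistic regression on patient covariates), and greedy nearest-neighbor matching without replacement, a caliper of 0.05, and descending match order are illustrative choices. The names Encounter, is_rule_based_case, and match_one_to_two are hypothetical.

```python
from dataclasses import dataclass

# HYPOTHETICAL rule set: ICD-10 prefixes standing in for the CDC-guideline rules,
# e.g., A54 (gonococcal infection) and A56 (sexually transmitted chlamydial disease).
HYPOTHETICAL_RISK_PREFIXES = ("A54", "A56")


@dataclass
class Encounter:
    """Hypothetical encounter record; the propensity score is assumed pre-estimated."""
    encounter_id: str
    dx_codes: list[str]
    propensity: float


def is_rule_based_case(enc: Encounter) -> bool:
    """Flag a candidate case if any diagnosis code starts with a risk prefix."""
    return any(code.startswith(HYPOTHETICAL_RISK_PREFIXES) for code in enc.dx_codes)


def match_one_to_two(encounters: list[Encounter], caliper: float = 0.05) -> dict[str, list[str]]:
    """Greedy 1:2 nearest-neighbor propensity score matching without replacement.

    Returns {case_id: [matched non-case ids]}; a case may receive fewer than
    two matches if too few unmatched non-cases fall within the caliper.
    """
    cases = [e for e in encounters if is_rule_based_case(e)]
    pool = [e for e in encounters if not is_rule_based_case(e)]
    matches: dict[str, list[str]] = {}
    # Descending propensity order is one of several common greedy orderings.
    for case in sorted(cases, key=lambda e: e.propensity, reverse=True):
        chosen: list[str] = []
        for _ in range(2):
            # Unmatched non-cases within the caliper of this case's score.
            eligible = [c for c in pool if abs(c.propensity - case.propensity) <= caliper]
            if not eligible:
                break
            nearest = min(eligible, key=lambda c: abs(c.propensity - case.propensity))
            pool.remove(nearest)  # without replacement: each non-case is used once
            chosen.append(nearest.encounter_id)
        matches[case.encounter_id] = chosen
    return matches
```

Matched pairs produced this way would then go to manual chart annotation, which can still reclassify a rule-based case or non-case before the reference dataset is finalized.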
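
One common way to analyze an interrupted time series is segmented regression. The abstract does not specify the model or how the 12 monthly time points divide before and after deployment, so the following is an assumed specification, where Y_t is the number of HIV risk cases identified in month t, T_t is months since the start of observation, X_t is 0 before and 1 after the classifier goes live, and T_0 is the deployment month:

```latex
Y_t = \beta_0 + \beta_1 T_t + \beta_2 X_t + \beta_3 (T_t - T_0) X_t + \varepsilon_t
```

Under this model, beta_2 estimates the immediate level change in monthly risk cases at deployment and beta_3 the change in monthly trend; autocorrelation in the errors is usually checked before interpreting either. For scale, a 50% increase over the rule-based 6% corresponds to 0.06 x 1.5 = 0.09, the lower end of the expected 9-10% range.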