Improving the prediction of opioid outcomes via automated EHR data processing

PROJECT SUMMARY/ABSTRACT

This K01 proposal aims to develop and validate an AI-driven framework to efficiently predict and identify severe opioid adverse drug effects (ADEs), including persistent use and related post-surgical outcomes, from electronic health record (EHR) data. EHR data are often compromised by inconsistencies, missing values, and outliers, which undermine accurate risk prediction. Traditional data processing methods face human-related limitations such as resource demands, biases, and insufficient domain expertise. This project addresses these issues by leveraging automated machine learning (AutoML), knowledge graphs (KGs), and generative AI to enhance data cleaning and feature engineering. Building on the candidate's prior experience and research in machine learning and clinical data analysis, the proposed training plan will capitalize on an exceptional mentorship team and translational research environment to foster expertise in 1) expanding AutoML to perform EHR-specific data cleaning, 2) developing comprehensive digital KGs enriched with domain knowledge to optimize data processing, and 3) leveraging large language models for human-AI collaborative feature engineering, refinement, and discovery. The proposed research will use EHR data from more than 6,000 elective spinal fusion patients and later expand to include external and publicly available datasets. The AutoML tool, STREAMLINE, will be enhanced with custom data-cleaning operators that resolve anomalies, guided by a KG enriched with EHR and addiction medicine expertise (Specific Aim 1). A large language model, integrated with retrieval-augmented generation, will enable human-AI collaboration to streamline feature engineering from past medical history and identify novel opioid ADE predictors (Specific Aim 2).
The combined impact of these methods will be evaluated on simulated and real-world retrospective and prospective datasets (Specific Aim 3). The central hypothesis is that AutoML-KG data cleaning, together with human-AI feature engineering, will improve predictive performance, data quality, and model interpretability. The project will produce scalable, interpretable, and clinically relevant automated computational tools for opioid ADE prediction, enabling broad implementation across diverse healthcare systems. The proposed research will take place at Cedars-Sinai Medical Center under the mentorship of, and in collaboration with, Drs. Jason Moore, Ryan Urbanowicz, Corey Walker, Itai Danovitch, Joshua Pevnick, and Tiffani Bright, experts in ML/AI methods, predictive modeling, neurosurgery, addiction medicine, biomedical informatics, and healthcare ethics, respectively. The research and training expertise developed through this K01 award will support the PI's career development as an independent investigator and leading expert in AI-driven prediction of severe opioid ADEs and associated clinical outcomes. Findings will inform an R01 application aimed at developing personalized risk assessment strategies that incorporate complex data types (e.g., omics, social media, and unstructured clinical notes) to investigate multilevel determinants of opioid misuse, tolerance, opioid use disorder, and overdose.