Project INTEGRATE Data Science Academy: Training Researchers in Applied AI/ML Techniques - PROJECT SUMMARY
Data science, encompassing statistical machine learning (SML) and artificial intelligence (AI), has been evolving since the 1990s. The field of SML/AI has accelerated significantly in recent decades, as exemplified by OpenAI's launch of ChatGPT (Chat Generative Pre-trained Transformer) in November 2022. This acceleration underscores the growing potential of data science for training the next generation of clinical investigators, empowering them to use available computing algorithms and tools to address novel questions in changing research environments. The addiction research field faces several challenges: small data sets for low-base-rate behaviors (e.g., epidemiologic studies), high-dimensional and sparse data (p >> n, i.e., many more variables than observations; e.g., addiction neuroscience, genetics, mHealth), non-Gaussian outcome distributions (e.g., intervention and treatment trials), and incomplete adoption of Open Science practices. The proposed R25 research education grant, “Project INTEGRATE Data Science Academy: Training Researchers in Applied AI/ML Techniques,” will train early-career scientists in computational AI/ML techniques. PI Mun and her long-term collaborative team have a track record of engaging in and promoting Open Science and were recognized as one of the finalist teams in the NIH DATAWorks! Prize Challenge in 2022. The proposed research education program will be interdisciplinary and 12 months long. It will be delivered primarily online to four cohorts of predoctoral and postdoctoral trainees.
We anticipate 8-10 trainees per cohort, recruited from the pool of trainees supported by institutional training grants such as T32 and R25, as well as from national research societies and their early-career networks (e.g., the Society for Prevention Research, the Research Society on Alcohol). The training program will encompass (1) up to 18 modules of SML/AI training, (2) biweekly research seminars and conference participation, and (3) hands-on research experience culminating in papers, guided by a team of program faculty (mentors). The applied SML/AI training will cover AI-assisted programming in R and Python, advanced statistics and SML, deep learning and cloud computing, the FAIR principles, Open Science practices, and research ethics, including the responsible conduct of research. The program's outcomes will include a series of free, publicly accessible online learning modules. This R25 research education program will make all data, code, and packages publicly accessible. Program effectiveness will be monitored for continuous improvement, with success gauged by publications and their impact on the field, assessed annually and collectively across all cohorts of trainees and program faculty. The program will produce clinical investigators who are well-versed in applied SML/AI and who contribute to cumulative and reproducible addiction science.