Interdisciplinary Summer Institute on the Analysis of Complex, Large-Scale Longitudinal Data - Project Summary
Contemporary large-scale NIH initiatives have produced many high-quality, publicly available longitudinal datasets that include complex data of various types, sources, and domains (e.g., biological, social, individual, family, and neighborhood). However, use of these datasets without training can lead to scientific setbacks, including work that is imperfect, misleading, or even incorrect. There is an urgent need for educational programming to train researchers both within and outside of academic careers in the innovative and responsible use of publicly available, large, and complex longitudinal datasets. This R25 application proposes to develop and offer an “Interdisciplinary Summer Institute on the Analysis of Complex, Large-Scale Longitudinal Data”, refining it each year based on evaluation data (aim 1). We will also leverage this program to train graduate students to teach advanced longitudinal methods to participants from multiple disciplines (aim 2). Thus, we will serve two groups: program participants (aim 1) and Purdue graduate student teaching assistants (TAs, aim 2). During an immersive week-long summer institute each year, we will train 50 interdisciplinary participants, including students, postdocs, and faculty across academic institutions (Y1-Y3), expanding to also include professionals in non-profits, governmental agencies, and industry (Y2, Y3).
The course is organized into 10 topics: publicly available longitudinal data sources, introduction to longitudinal data analytic methods, data visualization, missing data, longitudinal categorical data analysis, sampling weights and clustering/stratification, inclusion of time-varying and time-invariant covariates, combining multiple data sources, embedded family-based designs, and an introduction to sociogenomics. Cross-cutting themes are data management, visualization and communication, causal inference, measurement and modeling decisions, meaningful effect sizes, and representativeness. Lecture examples and assignments will focus on substance use and associated factors and will use data from the Adolescent Brain Cognitive Development (ABCD) Study, although participants will be encouraged to use whatever dataset is most relevant to their own research interests. In each session, TAs and additional faculty instructors will circulate in the room to support participants who need extra assistance in real time. The summer institute will also feature review and office hour sessions, experience in interdisciplinary environments, networking, and joint practice opportunities to help establish collaborations. We will also train 6 graduate student TAs each year, who will gain supervised experience in content development, instruction (via review sessions), consulting, course evaluation, and leadership within interdisciplinary environments. We have carefully designed recruitment strategies to train a workforce that is diverse in terms of under-represented groups, discipline, and career stage and path, and we have developed a multi-pronged evaluation plan. Our program faculty comprises 8 experts in longitudinal data analysis and instruction, representing different fields, genders, and career stages.