Summer Institute for Training in Biostatistics and Data Science at Columbia (SIBDS@Columbia) - SUMMARY/ABSTRACT We propose to establish an innovative summer training program that introduces the fundamental principles of biostatistics and data science to a diverse pool of promising undergraduate and beginning graduate students, preparing them for careers as quantitative scientists with a focus on biomedical research. The proposed seven-week program, Summer Institute for Training in Biostatistics and Data Science at Columbia (SIBDS@Columbia), will be taught by world-renowned faculty with expertise and extensive funded research in several key areas of biostatistics and data science. The faculty come from diverse backgrounds, including women, African and Latino scholars, and first-generation college graduates, and share a passion for hands-on mentoring of students and for diversity in the workforce. Trainees will have the opportunity to interact with world-class experts in domain areas relevant to the missions of NHLBI and NIAID. They will be immersed in research through carefully designed projects that apply the quantitative skills acquired in the program to data from studies of heart, lung, blood, and sleep disorders and of infectious disease epidemiology. The program will be housed within the Department of Biostatistics at Columbia University, which has a long and successful history of training promising undergraduate students from diverse backgrounds through its long-running BEST (Biostatistics and Epidemiology Summer Training) Diversity Program and past cycles of the Columbia SIBS (CSIBS) program. Under the proposed training program, we will create a pipeline complementary to (and synergistic with) the existing BEST program by focusing on state-of-the-art data science skills, as opposed to the BEST program’s focus on the intersection of biostatistics and epidemiology. 
The program will also introduce innovations that demonstrate the value of biostatistics and data science in interdisciplinary research, along with a new focus on infectious disease research. Specifically, we propose to (i) identify and recruit a diverse and quantitatively skilled group of 14 undergraduate and beginning graduate students every summer; (ii) immerse trainee cohorts in a carefully designed curriculum of fundamental concepts of biostatistics and data science, computing skills, and hands-on biomedical data analysis; (iii) mentor trainees on professional development toward graduate studies in biostatistics and data science and subsequent careers; and (iv) support trainees’ success through a well-structured tracking system that follows them through degree completion, pursuit of graduate studies, and subsequent careers as quantitative scientists. Given Columbia’s reputation as a major research hub, its extensive portfolio of NIH-funded research, its expert faculty and mentors, an existing network of successful alumni from previous BEST/CSIBS cycles who serve as models for trainees, and its location in ethnically diverse and culturally rich New York City, we are poised to continue contributing substantially toward expanding a diverse pool of well-trained biostatisticians and data scientists who can handle the complex data analyses needed to address today’s most pressing research questions.