SUMMARY/ABSTRACT
We propose to establish an innovative summer training program to introduce fundamental principles of
biostatistics and data science to a diverse pool of promising undergraduate and beginning graduate students and
to prepare them for careers as quantitative scientists with a focus on biomedical research. The proposed seven-week program,
Summer Institute for Training in Biostatistics and Data Science at Columbia (SIBDS@Columbia), will be
taught by a pool of world-renowned faculty with expertise and extensive funded research in several key areas of
biostatistics and data science. The faculty come from diverse backgrounds, including women, African and Latino
scholars, and first-generation college graduates, and share a passion for hands-on mentoring of students and for diversity in the
workforce. Trainees will have the opportunity to interact with world-class experts in domain areas relevant to the
missions of NHLBI and NIAID and will be immersed in research through carefully designed projects utilizing
quantitative skills acquired from the program on data from studies involving heart, lung, blood, and sleep
disorders and infectious disease epidemiology. The program will be housed within the Department of Biostatistics
at Columbia University, which has a long and successful history of training promising undergraduate students
from diverse backgrounds through its long-running BEST (Biostatistics and Epidemiology Summer Training)
Diversity Program and past cycles of the Columbia SIBS (CSIBS) program. Under the proposed training program,
we will create a pipeline complementary to (and synergistic with) the existing BEST program by focusing on
state-of-the-art data science skills, as opposed to the BEST program’s focus on the intersection of biostatistics and
epidemiology. The program will also introduce innovations that demonstrate the value of biostatistics and data science in
interdisciplinary research, along with a new focus on infectious disease research. Specifically, we
propose to (i) identify and recruit a diverse and quantitatively skilled group of 14 undergraduate and beginning
graduate college students every summer; (ii) immerse trainee cohorts in a carefully designed curriculum of
fundamental concepts of biostatistics and data science, computing skills, and hands-on biomedical data analysis;
(iii) mentor trainees on professional development toward graduate studies in biostatistics and data science and
subsequent careers; and (iv) ensure the success of trainees through a well-structured tracking system that follows them through degree
completion, pursuit of graduate studies, and subsequent careers as quantitative scientists. Given Columbia’s
reputation as a major research hub, its extensive portfolio in NIH-funded research, its expert faculty and mentors,
an existing network of successful alumni from previous BEST/CSIBS cycles as models for trainees, and its
location in ethnically diverse and culturally rich New York City, we are poised to continue to contribute
substantially toward expanding a diverse pool of well-trained biostatisticians and data scientists who can handle
the complex data analyses needed to address today’s most pressing research questions.