Training Biomedical Research Teams for Rigor and Reproducibility in Data Science
Abstract: We will develop a training program that shapes thinking about rigor and reproducibility in
biomedical data science, imparts the relevant skills and tools, and ensures that these skills and tools are
applied across a wide range of biomedical research. The program consists of a learning phase (bootcamp with
collaborative learning) and an implementation phase (mentoring). In addition, we will enable our trainees to
teach their newly acquired skills at their
institutions. Our short-term goal is to shape the thinking of biomedical researchers from diverse backgrounds
and equip them with skills and tools to improve the rigor and reproducibility of their research. Our long-term
goal is to have a lasting impact on rigor and reproducibility through the transfer of skills from our trainees
to their own trainees, to improve research outcomes and their benefits to society, and to strengthen a diverse
biomedical data science workforce.
Research projects with long data manipulation pipelines face rigor and reproducibility challenges throughout
their lifecycle. Despite the efforts of the research community to promote rigor and reproducibility, researchers
still lack systematic training to build the technical know-how to achieve this in practice. Our program
will focus on six topics: 1) Ethical issues in biomedical data science. 2) Data management, representation, data
sharing with confidentiality considerations, and metadata. 3) Rigorous statistical design. 4) Design and
reporting of predictive modeling. 5) Reproducible workflow. 6) Meta-analysis.
Our program will support diversity at four levels. Scientifically, we will train researchers who use diverse types of
data (from -omics data all the way to population data) to address research questions at various scales.
Professionally, we will train faculty and technical personnel at any career stage. Demographically, we will
ensure that researchers from underrepresented groups have a strong presence in our program, through intensive
recruitment efforts and by building a friendly learning environment. Institutionally, we will train researchers
from major research universities as well as from institutions with limited resources, and we will especially
welcome researchers from Minority-Serving Institutions.
Our training program will focus on teams of faculty (project PIs) and technical personnel. Both play
critical roles in ensuring rigor and reproducibility, but may approach it from different perspectives. Training
them together will allow them to benefit from each other’s scientific expertise and technical skills and address
rigor and reproducibility in a collaborative manner. We will use a combination of training components
(lectures, intensive small-group sessions, and team projects) through an online adaptive learning tool to
effectively accommodate the highly variable scientific and technical backgrounds of our teams of trainees.