PROJECT SUMMARY. Due to genomic technologies, electronic medical records, and digitized high-throughput
experiment readouts, building an independent biomedical research career requires fluency in both biomedical
data generation and data science methods for analyzing large-scale biomedical datasets. This dichotomy is
challenging to address in doctoral training; biology doctoral students may take quantitative coursework with no
emphasis on biomedical data, while computational biology students often focus on one data type as end users.
The objective of our Predoctoral Training Program in Biological Data Science at Brown University is to turn
“I-shaped” predoctoral students (with strength in one discipline) into “pi-shaped” Biological Data Scientists with
fluency in two languages: (1) generating biological data motivated by questions across a range of scales and
systems, and (2) developing quantitative methods for modeling and testing hypotheses using large-scale
biomedical datasets. The established Biological Data Science training community at Brown University has 34
engaged faculty mentors across multiple disciplines. They will jointly and actively mentor a steady state of six
NIH-supported predoctoral trainees during the first and second years of doctoral study (resulting in >30 Biological
Data Scientists over 5 years) through a variety of didactic, research, and mentoring activities, as well as research
and professional development events that continue to foster an interdisciplinary community for senior trainees.
These activities will include coursework in inference for genomics and molecular biology, laboratory practicums,
computational workshops, a year-long second-year graduate seminar focused on extensive peer review of
methods for analyzing biological data, an annual program retreat, and a series of professional development events
for interdisciplinary researchers. The resulting community will promote the development of professional skills
essential for interdisciplinary biological data science research, with an emphasis on the ability to
communicate science to both broad and field-specific audiences, navigate interdisciplinary collaboration and grant
applications, interview for academic and industry-based research careers, and conduct reproducible and open
biological data science research. The faculty mentors’ research programs cover multiple biological organisms,
systems, and problems, ranging across biological and neuronal networks, computational biophysics, computer
vision and visualization, evolutionary and statistical genetics, functional genomics, host-pathogen interactions, the
microbiome, and the molecular biology of aging. The proposed program will expand successful activities funded
under a previous NIGMS T32 FOA (now in year 5 of funding); those activities have resulted in the persistence of
23 trainees in biomedical research, including 15 women and 7 students from groups historically underrepresented
in biomedical research.
These trainees have secured external fellowships and produced 42 peer-reviewed publications and 8 preprints
under review thus far. The mentors have a combined annual research funding base of over $24 million in direct
costs this year, offering a strong foundation to bolster this innovative interdisciplinary training program.