Given the unprecedented abundance of increasingly complex and voluminous data across many domains of
health, data scientists could play a transformative role in exploiting the big data revolution to address the multi-
pronged health challenges in sub-Saharan Africa. However, there is a severe lack of well-trained data scientists
and home-grown educational programs to enable context-specific training. We propose to advance public health
research in Eastern Africa by establishing new multi-tiered training programs in health data science, with initial
focus on Ethiopia and Kenya due to well-established partnerships and demonstrated needs. A partnership
between Columbia University (CU, USA), Addis Ababa University (AAU, Ethiopia), and the University of Nairobi
(UofN, Kenya) will leverage CU's world-class strengths in data science to enhance overall capacity in
Ethiopia and Kenya, building upon the readiness and national prominence of AAU and UofN. Using in-person
and distance modes of training, we will (i) develop new context-specific MS programs in public health data
science, designed to be sustainable well beyond the funding period; (ii) undertake a faculty mentoring program
to build and strengthen capacity in health data science for promising Eastern African scientists; and (iii) conduct
a short-term training program structured around targeted short courses and workshops for a wide spectrum of
trainees. The faculty mentoring program will begin with partnerships between CU and Eastern African
faculty and will progress to groupings across all three institutions. The skills developed through this program
will in turn strengthen the overall training and research capacity in data science. To broaden the reach into the
scientific and policy community, the short-term training will engage trainees from partnering governmental and
non-governmental stakeholders and the private sector. The program will leverage several ongoing research
projects led by team members or affiliated partners on environmental health, exposure assessment, satellite
remote sensing data, occupational exposures, climate change, infectious diseases, health surveillance, and health
system monitoring and evaluation; these projects will serve as immersion opportunities that give trainees hands-on
experience with new data science techniques. Monitoring and evaluation will track the success of the training
programs, including trainees’ achievement of their development goals, successful completion of research
training, scientific presentations and publications, and the sustainability and growth of the MS degree programs.
In Year 5, we will broaden the training program to the wider East African region by sharing curricula and
inviting trainees from across the region to participate. We will also explore the feasibility of incorporating the courses we have
developed into existing PhD curricula or creating new PhD programs in public health data science. Beyond the
educational programs and collaborations, our project is designed to cultivate long-term regional collaboration,
lifelong learning skills, and a supportive community of researchers committed to open science, algorithmic
fairness, and “data science for good,” ultimately leading to better public health practice.