Abstract:
Tuberculosis (TB) is prevalent in Uganda and overlaps with an already high burden of HIV/TB coinfection. While
almost all hospital-based TB cases in Kampala city, the capital of Uganda, have clear TB symptoms, 30% or
more of the people with undiagnosed TB, identified through active case finding, are asymptomatic for TB;
moreover, the host risk factors for TB in Kampala cannot be distinguished from risk factors associated with the
environment. Complicating this further, anti-TB treatment failure rates in Uganda are substantially higher
than global estimates (17% vs. 10%). These TB-specific challenges represent
only a fraction of the complexity underlying the disease, especially in endemic settings with a high burden of
HIV/AIDS. Data science methods, especially Artificial Intelligence (AI) and/or Machine Learning algorithms, can
unravel such complexity and untangle the host, pathogen and environmental factors underlying TB, which
have hitherto been difficult to explain or predict with conventional approaches. In this proposal, we will harness
health data science to elucidate factors underlying household transmission of TB, as well as anti-TB
treatment failure. We will leverage the computational infrastructure at Makerere, together with available
demographic, clinical and laboratory data sets from TB patients and their contacts, to develop AI/Machine Learning
algorithms that identify: (1) patients at baseline (month 0) who would not achieve sputum and/or culture
conversion at months 2 and 5, and are hence at risk of failing TB treatment; and (2) contacts of index TB cases
who are at risk of developing TB disease in the household, as well as contacts who could be resistant to TB
infection despite persistent and/or multiple exposures to M. tuberculosis in a household. Achieving these aims
will provide the required evidence that data science methods are effective at early identification of potential
TB cases and high-cost patients, and can hence contribute to halting TB transmission in the community.