Abstract
Metabolic Syndrome (MetS) is rapidly increasing in children infected with HIV in sub-Saharan Africa (SSA).
According to our preliminary data, 1 in 30 HIV-infected children aged 16 to 19 years is
diagnosed with MetS. In addition to MetS, tuberculosis (TB) remains a leading cause of morbidity and
mortality among HIV-infected children. Children with HIV have a 30-fold higher risk of developing TB and
a significantly higher risk of death than non-HIV-infected children. Clinically, TB in HIV-infected
children manifests with extensive heterogeneity (latent TB or active TB [probable, definite, or possible]), which
poses a significant diagnostic challenge. The paucibacillary nature of pediatric TB means that only a small
fraction of children with a compatible clinical presentation can be bacteriologically confirmed. There have
been various efforts to develop data science tools to address patient classification and risk stratification of
MetS and improve the diagnosis of TB in adult Western populations. However, these technologies have not
been deployed or evaluated in Africa, which bears the largest burden of HIV and TB
and where the burden of non-communicable diseases is growing rapidly.
Furthermore, MetS is a known risk factor for the early development of diabetes mellitus (DM) and
cardiovascular disease (CVD) in adulthood. Unfortunately, pharmacological and non-pharmacological
interventions that improve metabolic risk factors in children with long-term metabolic impairment (MetS)
do not completely prevent or reverse CVD or DM complications, possibly because they are currently
implemented only after metabolic risk factors have been present for many years. Thus,
determining the longitudinal risk of MetS is imperative.
Similarly, the availability of multi-omics data presents a valuable opportunity to investigate the host genetics
of TB disease in SSA children and to advance the development of much-needed, highly sensitive TB
diagnostic algorithms. Therefore, the overarching goal of this application is to use data science approaches to
integrate large temporal electronic health records (EHR) with multi-omics data to predict and improve health
outcomes of HIV-infected children in Africa. This retrospective, descriptive longitudinal study will leverage
existing data on ~118,000 HIV-infected children from the Baylor International Pediatric AIDS Initiative (BIPAI)
programs in Uganda, Botswana and Eswatini. In Aim 1, we will use machine learning to identify informative
features within longitudinal EHRs and genomic data to predict MetS in HIV-infected children. We will also
develop composite risk scores for the development of MetS associated with dolutegravir-based combination
antiretroviral therapy. Aim 2 of this proposal will focus on the use of explainable machine learning to uncover
molecular signatures in multi-omics data as well as characteristic features in temporal EHRs that improve the
power of predictive models for the diagnosis of TB in HIV-infected children. This effort will yield
clinically relevant composite risk scores for the diagnosis of TB and support the future development and
validation of non-sputum TB diagnostic biomarkers. This application provides a model methodological
framework that can be applied to multimodal data in HIV-infected children and will improve our understanding
of how to use artificial intelligence effectively to target personalized or public health interventions that improve
outcomes across the entire HIV care continuum in Africa.