DSpace: Utilizing Data Science to Predict and Improve Health Outcomes in Pediatric HIV
Abstract
Metabolic syndrome (MetS) is rapidly increasing in children infected with HIV in sub-Saharan Africa (SSA). According to our preliminary data, 1 in 30 HIV-infected children between the ages of 16 and 19 is diagnosed with MetS. In addition to MetS, tuberculosis (TB) remains a leading cause of morbidity and mortality among HIV-infected children. Moreover, children with HIV have a 30-fold risk of developing TB and a significantly higher risk of death compared with non-HIV-infected children. Clinically, TB in HIV-infected children manifests with extensive heterogeneity (latent TB or active TB [probable, definite, or possible]), which poses a significant diagnostic challenge. The paucibacillary nature of pediatric TB means that only a small fraction of children with a compatible clinical presentation can be bacteriologically confirmed. There have been various efforts to develop data science tools for patient classification and risk stratification of MetS and for improved diagnosis of TB in adult Western populations. However, these technologies have not been deployed and evaluated in Africa, which bears the greatest burden of people infected with HIV and TB and where the burden of non-communicable diseases is growing rapidly. Furthermore, MetS is a known risk factor for the early development of diabetes mellitus (DM) and cardiovascular disease (CVD) in adulthood. Unfortunately, interventions (either pharmacological or non-pharmacological) that improve metabolic risk factors for children with long-term metabolic impairment (MetS) do not completely prevent or reverse CVD or DM complications, possibly because interventions are currently implemented after metabolic risk factors have been present for many years. Determining the longitudinal risk of MetS is therefore imperative.
Similarly, the availability of multi-omics data presents a valuable opportunity to investigate the host genetics of TB disease in SSA children and to advance the development of much-needed, highly sensitive TB diagnostic algorithms. Therefore, the overarching goal of this application is to utilize data science approaches to integrate large temporal electronic health records (EHRs) with multi-omics data to predict and improve health outcomes of HIV-infected children in Africa. This retrospective, descriptive longitudinal study will leverage existing data on ~118,000 HIV-infected children from the Baylor International Pediatric AIDS Initiative (BIPAI) programs in Uganda, Botswana, and Eswatini. In Aim 1, we will use machine learning to identify informative features within longitudinal EHRs and genomic data to predict MetS in HIV-infected children. We will also develop composite risk scores for the development of MetS associated with dolutegravir-based combination antiretroviral therapy. In Aim 2, we will use explainable machine learning to uncover molecular signatures in multi-omics data, as well as characteristic features in temporal EHRs, that improve the power of predictive models for the diagnosis of TB in HIV-infected children. This effort will translate into the development of clinically relevant composite risk scores for the diagnosis of TB and the future development and validation of non-sputum TB diagnostic biomarkers. This application provides a model methodological framework that can be applied to multimodal data in HIV-infected children and improves our understanding of how to effectively use artificial intelligence to target personalized or public health interventions that improve outcomes across the entire HIV care continuum in Africa.