Tree-based population genetics methods for genetic epidemiology - PROJECT SUMMARY

With the advent of high-throughput sequencing, pathogen genetic data have become an increasingly important source of information in the study and surveillance of infectious diseases. However, current computational methods for genetic epidemiology do not adequately capture the genetic consequences of realistic disease dynamics, which hinders our ability to fully exploit this rich data source. The core challenge is to incorporate epidemic, ecological, and evolutionary factors, along with their interactions, into the computational framework. My lab's central aim for the next five years is to address this challenge by developing novel statistical and computational methods, drawing on our expertise in population genetics, applied mathematics, and computation. First, we will develop efficient coalescent-based phylodynamic methods that jointly infer genealogies and model parameters from pathogen genetic data while incorporating the biological and epidemiological processes associated with latent and polyclonal infections. We will devise new algorithms and inference frameworks for these two scenarios based on seedbank coalescent theory and metapopulation coalescent theory, respectively. Beyond their applications in genetic epidemiology, the newly developed methods will also provide a fundamental understanding of populations undergoing dormancy and metapopulation dynamics. Second, we will develop scalable inference frameworks for phylodynamics that take pathogen genealogies (representing evolutionary and epidemiological relationships among samples) as the input data structure. Our approach will integrate tree encoding with deep learning techniques and ensemble learning strategies to handle large datasets and model complexities, enabling a more robust and comprehensive framework for genetic epidemiology.
Third, we will create a comprehensive epi-eco-evolutionary simulator for generating synthetic data that accurately reflect real-world scenarios, thereby facilitating the development and testing of new hypotheses, algorithms, and models. Importantly, this tool will directly address the currently understudied epi-eco-evolutionary coupling, offering insights into how the genetic evolution and transmission dynamics of pathogens are intertwined. Finally, we will apply the conceptual and methodological advances from our research to the existing whole-genome sequencing dataset of Mycobacterium tuberculosis, the causative agent of tuberculosis. In summary, my research program will provide a deeper mechanistic understanding of how epidemiological, ecological, and evolutionary processes and their interplay shape the genetic diversity and epidemic trajectories of pathogens. This understanding will lay the foundation for improving the management and control of infectious diseases such as tuberculosis, which disproportionately affects socioeconomically underprivileged populations.