Early detection and inference for emerging infectious agents in data-sparse settings

PROJECT SUMMARY/ABSTRACT

In the last few decades, novel pathogens have emerged at an unprecedented rate, imposing significant health and economic burdens on society. However, our ability to detect and quantify key epidemiological features of these pathogens has been limited during the early phase of outbreaks, a short but critical window for understanding new diseases. This gap arises primarily from a lack of observational data and incomplete detection of transmission events, both constrained by the limited resources invested in traditional surveillance systems. Our research group has been at the forefront of developing mathematical models and computational tools that advance methods for the surveillance, inference, and forecasting of emerging infectious agents, including SARS-CoV-2, pandemic influenza, and antimicrobial-resistant organisms (AMROs). Our goals for the next five years are to develop novel computational tools that detect early signals of cryptic transmission and infer critical epidemiological features in data-sparse settings. Leveraging complex-systems theory and advanced machine learning and deep learning (ML/DL) techniques, we propose to synergistically use traditional (e.g., syndromic surveillance, PCR tests, sequencing, human mobility, and contact tracing) and non-traditional (e.g., wastewater, social media, and search queries) data sources in a series of studies centered on three key questions.

1) How can we detect early transmission of emerging infectious diseases? We will take a biomimicry approach, inspired by the mechanisms that allow the human sensory system to detect external stimuli (e.g., light, sound, and pressure) whose intensities span several orders of magnitude.
We will develop and optimize excitable sensor networks that collectively assimilate multiple data sources across locations to assess the transmission potential of novel pathogens.

2) How can we infer key epidemiological features of novel pathogens from limited data? We will combine process-based models (e.g., metapopulation and agent-based models) with ML/DL techniques (e.g., Graph Neural Networks and Transformers) to identify signatures of disease characteristics, such as asymptomatic shedding and superspreading, in early-stage data.

3) How can we validate model-derived hypotheses to reduce uncertainty? Determining the characteristics of a novel pathogen is a high-stakes task. We will develop strategies (e.g., sampling at targeted locations and times) to cost-effectively validate hypotheses about the spatial spread of new viruses and to gather new evidence in real time.

For all three projects, the methods we develop will be applied to a range of infectious agents with disparate epidemiology, including SARS-CoV-2, influenza, AMROs, and zoonotic viruses. We will apply the new methods to retrospectively collected data and quantify their advantages over classical approaches (e.g., how much data classical versus new methods require). Our vision for this research program is that the synergistic use of computational tools and diverse data sources can enhance our ability to conduct timely, impactful research on novel pathogens using sparse and imperfect data.
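The abstract does not specify how the excitable sensor networks will be built. One established way to formalize the biomimicry idea in question 1 is the Kinouchi-Copelli excitable network: each node cycles through quiescent, excited, and refractory states, and the network's dynamic range (the span of stimulus intensities it can discriminate, in decibels) is largest when the mean branching ratio sigma is near 1. The Python sketch below is a minimal, hypothetical illustration of that model under stated assumptions, not the project's method. The names `build_network`, `simulate_response`, and `dynamic_range_db` are illustrative; the stimulus rate stands in for an external signal such as infections seeded by an emerging pathogen; and the network size, mean degree, and number of refractory states are arbitrary choices. It requires NumPy.

```python
import numpy as np


def build_network(n_nodes, mean_degree, sigma, rng):
    """Hypothetical helper: random directed graph with mean out-degree `mean_degree`.

    weights[i, j] is the probability that an excited node i excites node j.
    Weights are uniform on [0, 2*sigma/mean_degree], so an excited node excites
    roughly `sigma` quiescent neighbours on average (the branching ratio).
    """
    adjacency = rng.random((n_nodes, n_nodes)) < mean_degree / n_nodes
    np.fill_diagonal(adjacency, False)  # no self-loops
    weights = rng.uniform(0.0, 2.0 * sigma / mean_degree, size=(n_nodes, n_nodes))
    return np.where(adjacency, weights, 0.0)


def simulate_response(weights, rate, n_states=5, steps=400, burn_in=100, dt=1.0, rng=None):
    """Mean density of excited nodes per unit time under a Poisson stimulus `rate`.

    States: 0 = quiescent, 1 = excited, 2..n_states-1 = refractory (assumed model).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = weights.shape[0]
    log_escape = np.log1p(-weights)  # log(1 - w_ij); 0 where there is no edge
    eta = 1.0 - np.exp(-rate * dt)  # chance of at least one external event per step
    state = np.zeros(n, dtype=int)
    density = []
    for t in range(steps):
        # log-probability that no currently excited neighbour excites each node
        log_no_transmission = log_escape[state == 1].sum(axis=0)
        p_fire = 1.0 - (1.0 - eta) * np.exp(log_no_transmission)
        fires = (state == 0) & (rng.random(n) < p_fire)
        # excited/refractory nodes advance one state; the last refractory state returns to 0
        state = np.where(state > 0, (state + 1) % n_states, 0)
        state[fires] = 1
        if t >= burn_in:
            density.append(np.mean(state == 1))
    return np.mean(density) / dt


def dynamic_range_db(rates, responses, low=0.1, high=0.9):
    """Dynamic range 10*log10(r_high / r_low) in dB; `rates` must be ascending.

    r_x is the stimulus at which the response reaches F0 + x*(Fmax - F0),
    found by linear interpolation in log10(rate).
    """
    log_rates = np.log10(np.asarray(rates, dtype=float))
    f = np.maximum.accumulate(np.asarray(responses, dtype=float))  # monotone for interp
    f0, fmax = f[0], f[-1]
    log_low = np.interp(f0 + low * (fmax - f0), f, log_rates)
    log_high = np.interp(f0 + high * (fmax - f0), f, log_rates)
    return 10.0 * (log_high - log_low)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    rates = np.logspace(-4, 1, 11)
    for sigma in (0.5, 1.0, 1.5):  # subcritical, critical, supercritical
        w = build_network(300, 10, sigma, rng)
        resp = [simulate_response(w, r, steps=300, burn_in=100, rng=rng) for r in rates]
        print(f"sigma={sigma:.1f}  dynamic range ~ {dynamic_range_db(rates, resp):.1f} dB")
```

Running the script prints an estimated dynamic range for a subcritical, a critical, and a supercritical network. With networks this small, finite-size noise can blur the expected peak near sigma = 1, so larger networks and longer runs would be needed for a fair comparison. In a surveillance setting, each node could be read as a hypothetical sensor (for example, a wastewater site) whose coupling to other sensors amplifies weak early signals without saturating on strong ones.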