Viral pathogens are an enduring threat to global public health. This project aims to use viral genomic data
to improve understanding of ongoing virus evolution and to make actionable inferences to reduce the global
burden of viral infectious disease. In order to be relevant for public health interventions, analyses of viral
sequence data need to be rapid, both in terms of computation and in terms of dissemination. To
accomplish these goals, this project will create novel methodological tools to analyze evolutionary dynamics
from influenza genetic sequence data and to analyze transmission patterns from outbreak sequence data.
Over the current project period (2016-2021), we developed a real-time analysis platform called Nextstrain,
which provides up-to-date analyses for a variety of pathogens including influenza virus, Ebola virus, Zika
virus, dengue virus, mumps virus, tuberculosis and SARS-CoV-2. Bioinformatic pipelines developed through
Nextstrain are reusable by academic groups and public health labs and resulting analyses are shareable via the
In the upcoming project period (2021-2026), we will refine methods for forecasting strain dynamics of influenza
virus. Monitoring and forecasting evolution of viral strains is of paramount importance. New antigenic variants
of influenza that partially escape from prior human immunity emerge and rapidly sweep through the viral
population. Such strains are less susceptible to vaccine-derived immunity and so antigenic evolution results in
the need to frequently update the seasonal influenza vaccine. This project aims to refine methods to forecast
strain dynamics and predict the makeup of the future influenza population. This forecasting is especially
relevant to influenza vaccine strain selection, as a vaccine strain is chosen for the Northern Hemisphere in
February for deployment the following winter. Accurate projections will aid in vaccine match for seasonal
influenza viruses and result in improved vaccine efficacy. Technical innovations focus on extending models to
work across different viruses and different gene segments and to incorporate spatial dynamics.
In an outbreak scenario such as the West African Ebola epidemic, the American Zika epidemic or the
SARS-CoV-2 pandemic, public health interventions focus on early diagnosis, contact tracing, isolation and
treatment. Epidemiological understanding of transmission dynamics is of paramount importance to outbreak
response. Viral genomic data can reveal otherwise hidden transmission patterns and aid in efficient contact
tracing. Geographic spread is especially amenable to genomic inferences. This project will develop tools to
make epidemiological inferences from outbreak sequence data. These methods will continue to be deployed
via the Nextstrain platform, allowing epidemiologists throughout the world to analyze their own datasets.
Genomic epidemiology has the potential to truly inform outbreak response. Nextstrain has been instrumental
to SARS-CoV-2 genomic epidemiology in the United States and worldwide. Improvements to the accuracy and
capabilities of the platform would be well placed.