Project Summary
Influenza continues to have a significant global impact on public health each year, even after the COVID-19
pandemic. In the United States, the illness affects over 35 million people and causes 710,000 hospitalizations
and about 47,000 deaths per year. The strongest public health response is the annual vaccine, planned and
manufactured several months ahead of the flu season. Globally, the World Health Organization makes
recommendations for vaccine content, and national agencies select the most appropriate strain. The
effectiveness of the vaccine varies from year to year and even within the same season. Vaccine effectiveness
(VE) data guides the response of state and local public health agencies to influenza epidemics and
pandemics. VE estimates affect the success of vaccination campaigns and allow agencies to estimate the
number of illnesses, hospitalizations, and deaths caused by influenza and to implement targeted public
health control measures and outreach campaigns if VE is low. The CDC estimates VE annually through
the Influenza Vaccine Effectiveness Network (US Flu VE Network). While the CDC's efforts to track flu cases
(FluView) and vaccination rates (FluVaxView) collect data continuously from hundreds of sites, the US Flu
VE Network runs only at participating clinics in a limited number of states, with each site enrolling around
1,000 participants with influenza-like illness (ILI) each year as part of a test-negative design. The CDC
estimates are the US gold standard but have two main limitations: (i) they include only a limited number of
states and subjects, and (ii) the interim report is not published until late in the flu season. Here, we propose
to use social media (SM) data to address these limitations. SM data are abundantly available across the US in
near real time and can be used as complementary data for calculating VE. Based on separately funded work,
we have already collected suitable Twitter user datasets, and we propose to develop automated methods to
identify users who report taking a flu test or receiving a flu diagnosis. For these individuals, we will analyze their
tweets over time to determine vaccination status, test results, and demographic information. We have shown
that SM data collected using our systematic Natural Language Processing (NLP) approach can be used for
epidemiology and, per the latest Pew report, is representative of the population. The specific aims of this
project are to: (1) develop and evaluate an NLP framework to calculate influenza VE, including analysis of
timelines for concept extraction relevant to VE, and (2) develop and evaluate a real-time VE estimation system
that uses longitudinal SM data and accounts for biases, uncertainty, and missing data in vaccination status or
influenza diagnosis. Early, real-time VE estimation as we propose could aid preparedness among public health
authorities and clinicians, potentially reducing influenza morbidity and mortality. If successful, this will be the
first automated approach to near real-time estimation of VE in the United States, providing a viable, relatively
low-cost alternative solution to a significant problem.
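
As an illustration of the test-negative design mentioned above, the sketch below computes an unadjusted VE estimate as (1 - odds ratio) x 100 from a 2x2 table of vaccination status by influenza test result, with a Wald confidence interval on the log odds ratio. The function name and the counts are hypothetical and chosen only for illustration. This is not the estimation system proposed here, and it omits the covariate adjustment, bias correction, and missing-data handling that a real analysis would need.

```python
import math


def ve_test_negative(vacc_pos, unvacc_pos, vacc_neg, unvacc_neg, z=1.96):
    """Unadjusted VE (%) with a Wald interval from a 2x2 test-negative table.

    vacc_pos / unvacc_pos: vaccinated / unvaccinated subjects testing positive.
    vacc_neg / unvacc_neg: vaccinated / unvaccinated subjects testing negative.
    Returns (ve, lower, upper) as percentages.
    """
    cells = (vacc_pos, unvacc_pos, vacc_neg, unvacc_neg)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cell counts must be positive")

    # Odds of vaccination among test-positives over odds among test-negatives.
    odds_ratio = (vacc_pos * unvacc_neg) / (unvacc_pos * vacc_neg)

    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(sum(1.0 / c for c in cells))
    or_low = math.exp(math.log(odds_ratio) - z * se)
    or_high = math.exp(math.log(odds_ratio) + z * se)

    # VE = (1 - OR) * 100; the upper OR bound gives the lower VE bound.
    return (
        100.0 * (1.0 - odds_ratio),
        100.0 * (1.0 - or_high),
        100.0 * (1.0 - or_low),
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    ve, lower, upper = ve_test_negative(100, 200, 300, 300)
    print(f"VE = {ve:.1f}% (95% CI {lower:.1f}% to {upper:.1f}%)")
```

With these hypothetical counts the odds ratio is 0.5, so the point estimate is a VE of 50%. In practice, the odds ratio would typically come from a regression model adjusted for factors such as age, site, and calendar time.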