Project Summary
This study aims to (1) establish how representative obituary data are across age, sex, and race by comparing
them with death certificate records, in order to understand the reliability and measurement properties of
open-source data; and (2) build a model that uses online obituary data to predict administrative records.
During health care emergencies, it is essential to monitor all-cause mortality, not just cause-specific deaths,
and to calculate the number of excess deaths, for several reasons: (1) official statistics on cause-specific
deaths might undercount people who did not test positive before dying; (2) hospitals and civil registries may not
process death certificates for several days, or even weeks, which creates lags in the data; (3) the person
completing the death certificate may not have access to the complete medical record or otherwise know about
a positive test or symptoms; (4) pandemics and other health emergencies divert attention and resources away from
other conditions (e.g., cancer patients have seen treatment delayed or postponed) and discourage people
from going to the hospital when needed (e.g., for strokes), which during the COVID-19 pandemic may have
indirectly increased fatalities from other diseases. Automated data collection from text mining of openly available
online obituaries could allow us to derive quick predictions of the age and sex distribution of deaths by location
in a cost-effective way. Currently, publicly available datasets have a two-year lag from the moment death records
are captured to the time they are released, and this delay hampers monitoring efforts. Providing information on
sex, age, and race is critical because health emergencies might directly or indirectly cause a disproportionate
increase in fatalities among certain groups. In places where obituary data indicate exceptionally high (or low)
mortality, this form of monitoring can help assess the effectiveness of the policy response. This work can also be
foundational for disease monitoring should future pandemics arise because online death records are easier
and cheaper to access than administrative data.