SARS-CoV-2 has caused a global pandemic, with 4.2M cases and 290K deaths worldwide (as of May 12, 2020).
In the United States, there are over 1.3M cases and 81K deaths. Locally, Arizona has over 11K cases and 562
deaths. In response to this public health emergency, several studies have been published that describe
patient characteristics in terms of signs, symptoms, and clinical endpoints. In addition, epidemiologists and
infectious disease researchers have utilized next-generation sequencing technology to produce complete
genomes of the virus for clinical and epidemiologic investigation. Genomic epidemiology has enabled scientists
to understand localized transmission while determining geographic sources of introductions from different
states and countries. However, most of the sequencing for SARS-CoV-2 (as well as for other viruses) is
performed outside of state or local health departments, at institutions such as the Centers for Disease Control and Prevention
(CDC), universities, or private labs. It can then be difficult to link the pathogen, once sequenced, back to the
data collected by the health department for case investigation. Without a link between sequences of viral
isolates and epidemiologic case data, genomic epidemiology is inhibited.
There is limited research on how to link pathogen sequences to epidemiologic case data, especially for
COVID-19. Thus, despite the abundance of clinical and epidemiologic data collected during this pandemic,
more informatics research is needed to understand how to link viral genetic and epidemiological data and
demonstrate the value of this for disease surveillance.
The goal of this supplement is to link epidemiologic data from COVID-19 positive patients in Arizona with
viral genetics from sequenced isolates to better understand the relationship between viral genetics and
epidemiologic and clinical phenotypes. We will accomplish this by utilizing Arizona’s disease surveillance
system and available sequences and metadata that are published in online nucleic acid databases. We will use
different probabilistic matching strategies to link the two data sources (Aim 1) and then use Bayesian
phylogenetics and phylogeography to study clustering of epidemiologic cases (Aim 2). Epidemiologists can use
these findings to gain an understanding of how local viruses genetically cluster in relation to specific
epidemiologic and clinical cases. While disease severity is dependent on individual immune response and
environmental factors, linking each viral genome to its corresponding epidemiologic case could also support hypothesis
generation for future reverse genetics and immunological studies in animal models.
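
As an illustration of the kind of probabilistic matching proposed in Aim 1, the following is a minimal sketch in the style of Fellegi-Sunter record linkage, which is one common approach and is not necessarily the strategy this work will adopt. The field names (collection_date, county, age_group, sex), the record identifiers, the m/u probabilities, and the threshold are all hypothetical and chosen only for illustration; in practice they would be derived from the surveillance system and the sequence metadata actually available.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not estimated from data).
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "collection_date": (0.95, 0.05),
    "county": (0.90, 0.20),
    "age_group": (0.85, 0.15),
    "sex": (0.98, 0.50),
}


def match_weight(case, seq_meta, params=FIELD_PARAMS):
    """Sum log2 likelihood ratios over fields for one case/sequence pair."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = case.get(field), seq_meta.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def best_links(cases, sequences, threshold=5.0):
    """Map each sequence ID to its best scoring case, if the score meets the threshold."""
    links = {}
    if not cases:
        return links
    for seq in sequences:
        score, case_id = max((match_weight(c, seq), c["case_id"]) for c in cases)
        if score >= threshold:
            links[seq["seq_id"]] = (case_id, score)
    return links


# Hypothetical example records.
cases = [
    {"case_id": "C1", "collection_date": "2020-04-02", "county": "Maricopa",
     "age_group": "40-49", "sex": "F"},
    {"case_id": "C2", "collection_date": "2020-04-10", "county": "Pima",
     "age_group": "20-29", "sex": "M"},
]
sequences = [
    {"seq_id": "S1", "collection_date": "2020-04-02", "county": "Maricopa",
     "age_group": "40-49", "sex": "F"},
    {"seq_id": "S2", "collection_date": "2020-03-15", "county": "Yuma",
     "age_group": "60-69", "sex": None},
]
links = best_links(cases, sequences)
```

In this sketch, a sequence whose metadata agree with a case on every field accumulates a high positive weight and is linked, while a sequence that agrees with no case on any field falls below the threshold and is left unlinked for manual review. Comparing several such strategies would amount to varying the fields, the weights, and the threshold.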