PROJECT SUMMARY
The transmission or ‘spillover’ of wildlife viruses to humans is a critical threat to global health, with outbreaks of
viral pathogens like filoviruses, paramyxoviruses, and coronaviruses all originating in wild mammals. A key
outstanding question is whether specific taxonomic groups, such as bats, warrant extra surveillance as ‘special
reservoirs’ of viruses that are potentially pathogenic to humans. However, existing host-virus datasets are not
sufficiently resolved to predict fine-grained risk at the level of species or genera. An effective response must therefore
address two core aims: (i) synthesizing knowledge of mammal-virus interactions; and (ii) using that
knowledge base to robustly predict future spillover events (i.e., zoonotic risk). To enable robust analysis and
reuse of public datasets from NIAID’s Bioinformatics Resource Centers (BRCs; especially NCBI Virus and the
Virus Pathogen Resource, ViPR), the project will develop Host-Virus Data Intelligence to address three main
obstacles to data reuse: limited confidence in the taxonomic assignments of mammals and viruses in observations;
limited confidence in the evidence for proposed mammal-virus interactions; and relevant data in
published texts that remain hidden from existing databases. The project team will construct a novel bioinformatic
pipeline that will digitally connect taxonomic knowledge, use it to search dark data (published text not captured in existing databases) for evidence of potential
host-virus interactions, and then link that evidence together using metadata layers (‘data about the data’) to form a more
expansive host-virus knowledge graph than was previously feasible. The project’s computational approach
leverages information extraction methods in natural language processing as well as novel applications of
artificial intelligence methods such as probabilistic inductive logic programming. A key anticipated outcome is
to expand the dataset of host-virus interactions 3-fold relative to comprehensive existing datasets. The
proposed project will lay the foundation for a new generation of work reusing host-virus interaction data to test
previously inaccessible hypotheses about how species’ traits affect viral spillover to humans. Shifting the
paradigm from purely taxonomic representations of host-virus interactions to graph-based analyses
will allow researchers to directly investigate the impact of ecosystem structure and human encroachment on
viral loads. Determining whether all mammals carry equal risk of viral spillover, or whether some groups carry
higher taxon-specific zoonotic risk (e.g., horseshoe bats, murid rodents), is critical information for public health
workers and epidemiologists. More definitive risk quantification will also help researchers identify which
ecophysiological adaptations predispose certain groups to tolerate more viruses, which may in turn inform
clinical treatments modeled on the immune responses of wild mammals. Filling the identified gaps in host-virus
knowledge is therefore essential to aid the progress of zoonotic disease research in the wake of COVID-19.