Summary: Dissecting host-pathogen interactions through the lens of genomics
Current investigation of mechanisms underlying many diseases relies on the acquisition of multi-dimensional
genomics data. The utility of these data is, however, offset by the lag in development of tools and models to fully
interrogate them. In the context of infectious diseases, such data contain molecular information including gene
transcription, regulation, and variations from both the infecting pathogen and the host cell, providing a snapshot
of the host-pathogen interactions (HPIs). These HPIs determine infection outcomes. For instance, when a
pathogen evades, or evolves resistance to, defensive host immunity via a multifaceted HPI, it can result in
persistent infection, chronic inflammation, malignant transformation, and/or elevated mortality. Recent successes
in overcoming immune evasion by infected tumor cells with checkpoint inhibitors exemplify the clinical gains
that can be made by identifying and specifically targeting essential mechanisms of HPIs. Hence, precisely
identifying new mode(s) of HPIs is critical for development of effective and personalized interventions.
The molecular mechanisms of HPIs underpinning disease can be identified from genomics data. For example,
information on whether a transcription factor (TF) regulates genes from either host or pathogen, or both, can be
captured by chromatin immunoprecipitation (ChIP) sequencing of infected host cells. This means that integrative
analysis of genome-scale data can provide a platform for large-scale and unbiased detection of often multi-
dimensional and novel facets of HPIs in host cells. However, there is a lack of data mining tools and models to
extract such information. More importantly, the available analysis tools typically focus on data from either the
host or the pathogen and not on the interactions occurring between the two, preventing us from investigating the
full HPI spectrum. Thus, novel methods to determine HPIs by simultaneously modeling both host and pathogen
data are critical for understanding key cellular mechanisms and developing treatment strategies.
My lab specializes in developing computational models to construct HPI maps and in validating those maps
experimentally. As proof of principle, we produced a comprehensive HPI map by sequencing samples from a large
number of tumors caused by Epstein–Barr virus. This map delivered unprecedented insights, identifying novel
viral integrations and mutations linked to viral reactivation, and providing a molecular classification of tumors
expected to inform individualized cancer therapy. Therefore, my lab is uniquely positioned to uncover mechanistic insights
from HPIs. Our program seeks to develop new models and machine learning tools to construct HPI maps in
several diseases by focusing on the following major questions: 1) How do the expression, integration, and mutational
landscapes of host and pathogen affect disease pathogenesis? 2) What is the nature of physical HPIs and of
cross-regulation by major host and pathogen factors that modulate gene expression, such as TFs and
RNA-binding proteins? 3) How do HPIs define molecular subtypes to guide personalized treatments? We expect to
identify novel HPIs and provide systems-level understanding of mechanisms critical to cell biology.