PROJECT SUMMARY
The current pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led
to global public health concerns. The disease it causes, coronavirus disease 2019 (COVID-19), shares clinical symptoms
with diseases caused by other viruses in the coronavirus family and other common respiratory viruses. When
an infectious agent replicates within a host organism, the host interacts with and responds to the agent through
various mechanisms. Given the varying severity across patients and the emergence of new SARS-CoV-2 variants,
there is an urgent need to understand how the host responds to SARS-CoV-2 and its variants. RNA sequencing
(RNA-seq) data that profile transcriptional response to SARS-CoV-2 and other respiratory viral infections are
available from public databases. Comparing the gene signatures across respiratory viruses will identify
similarities and differences in how the host responds to these infections. In particular, compendium analyses in
which multiple datasets are integrated bring great opportunities for generating novel biological hypotheses.
However, compendium analysis of RNA-seq data generated across different laboratories is an onerous task
given the different protocols, parameters, software and software versions used at the time of analyses.
This proposal focuses on the development of software tools to facilitate re-analyses of existing host-response
RNA-seq data to create a compendium of gene signatures using the same set of analytical tools and input
parameters. Our deliverables will include workflows with saved input files and parameters, fixed software
versions and dependencies that will facilitate reproducibility and collaboration. We will provide an accessible
graphical user interface that allows users to create custom signature sets by querying the data and, if desired,
re-analyzing the data using one of our provided workflows or a workflow of their own choosing. Users will be
able to filter by biological variables, perform cross-species analysis, and compare gene signatures to other gene set
repositories. In addition, we will create an accessible dashboard that will support the query, download,
visualization and reproducible analysis of gene expression data from SARS-CoV-2 and other common
respiratory viruses. Tools will be provided to allow the user to interactively visualize the data and inform the
choice of appropriate gene signatures. Not only will our software tools and dashboard provide an accessible
front end, but we will also develop an easy-to-use, scalable, and cloud-enabled backend that enables efficient
alignment of sequencing data. Our proposed project will empower biomedical scientists to experiment with
different computational methods and input parameters (including those of the alignment step) across multiple datasets and
respiratory viral infections, thus facilitating integrated and interactive analyses of datasets generated by
multiple laboratories to advance our understanding of the host transcriptional response to COVID-19.