Project Summary/Abstract
Allogeneic stem cell transplantation is a life-saving therapy for a variety of blood disorders, but its use is limited by a high
rate of serious side effects, including the development of graft-versus-host disease (GVHD). The gut microbiome, or the
composition of microorganisms populating the digestive tract, plays a key role in triggering this inflammatory response,
and there is an urgent need to analyze patient microbiome profiles to both predict and mitigate risk of GVHD. However,
microbiome data pose a number of statistical challenges not addressed by existing methods due to high dimensionality,
heterogeneity across subjects, and complex phylogenetic relationships. In this proposal, we develop new data science
approaches to make sense of microbiome data, providing insight that can guide the development of future interventions
aimed at reducing GVHD incidence. We will develop accurate and efficient methods for microbiome data analysis and
make them available in user-friendly formats. We focus on the development of novel methods for visualization and
prediction using microbiome data, as detailed in the following specific aims:
Specific Aim 1: To develop and evaluate advanced tools for visualization of microbiome data. The high
dimensionality and unique structure of microbiome data present challenges to effective data visualization. In this aim,
we will develop approaches for both unsupervised and supervised visualization of microbiome data, along with an RShiny
app and QIIME2 plug-in that will make these tools accessible to both clinicians and bioinformaticians. The methods and
software resulting from this aim will provide robust approaches to enable researchers to better visualize global microbiome
heterogeneity across their study population, enhancing data exploration and identification of potential confounding factors
or outliers.
Specific Aim 2: To develop predictive modeling approaches for binary and survival outcomes. In this aim, we
will focus on selection of predictive microbiome features in the context of regression. We will make key advances
enabling the effective application of sparse modeling to predict GVHD risk: novel statistical approaches to handle binary
and time-to-event outcomes, including those with competing risks, and computationally efficient implementations, to be
made freely available as both an R package and RShiny application.
Specific Aim 3: To develop methods for understanding the impact of rare features. Current microbiome profiling
methods allow for very fine resolution of the strains present in each sample. In this aim, we propose two methods to
understand the impact of rare features. We will first develop a method to provide insight into kernel association results, by
obtaining estimated effect sizes for individual microbiome features. We will then develop an approach for nonparametric
clustering of the regression coefficients, which allows flexible aggregation of the observed rare features.
Successful completion of this work will result in new statistical and computational approaches to provide insights into
microbiome data, generating hypotheses that can guide the development of future strategies to predict and mitigate
GVHD. These methods will be disseminated through easy-to-use and efficient cloud-based software implementations.