Project Summary
The broad, long-term objective of this project is to develop novel quantitative methods and
biostatistical tools for microbiome data analytics to aid microbiome-based discovery science. The
microbiome, often called the second human genome, has received much attention in the past few years.
Due to its critical roles in human health and disease, the human microbiome has now been recognized as an
integral part of the individualized medicine approach because it not only accounts for inter-individual variability
in all aspects of a disease but also represents a potentially modifiable factor that is amenable to targeting by
therapeutics. Despite these fruitful and promising findings from microbiome studies, there is no consensus in
the field as to how to analyze the data appropriately, let alone the optimality and efficiency issues that
have yet to be addressed. Several challenges contribute to this predicament, including the complex experimental
designs of microbiome studies, the unknown interplay between microbiome and host, the extreme sparsity and
high dimensionality of the data, the phylogenetic relatedness of the microbial taxa, and the compositional structure of
microbiome data. As a result, although quite a few analytical methods and tools have been developed for
microbiome data analysis, several specific gaps exist in the methodological toolbox, hindering the advance of
microbiome-based biomedical sciences. To fill these gaps, this proposal aims to develop robust and powerful
quantitative methods and tools for microbiome data analysis. Specifically, Aim 1 focuses on developing robust
and powerful methods for differential abundance analysis in complex study designs. It will develop new
methods to address zero-inflation, compositional effects and correlations in microbiome data. Aim 2 focuses on
strategies to increase the power of microbiome-wide multiple testing. It proposes two new multiple testing
procedures, which address confounders and phylogenetic relatedness, respectively. Aim 3 proposes to
develop compositional canonical correlation analysis methods for integrating microbiome data with other omics
data. Specifically, it will develop an efficient and flexible framework for integrating heterogeneous omics data
with microbiome data, accounting for compositional effects and phylogenetic relatedness. Aim 4 will develop
user-friendly and efficient software packages so the community can benefit maximally from methodological and
scientific advances resulting from this application. The proposed methods will be evaluated using simulations
and, more importantly, applications to several ongoing microbiome studies in the Center of Individualized
Medicine at Mayo Clinic. The proposed quantitative methods and open-source software packages will
contribute to microbiome biomarker discovery and microbiome-based mechanistic studies. All methods and
tools developed under this grant will be made available free of charge to interested researchers and the public.