Methods for Microbiome Compositional Data

Project Summary

The broad, long-term objective of this project is to develop novel quantitative methods and biostatistical tools for microbiome data analysis that aid microbiome-based discovery science. The human microbiome, often called the "second genome" of the human body, has received considerable attention in recent years. Because of its critical roles in health and disease, the human microbiome is now recognized as an integral part of the individualized medicine approach: it not only accounts for inter-individual variability in all aspects of a disease but also represents a potentially modifiable factor that can be targeted by therapeutics. Despite these promising findings, there is no consensus in the field on how to analyze microbiome data appropriately, let alone on questions of optimality and efficiency, which have yet to be addressed. Several challenges contribute to this predicament, including the complex experimental designs of microbiome studies, the unknown interplay between the microbiome and its host, the extreme sparsity and high dimensionality of the data, the phylogenetic relatedness of microbial taxa, and the compositional structure of microbiome data. As a result, although a number of analytical methods and tools have been developed for microbiome data analysis, specific gaps remain in the methodological toolbox, hindering the advancement of microbiome-based biomedical science.

To fill these gaps, this proposal aims to develop robust and powerful quantitative methods and tools for microbiome data analysis. Aim 1 focuses on robust and powerful methods for differential abundance analysis in complex study designs; it will develop new methods that address zero inflation, compositional effects, and correlations in microbiome data. Aim 2 focuses on strategies to increase the power of microbiome-wide multiple testing.
It proposes two new multiple testing procedures, one addressing confounders and the other addressing phylogenetic relatedness. Aim 3 will develop compositional canonical correlation analysis methods for integrating microbiome data with other omics data, providing an efficient and flexible framework for integrating heterogeneous omics data with microbiome data while accounting for compositional effects and phylogenetic relatedness. Aim 4 will develop user-friendly, efficient software packages so that the research community can benefit fully from the methodological and scientific advances resulting from this project. The proposed methods will be evaluated through simulations and, more importantly, through applications to several ongoing microbiome studies in the Center for Individualized Medicine at Mayo Clinic. The resulting quantitative methods and open-source software packages will contribute to microbiome biomarker discovery and microbiome-based mechanistic studies. All methods and tools developed under this grant will be made freely available to interested researchers and the public.