Novel Computational Methods for Microbiome Data Analysis in Longitudinal Study - With the steady growth of longitudinal microbiome studies, the microbiome is now on the cusp of clinical utility for several diseases, including obesity, diabetes, inflammatory bowel disease, and cancer. Motivated by the PI’s broad microbiome collaborations at New York University Langone Health and building on our extensive experience in developing novel methods to analyze emerging omics data, we propose to develop two sets of novel analytic methods that address two computational and analytical challenges standing between microbiome research and its full clinical potential. In Aim 1, we will take a granular approach, working directly with raw metagenomic sequencing data to investigate how to detect and differentiate closely related microbial strains within species. Specifically, we hypothesize that using longitudinal raw metagenomic sequencing data will yield a more efficient and accurate genetic variant calling scheme than existing approaches, and we will develop a novel longitudinal metagenomic sequence processing system to capture genomic variants, identify primary and secondary strains, and quantify strain proportions within species. The proposed tool will then be used to understand how microbial strains evolve over time and how to link structural variations with host-specific traits. In Aim 2, starting from the recognition of the human microbiota as a complex ecosystem, we will take a holistic approach, developing a suite of microbial risk scores that capture the multifaceted characteristics of the microbiome and applying these scores to disease risk prediction in combination with other omics data. In Aim 3, we will apply the proposed pipelines to two completed longitudinal microbiome studies and five ongoing large-scale, population-based cancer microbiome studies.
Through extensive real-data analyses, we will validate the proposed methods, illustrate new applications, and explore future directions. In addition, we will develop, distribute to the community, and support open-source software packages implementing these methods. The proposal is innovative because it integrates overall study design, upstream bioinformatic processing of raw sequencing data, and downstream statistical modeling with clinical outcomes into a streamlined analytic process that yields unbiased and efficient analytic tools for microbiome research in longitudinal studies. The proposed work will be conducted by an experienced multidisciplinary study team. If successful, this work will advance the understanding of how bacterial communities affect human health and disease, and ultimately lead to new approaches to treat or prevent a variety of health conditions.