With the steady growth of longitudinal microbiome studies, the microbiome is now on the cusp of clinical utility for
several diseases, including obesity, diabetes, inflammatory bowel disease, and cancer. Motivated by the PI’s
broad microbiome collaborations at New York University Langone Health and building upon our extensive and
rich experience in developing novel methods to analyze emerging omics data, we propose to develop two sets
of novel analytic methods to address two computational and analytical challenges in advancing microbiome
research toward its full clinical potential. In Aim 1, we will take a granular approach, working directly with raw
metagenomics sequencing data to investigate how to detect and differentiate closely related
microbial strains within species. Specifically, we hypothesize that utilizing longitudinal raw metagenomics
sequencing data will yield a more efficient and accurate genetic variant calling scheme than existing
approaches, and we will develop a novel system for processing longitudinal metagenomics sequencing data to capture
genomic variants, identify primary and secondary strains, and quantify strain proportions within species. The
proposed tool will then be used to understand how microbial strains evolve over time and how to
link structural variations with host-specific traits. In Aim 2, starting from the recognition of the human
microbiota as a complex ecosystem, we will take a holistic approach to develop a suite of microbial risk scores
to capture the multifaceted characteristics of the microbiome and implement these scores in disease risk
prediction in combination with other omics data. In Aim 3, we will apply the proposed pipelines to two completed
longitudinal microbiome studies and five ongoing large-scale population-based cancer microbiome studies.
Through these extensive real-data analyses, we will validate the proposed methods, illustrate new applications,
and explore future directions. In addition, we will develop, distribute to the community, and provide support for
open-source software packages implementing these methods. The proposal is innovative because it integrates
the overall study design, upstream bioinformatic processing of raw sequencing data, and downstream
statistical modeling of clinical outcomes into a streamlined analytic process to produce unbiased and efficient
analytic tools for microbiome research in longitudinal studies. The proposed work will be conducted by an
experienced multidisciplinary study team. If successful, this work will advance the understanding of how bacterial
communities affect human health and disease, and ultimately lead to new approaches to treat or prevent a variety
of health conditions.