Glycan Utilization Profiling in Human Gut Microbiomes of Common Funds Data

Healthy diets are key to preventing various chronic diseases (e.g., cardiovascular disease, inflammatory bowel disease, and obesity). Western diets are known to be unhealthy because they lack sufficient dietary fiber, which is critical for nurturing a healthy gut microbiome. Moreover, not only the amount but also the type of dietary fiber has a significant impact on gut microbiome health. Personalized dietary intervention, in which different dietary fibers are given as prebiotics to different individuals, is an effective strategy for personalized nutrition and disease prevention. However, microbiome-based personalized nutrition requires a better understanding of, and the capability to computationally profile, glycan utilization in the gut microbiome of any individual, across different populations, lifestyles, and disease states. To fill this research gap, this R03 project aims to develop a bioinformatics workflow that automatically retrieves CAZyme (carbohydrate-active enzyme) gene clusters (CGCs) from publicly available human gut metagenomes. These include microbiome data generated by three NIH Common Fund programs: the Human Microbiome Project (HMP), the Integrative Human Microbiome Project (iHMP), and the Human Heredity and Health in Africa (H3Africa) project. Microbiome data not funded by NIH will also be included to better represent diverse human populations. Genomes from HMP, H3Africa, and other microbiome projects will be used to identify fiber-degrading CAZymes and CGCs, forming two reference databases (refCAZymes and refCGCs) to which sequencing reads from any individual's microbiome sample can be mapped to infer personalized fiber utilization.
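The abstract does not specify how reads mapped to refCAZymes or refCGCs are converted into abundance values. As a minimal sketch, assuming a TPM-style normalization (reads per kilobase of reference gene, rescaled so that each sample sums to one million) and a hypothetical gene-to-family lookup table, per-family abundance for a single sample could be computed as below. The function names, gene IDs, and the choice of TPM are illustrative assumptions, not GLUP's actual implementation; the family labels GH13 and PL1 are used only as example keys.

```python
from collections import defaultdict


def tpm(counts, lengths_bp):
    """TPM-style normalization (an assumed choice): reads per kilobase of
    reference gene, rescaled so all genes in the sample sum to 1e6."""
    rates = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(rates.values())
    if total == 0:
        # No reads mapped to any reference gene in this sample.
        return {g: 0.0 for g in rates}
    return {g: r / total * 1e6 for g, r in rates.items()}


def family_abundance(counts, lengths_bp, gene_to_family):
    """Sum normalized gene abundances into per-CAZyme-family abundances.

    counts: {gene_id: mapped read count} for one sample (hypothetical input)
    lengths_bp: {gene_id: reference gene length in base pairs}
    gene_to_family: {gene_id: family label} (hypothetical lookup table)
    """
    per_gene = tpm(counts, lengths_bp)
    fam = defaultdict(float)
    for gene, value in per_gene.items():
        fam[gene_to_family[gene]] += value
    return dict(fam)


# Toy usage with made-up genes and counts.
counts = {"g1": 100, "g2": 50, "g3": 0}
lengths = {"g1": 1000, "g2": 500, "g3": 2000}
families = {"g1": "GH13", "g2": "GH13", "g3": "PL1"}
print(family_abundance(counts, lengths, families))  # {'GH13': 1000000.0, 'PL1': 0.0}
```

Normalizing by gene length before summing keeps long genes from dominating a family's total simply because they attract more reads; CGC-level abundance could be aggregated the same way by swapping the family lookup for a gene-to-CGC lookup.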
To demonstrate this utility, metagenomic and metatranscriptomic reads from 791 samples in the iHMP Inflammatory Bowel Disease Multiomics Database (iHMP-IBDMDB) will be mapped to refCAZymes and refCGCs to compare glycan utilization abundance and prevalence between IBD patients and healthy individuals. The significance of this project is that it will contribute to a better understanding of the diversity of glycan utilization across human populations, lifestyles, and disease states. The workflow will be implemented as a new software package named GLUP (glycan utilization profiling; code and documentation will be on GitHub) using the popular workflow manager Nextflow, to support the emerging microbiome-based personalized nutrition and health industry. The innovation is that this will be the first global CGC-based glycan utilization profiling across different human populations, especially the underrepresented and only recently available African microbiomes. This project builds upon our highly cited CAZyme bioinformatics tool suite, dbCAN, which has been continuously developed since 2012.
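The abstract does not define how abundance and prevalence will be compared between IBD patients and healthy individuals. A minimal sketch, assuming that prevalence means the fraction of samples in a group with nonzero abundance for a given family and that the median summarizes abundance, might look like the following. The sample IDs, group labels ("IBD", "nonIBD"), and the zero threshold are hypothetical; a formal comparison would add a statistical test (for example, a Mann-Whitney U test) on top of these summaries, which is omitted here to keep the sketch dependency-free.

```python
from statistics import median


def prevalence(values, threshold=0.0):
    """Fraction of samples whose abundance exceeds `threshold` (assumed 0)."""
    if not values:
        return 0.0
    return sum(v > threshold for v in values) / len(values)


def compare_groups(abundance, groups, family):
    """Summarize one family's prevalence and median abundance per group.

    abundance: {sample_id: {family: value}} (hypothetical layout)
    groups: {sample_id: group_label}, e.g. "IBD" or "nonIBD" (hypothetical)
    A family missing from a sample is treated as abundance 0.0.
    """
    by_group = {}
    for sample, label in groups.items():
        by_group.setdefault(label, []).append(abundance[sample].get(family, 0.0))
    return {
        label: {"prevalence": prevalence(vals), "median": median(vals)}
        for label, vals in by_group.items()
    }


# Toy usage with made-up samples.
abundance = {
    "s1": {"GH13": 10.0},
    "s2": {"GH13": 0.0},
    "s3": {"GH13": 4.0},
    "s4": {"PL1": 2.0},
}
groups = {"s1": "IBD", "s2": "IBD", "s3": "nonIBD", "s4": "nonIBD"}
print(compare_groups(abundance, groups, "GH13"))
```

Reporting prevalence alongside abundance separates two different signals: a family can be present in most samples of both groups yet differ in how abundant it is, or vice versa.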