PROJECT SUMMARY
Carbohydrate enzyme gene clusters in human gut microbiome
Hippocrates said ~2,400 years ago: “Let food be thy medicine and medicine be thy food”. It is now well known
that the health effects of food are largely mediated by the “diet-microbiota-host” interactions that take place in the human gut. In particular,
microbial degradation of carbohydrates can produce a variety of metabolites, which have a profound impact on
human health. The PI is a bioinformatics researcher in the Nebraska Food for Health Center whose long-term
interests are to (i) develop specialized computational tools for better functional annotation of food-digesting
microbial genomes and metagenomes, and (ii) characterize enzymes and other genetic elements that connect
microbes, diets, and human health. The objective of this R01 project is to develop a suite of bioinformatics tools
for functional annotation of carbohydrate-active enzymes (CAZymes) and CAZyme gene clusters (CGCs) in the human
gut microbiome. The PI has over 10 years of experience in CAZyme bioinformatics tool development and
maintains a well-recognized CAZyme annotation database and web server called dbCAN
(http://bcb.unl.edu/dbCAN2). This project aims to further dbCAN development to address fundamental
personalized nutrition questions: (i) Is a gut microbe able to utilize a specific type of glycan? (ii) Can a person
carrying certain gut microbes respond to an individualized diet (e.g., one containing prebiotics: dietary compounds that are
beneficial to human health)? To address these questions, new CAZyme annotation tools must be able
to predict the carbohydrate substrates of CAZymes.
Recent research has found that different CAZyme-encoding genes are often co-localized with each other and
with other genes (e.g., those encoding sugar transporters, regulators, and signaling proteins) in bacterial
genomes to form CGCs (also known as polysaccharide utilization loci, or PULs). Thus, the premise of the new
tool development is that the gene membership (or functional domain composition) of a CGC can be used to
predict its carbohydrate substrates (e.g., xylans, pectins, and glucans). The innovation is that machine learning
approaches will be used to analyze a large number of experimentally characterized PULs curated from the literature,
and the extracted sequence features will be used to build effective classifiers to predict and classify CGCs in
new genomes/metagenomes. The expected outcome will be novel and user-friendly open-source computer
programs, databases, and web servers that allow automated CGC identification and substrate prediction. The
significance is that the new tools will facilitate the experimental characterization of more PULs and their
carbohydrate substrates in the human gut microbiome (and in other carbohydrate-rich environments). Therefore,
this project will contribute computational solutions to personalized nutrition research, e.g., by analyzing a
person's gut microbiome to predict whether that person can respond to diets containing certain prebiotic glycans.
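
The idea that a CGC's functional domain composition can predict its substrate can be illustrated with a minimal sketch. The summary does not say which machine learning method will be used, so the example below substitutes a simple nearest-neighbor vote over Jaccard similarity of domain sets. The training entries, the function names (`jaccard`, `predict_substrate`), and the parameter `k` are hypothetical and chosen only for illustration; they are not curated PUL records or part of dbCAN.

```python
from collections import Counter

# Toy, made-up training set (an assumption for illustration only): each
# "characterized PUL" is reduced to the set of functional domains found in
# its genes, paired with a substrate label. Not curated data.
TRAINING_PULS = [
    ({"GH10", "GH43", "CE1", "SusC", "SusD"}, "xylan"),
    ({"GH10", "GH67", "GH43", "SusC", "SusD"}, "xylan"),
    ({"GH28", "PL1", "CE8", "SusC", "SusD"}, "pectin"),
    ({"GH28", "PL1", "CE12", "SusC", "SusD"}, "pectin"),
    ({"GH16", "GH3", "SusC", "SusD"}, "beta-glucan"),
]


def jaccard(a, b):
    """Jaccard similarity between two domain sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_substrate(cgc_domains, training=TRAINING_PULS, k=3):
    """Predict a substrate by majority vote among the k most similar PULs."""
    ranked = sorted(
        training,
        key=lambda item: jaccard(cgc_domains, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical CGC found in a new genome, described by its domains.
    query = {"GH10", "GH43", "CE1", "SusC"}
    print(predict_substrate(query))
```

With this toy data, the query CGC shares most of its domains with the two xylan-labeled entries, so the vote returns "xylan". A real classifier would need far more training PULs and richer features than set overlap.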