ABSTRACT
GlyGen is a maturing, five-year-old knowledgebase that accumulates data in the glycobiology domain and
connects it with other data types. GlyGen is unique; no other present or prior informatics resource has undertaken
such an integrative mission. In part, the lagging growth of accessible knowledge in the glycobiology domain,
compared to other -omics or biomedical research fields, reflects the inherent complexity of glycan structures,
which, unlike genes and proteins, exist in branched and isomeric forms whose biosynthesis is not attributable to
well-characterized, template-driven processes such as transcription or translation. Rather, glycan biosynthesis
is mediated by the regulated expression of ensembles of glycosyltransferases, substrate transporters, and
secretory pathway regulatory mechanisms that together generate dynamic cell- and tissue-specific patterns of
protein and lipid glycosylation. In addition, each glycosylation site on a glycoprotein may routinely be modified
by one of an ensemble of glycan structures, a glycoprotein feature called microheterogeneity. Importantly,
microheterogeneity is not random, but reflects the intrinsic biosynthetic capacity of specific cells and tissues and
may be modified by disease. These structural and biosynthetic complexities are essential contributors to the
tissue- and disease-specific functions of glycans and glycosylation and therefore need to be captured and
represented in knowledgebases in a way that allows them to be queried and linked to other types of data. GlyGen
aims to expand its underlying data model to accommodate new and more complex data types, to augment and
integrate new data types, and to implement robust modeling, unified procedures, and tools that improve
discovery and exploration of glycan and glycoconjugate data. Enhancement of the overall resource functionality
will be achieved through front-end improvements to accommodate user preferences and ensure exceptional data
communication and visualization. Improving the interconnectivity of GlyGen and its partner databases as well as
enhancing data-sharing across resources will continue to be core principles of the GlyGen project. All resulting
harmonized data will be available through highly permissive licenses for easy integration into other resources,
such as those of NCBI, EBI, SIB, and other international efforts, as well as for easy repurposing by independent
researchers, educators, bioinformaticians, and commercial entities. By the end of the next project period, GlyGen
expects to become the go-to, well-integrated resource for glycoscience data, comparable to existing protein and
genomic resources and serving the same broad community of biomedical researchers.