The Molecular Signatures Database: A knowledgebase for gene set based analysis of genomic data - Project Abstract The Molecular Signatures Database (MSigDB) knowledgebase, which we introduced almost 20 years ago, is an open-access resource of well-annotated human and mouse gene sets. It is used with multiple gene set-based genomic analysis methods to elucidate the biological mechanisms associated with disease and other biological phenotypes, generating hypotheses for further study and experimental validation. As of January 2024, MSigDB provides over 50,000 expertly annotated sets of genes that share common biological function or regulation. MSigDB’s support of a pathway- and biological process-centric view of data analysis has a history of biomedical impact, contributing significantly to the study of important questions in biology and medicine across many domains. The impact and popularity of MSigDB are evidenced by its large citation count (>34,000 citations in Web of Science); its continually growing user community (>350,000 registered users from >70 countries worldwide); and its heavily used portal (hundreds of thousands of page hits per week). We derive the gene sets in multiple ways: through manual curation of results in scientific publications; computational analysis of publicly available transcriptomic data sets; and mining and curation of public pathway and ontology databases and other public resources. The sets are provided in multiple gene identifier namespaces, using the latest versions of standards established by community-recognized authorities. They are continually reviewed and updated. As we continue to grow the knowledgebase, we prioritize thoughtful selection of new sources and development of gene sets with careful and expert annotation.
The sets are available both as investigator-friendly gene set webpages, which include information on the origin of the set and biological annotation, and in machine-readable forms for use by software developers and bioinformaticians for programmatic access, testing of new methods, and inclusion in other resources. The interdisciplinary MSigDB team includes computational and genomic scientists, a scientist curator with a PhD in biology, computer science experts in natural language processing, and experienced software engineers. We seek funding for MSigDB for the next five years to continue our support of the large community of biomedical investigators who depend on it for their work. We will: 1) continue to thoughtfully expand the content of MSigDB and keep it up to date with semiannual releases; 2) enhance and optimize the MSigDB build and quality assurance process; 3) explore the use of large language models (LLMs) to assist in gene set annotation; 4) perform a major update of the MSigDB portal architecture and implementation; 5) enhance the MSigDB tools for gene set exploration; 6) continue our high-value user support and increase community engagement.