WormBase: a core data resource for C. elegans and other nematodes - Project Summary

We will continue to develop WormBase, a widely used knowledgebase, often consulted daily, of information about the C. elegans genome, genes, sequence features, gene function, gene interactions, and related information. C. elegans is a premier research organism, with about 1500 registered laboratories worldwide that use its short generation time, complete genome, efficient genome editing, and defined anatomy and neuroanatomy to study a wide range of biomedical and fundamental topics. WormBase also curates, stores, and displays information about nine other nematode genomes of biomedical importance. We will continue to develop the ontologies and gene nomenclature needed to support systematic annotation of the genome and of gene function and expression. After 20 years of independent infrastructure development, we will now use the Alliance of Genome Resources infrastructure for data ingest, storage, efficient curation, and presentation via download, API, and web portal. We will complete the migration of the software infrastructure by the second year. This project will focus on curation of genome-scale datasets and individual experiments from the literature, as well as storage and display of C. elegans- or nematode-specific data. A major challenge is the growing volume of published data and datasets combined with reduced staffing, which we will proactively address by streamlining our systems and making them more automated and higher throughput. Our main strategies for scaling curation are increased automation, namely machine learning (ML) and artificial intelligence (AI), and community input powered by ML/AI and incentivized by microPublication-based reviews of pathways and genes.
As we scale while maintaining our very high-quality data collection (which is reused by many other bioinformatics resources), professional biocurators with a deep understanding of the biology and of researchers' needs will increasingly focus on data modeling, quality control, development and training of automated systems, and support of community curation. We will curate information directly tied to nucleic acid sequence, including the genome sequence; sequence features such as gene structure models, regulatory regions, variants, sequence-based reagents, and genome-scale experiments; and gene expression, including reporter gene assays, RNA-seq, and single-cell RNA-seq. We will curate information centered on gene function, including phenotypes of variants and perturbations, disease models, genetic and physical interactions, Gene Ontology (GO) annotations, and pathways using GO-Causal Activity Models. After we transition our computational infrastructure to the Alliance, we will continue to curate datasets unique to C. elegans and add them to the Alliance infrastructure. We will support researchers through a 24/7 help desk, which provides advice and often analysis; curation, storage, and display of worm-specific datasets; provision of customized analysis tools; and a community forum. For new data, we will specify software requirements for development at the Alliance.