Project Summary
Biological knowledgebases are a critical resource for researchers and accelerate scientific discoveries by
providing manually curated, machine-readable data collections. However, the aggregation and manual curation
of biological data is a labor-intensive process that relies almost entirely on professional biocurators. Two
approaches have been advanced to help with this problem: natural language processing (NLP), encompassing text mining (TM)
and machine learning (ML), and the engagement of researchers (community curation). Neither of these
approaches alone, however, is sufficient to meet the critical need for increased efficiency in the biocuration process. Our
solution to these challenges is an NLP-enhanced community curation portal, Author Curation to Knowledgebase
(ACKnowledge). The ACKnowledge system, currently implemented for the C. elegans literature, couples
statistical methods and text mining algorithms to enhance community curation of research articles. We propose
to strengthen and expand ACKnowledge by extending our pipeline to other species, incorporating more
sophisticated machine learning models, and presenting sentence-level entity and concept extraction for more
detailed author curation. In addition, we will develop an Author Curation Portal (ACP) to allow authors to easily
upload and curate their own documents. Taken together, these enhancements will allow us to maximize
community curation efforts by leveraging author expertise in multiple areas of biology, while at the same time
supporting authors with as much AI-assisted curation as possible. This reciprocal interaction will improve not
only the content of knowledgebases but also the AI methods themselves, as we will receive valuable feedback on our
models. By developing the ACP, we will further empower authors to participate in the curation
process and alert knowledgebases to key information that can, and should, be readily discoverable in accordance
with FAIR (Findable, Accessible, Interoperable, and Reusable) data principles.