PROJECT SUMMARY
The functional annotation of proteins is a major bottleneck of biological discovery in the post-genomic era. We
are able to accurately generate large swaths of genomic and metagenomic sequence data. We are also able,
to a lesser extent, to assemble those sequences correctly and to identify open reading frames. Our capability to
generate accurate biological knowledge from genomic data drops precipitously at the third step: assigning
biological function to proteins. The Critical Assessment of Functional Annotation (CAFA) is a computational
challenge that involves a community of computational biologists, data scientists, ontologists, and biocurators
working together to improve and distribute protein function prediction algorithms. Here we propose (i) to
sustain and enrich the CAFA community of practice by continuing the CAFA challenges while involving
biocurators and computer scientists not regularly associated with this community, which will be accomplished by
increasing engagement with other communities and by incentivizing the development of containerized,
open-source software for continuous use in UniProt; (ii) to drive continuous improvement
in gene function prediction and annotation by transitioning CAFA to a continuous event and by developing
algorithms that prioritize proteins for biocuration and experimental annotation; (iii) to use annotation extensions
and subsequently the Gene Ontology Causal Activity Model to capture causal relationships between
protein functions, and then to challenge the function prediction algorithms to adopt causal annotation
models. This project shifts the field of computational function prediction toward the accurate annotation of
protein function in a fine-grained, context-dependent, and causal manner.