Project Summary/Abstract
The Streamlined capture and curation of unpublished data project will establish a new data
capture and dissemination paradigm that automatically and simultaneously captures and ingests
biomedical data into authoritative repositories and publishes them in an online, open-access
journal ‘Micropublication: biology’. This new platform will introduce a curation paradigm shift,
allowing authors to directly submit the output of their research into pre-designed intelligent web
forms. Upon submission, these forms will seamlessly integrate, atomize, and submit metadata
into authoritative data repositories, enhancing the efficiency and accuracy of curation.
Simultaneously, the process will automatically generate a ‘publication-like’ PDF file that will be
publishable and citable according to the findable, accessible, interoperable and reusable (FAIR)
data principles. We call these single-result experiments, reported in streamlined form with no
narrative, “micropublications”; they are ideal for, among other things, results that often go unpublished. Authors will
preserve provenance and establish credit for their research, and the automated flow of data they
submit will be made publicly accessible in established and authoritative data repositories such
as the Model Organism Database (MOD) members of the Alliance of Genome Resources (AGR):
FlyBase, Mouse Genome Database (MGI), Rat Genome Database (RGD), Saccharomyces
Genome Database (SGD), WormBase, Zebrafish Model Organism Database (ZFIN), for further
re-use. Through the aforementioned repositories, all submitted metadata will automatically be
integrated with existing datasets that have been manually extracted from the literature for
almost two decades. These data will be peer reviewed, ensuring they are of high quality and that
they meet community standards. Micropublications will be citable and discoverable, and will comply
with the Minimum Information Standards for scientific data reporting. In addition, researchers
will be able to share both positive and negative data with the scientific community, fulfilling
funding agencies' requirements to share all data coming from publicly funded research. After
establishing this data retrieval/publication pipeline, first with WormBase and then with the other
AGR member databases, we will work to expand it to non-member but otherwise critical biomedical
model organism databases, such as Xenbase (Xenopus laevis and Xenopus tropicalis database),
DictyBase (Dictyostelium discoideum database), and PomBase (Schizosaccharomyces pombe
database).