Neuroscientific data contain information from an incredible diversity of species, are generated by a plethora of
devices, and encapsulate the results of scientific thinking and decision making. Most of these data
remain confined within laboratories and are not accessible to the broader scientific community. The research
projects awarded under the BRAIN Initiative are generating a diverse collection of data that can transform and
accelerate the pace of discovery. These datasets are large, ranging in size from gigabytes to petabytes, and represent
diverse data types and assorted metadata. To integrate, rather than further isolate, these numerous efforts,
there is a need to archive, preserve, share, and process data in a way that is meaningful to neuroscience
researchers. Any technological solution should reduce redundancy of storage and computation, allow
computing near data, and provide easy access, protected when appropriate, for researchers and citizen
scientists. Given the scale of these initiatives and the range of sample sizes and data types, any solution should
also consider the broad range of individual technical expertise in the community and therefore allow easy
engagement with and ingestion into an archive, while supporting education and training of the scientists in
using these technologies. To solve these problems, we propose DANDI: Distributed Archives for
Neurophysiology Data Integration. We draw on our team’s extensive experience in informatics, standards
development, software engineering, and community building, and leverage a robust open-source software stack to
create this archive. The archive will lower barriers for neuroscientists by using the Neurodata Without Borders
(NWB; http://nwb.org) standard as a consistent data format, by providing interoperability with other
standards, and by providing robust tools and convenient Web interfaces to interact with the archive. DANDI
will: 1) provide a cloud platform for versioned neurophysiology data storage for the purposes of
collaboration, archiving, and preservation; 2) provide easy-to-use tools for neurophysiology data submission
and access in the archive; and 3) facilitate adoption of NWB via standardized applications for data ingestion,
visualization, and processing. We will work with local investigators, the broader neurophysiology community,
and with federal and other funders to determine which data will be stored in DANDI and for how long.
The archive will also use state-of-the-art data distribution technologies to increase redundancy and fault
tolerance, and to allow distributed computing across cloud and local computing resources. Consequently, the
effort will significantly reduce the barrier between laboratories and the cloud, fostering collaboration and data
exchange. Overall, we aim to leverage our collective expertise to create and support an NWB-based
neurophysiology archive that seamlessly integrates with and enhances current researcher workflows, lowers
barriers for scientific inquiry and collaboration, and preserves information for wide reuse.