PROJECT SUMMARY - Evaluation and optimization of NWB software and data in the cloud.
This proposal is for an administrative supplement for the U24 grant “Advancing standardization of
neurophysiology data through dissemination of NWB.” The parent project centers on supporting the use of
Neurodata Without Borders (NWB), a data standard for neurophysiology data that allows
neuroscience researchers to package and publish their data in a form that is readily available and reusable by
others. Through the parent project, the team is taking several approaches to engage with the user community
and lower the barriers to entry for adopting NWB, including hosted hackathons, one-on-one consultations, and
tutorials. The team also ensures the continued quality of the NWB codebase through bug tracking, test
coverage, and continued engagement with scientific software developers to assist with the integration of new
tools for analysis, visualization, search, and publication of NWB datasets.
Through our engagements with the community, we have identified the lack of optimization of NWB software
and data for use in the cloud as a key obstacle to adoption of NWB, one that we anticipate will become
increasingly significant in the coming years. As neurophysiology data volumes continue to grow at a rapid
pace, researchers are increasingly seeking to leverage the parallel processing capabilities of cloud
infrastructure for converting data to NWB and analyzing NWB data. However, the current NWB conversion tools
are not yet equipped for cloud integration, and the NWB data layout is not optimized for cloud-based reading
and analysis. To address these
gaps, we will focus on two key aims. First, we will evaluate and optimize strategies for using cloud resources to
enable researchers to perform efficient, cost-effective cloud-based conversion of data to NWB. Specifically, we
will package our NeuroConv conversion software into containers that include everything needed to run
NeuroConv in any cloud computing environment, and we will develop tools for integrating existing
cloud resources, e.g., for reading conversion inputs from and writing outputs to cloud storage. Second, we will evaluate
and optimize reading of NWB data from cloud storage to enhance cloud-based analysis. Specifically, we will
integrate Kerchunk, a software package designed to read data efficiently from the cloud, with the PyNWB
software for reading NWB data. We will also evaluate the performance of different data layout strategies and
optimize the storage of NWB data for efficient cloud-based access and analysis.
Successful completion of the proposed work will create the necessary infrastructure and guidance for
neuroscience researchers to take full advantage of cloud computing for conversion of data to NWB and
analysis of data in NWB. This will enable researchers to convert and analyze their neurophysiology data faster
and with fewer resources, which promises to improve data sharing and expedite scientific discovery.