Data Resource and Administrative Coordination Center for the Scalable and Systematic Neurobiology of Psychiatric and Neurodevelopmental Disorder Risk Genes Consortium

PROJECT SUMMARY / ABSTRACT

Our U24 parent grant funds our team to serve as the Data Resource and Administrative Coordination Center (DRACC) for NIMH’s Scalable and Systematic Neurobiology of Psychiatric and Neurodevelopmental Disorder Risk Genes (SSPsyGene) consortium. The consortium’s mandate is to functionally and mechanistically characterize the contribution of 250 genes linked to neurodevelopmental and psychiatric disorders. One of our goals has been to lead the consortium into a cloud-based Data Biosphere. This Competing Revision expands Aim 4 of the parent grant, “Data Processing, Analysis, Integration and Management,” under which we propose to create a FAIR-based Data Biosphere model and a data submission system that supports the data providers within SSPsyGene. The four current data-producing centers (more will join in Fall 2024) work with live animals and with 2D and 3D cultures of human induced pluripotent stem cells differentiated into neural tissues (organoids). These centers contribute a wide variety of data types, including scRNA-seq, spatial transcriptomics, morphometry, behavioral assays, and electrophysiology. We are seeking support to integrate electrophysiology analysis methods with genomics data analysis in a standardized, cloud-based data analysis pipeline. Unlike genomics, electrophysiology records dynamically changing activity over arbitrarily long time periods at little additional cost. As a result, raw electrophysiology datasets now dwarf the genomics data and have become too large to analyze effectively on-site. Our key innovation is a new multistage, generative-AI-driven compression method for electrophysiology data that will dramatically reduce compute times and storage costs while also enhancing data interpretation.
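To illustrate why raw electrophysiology volume keeps growing with recording time, the Python sketch below computes the uncompressed size of a continuous multichannel recording. The parameters (384 channels sampled at 30 kHz with 16-bit samples, roughly a high-density silicon probe) are hypothetical assumptions chosen for illustration, not specifications of SSPsyGene recordings.

```python
# Hypothetical recording parameters (assumptions, not SSPsyGene specifications):
# 384 channels, 30 kHz sampling, 16-bit (2-byte) integer samples.
CHANNELS = 384
SAMPLE_RATE_HZ = 30_000
BYTES_PER_SAMPLE = 2


def raw_bytes(hours, channels=CHANNELS, rate_hz=SAMPLE_RATE_HZ,
              bytes_per_sample=BYTES_PER_SAMPLE):
    """Uncompressed size of a continuous recording; grows linearly with duration."""
    seconds = hours * 3600
    return channels * rate_hz * bytes_per_sample * seconds


if __name__ == "__main__":
    for hours in (1, 24, 24 * 7):
        gb = raw_bytes(hours) / 1e9
        print(f"{hours:>4} h -> {gb:,.1f} GB")
```

Under these assumptions each recording hour adds about 83 GB (23.04 MB/s), so a 30 TB dataset corresponds to roughly 360 recording-hours. Longer or denser recordings scale the total linearly.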
We have outlined a comprehensive cloud-based pipeline built around the new compression method, incorporating Kubernetes clusters for computing, S3-compatible object storage, and containerized workflows managed via Dockstore, all runnable in a secure environment. This architecture will enable efficient processing of 30 TB of existing test data, as well as the much larger datasets anticipated in the near future from SSPsyGene and other collaborators. We will evaluate accuracy and speed at different compression levels, as well as scalability and cost-effectiveness relative to existing local computing solutions. Our project will create a more cost-effective, integrated research ecosystem for electrophysiology data that will benefit not only neuroscience but also cardiology, optometry, audiology, gastroenterology, and other research areas that use electrophysiology data.
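The abstract does not describe the internals of the generative-AI compressor, so the Python sketch below substitutes a deliberately simple, hypothetical stand-in codec (uniform quantization followed by lossless zlib) only to show the shape of an evaluation across compression levels: for each level, measure compression ratio, reconstruction error, and round-trip time. The synthetic trace, the step sizes, and the 16-bit raw baseline are all assumptions for illustration.

```python
import array
import math
import random
import time
import zlib


def synthetic_trace(n, seed=0):
    """Hypothetical test signal: slow oscillation plus Gaussian noise (arbitrary units)."""
    rng = random.Random(seed)
    return [50.0 * math.sin(2 * math.pi * i / 300) + rng.gauss(0.0, 5.0)
            for i in range(n)]


def compress(signal, step):
    """Stand-in lossy codec (not the proposed method): quantize with bin width `step`, then zlib."""
    q = array.array("i", (round(x / step) for x in signal))
    return zlib.compress(q.tobytes(), 9)


def decompress(blob, step):
    """Invert compress(): inflate the integer codes and rescale them by `step`."""
    q = array.array("i")
    q.frombytes(zlib.decompress(blob))
    return [v * step for v in q]


def evaluate(signal, steps):
    """Report compression ratio, RMS reconstruction error, and round-trip time per level."""
    raw_size = len(signal) * 2  # baseline: 16-bit raw samples (assumption)
    results = []
    for step in steps:
        t0 = time.perf_counter()
        blob = compress(signal, step)
        recon = decompress(blob, step)
        elapsed = time.perf_counter() - t0
        rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, recon)) / len(signal))
        results.append({"step": step, "ratio": raw_size / len(blob),
                        "rmse": rmse, "seconds": elapsed})
    return results


if __name__ == "__main__":
    trace = synthetic_trace(30_000)
    for r in evaluate(trace, steps=(0.5, 2.0, 8.0)):
        print(f"step={r['step']:<4} ratio={r['ratio']:.1f}x "
              f"rmse={r['rmse']:.3f} time={r['seconds'] * 1e3:.1f} ms")
```

Coarser quantization steps trade accuracy for size, and the reconstruction error of this stand-in is bounded by half the step. A real evaluation would swap `compress`/`decompress` for the proposed codec and could also compare downstream analysis results rather than sample-level error alone.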