Data Resource and Administrative Coordination Center for the Scalable and Systematic Neurobiology of Psychiatric and Neurodevelopmental Disorder Risk Genes Consortium - ABSTRACT
Our team proposes to lead the SSPsyGene Consortium into the Data Biosphere. We will do this by adapting
Data Biosphere technology and management techniques that we have already deployed for other NIH institutes,
the NIH Common Fund, the NIH Office of the Director, the Chan Zuckerberg Initiative (CZI), and the California
Institute for Regenerative Medicine (CIRM), making SSPsyGene interoperable across multiple disease areas.
We also bring expertise with neurological data gained through our involvement with BICCN, the Psychiatric Cell Map
Initiative, CZI’s Pediatric Brain Map, and NHGRI’s Center for Live Cell Genomics/Biotechnology, and through our close
relationships with PsychENCODE and the Allen Brain Institute. For SSPsyGene, we have four major tasks: (1) We
will assemble all the information necessary to empower the consortium to choose between 100 and 250 genes
to experimentally characterize (Aim 2). We have identified more than 20 different types of information to be
integrated for this purpose, many of which are already in the UCSC Genome Browser. We will apply multiple
ranking algorithms to this integrated information source to guide the SSPsyGene Consortium’s decision
process. (2) We will work to establish an ontology structure that is sufficiently expressive yet fully maintainable,
supporting FAIR data use by both researchers and machines (Aim 3). Our previous work with the UCSC
Genome Browser and our close relationships with ontology organizations will help us to bridge the gaps
between molecular, cellular, tissue/organoid, and model organism measurements, and to extend these
resources when needed. Inspired by our experience with the clinical ontologies in OMOP and FHIR, we
propose a novel service to allow researchers to query phenotype-phenotype associations in large clinical
cohorts, such as All of Us and HEDIS, the database of records from Medicare and Medicaid. (3) We will create
a state-of-the-art SSPsyGene Data Biosphere fully compatible with those we created for other NIH institutes
(Aim 4). Our emphasis will be on standardization of the data submission process with extensive quality
monitoring to ensure timely and effective data release. We will leverage our deep involvement with the Global
Alliance for Genomics and Health to ensure that all data and metadata meet FAIR standards. We have
experience with the complex data types that will be generated by the SSPsyGene consortium, including
-omics, imaging, and electrophysiology data. (4) We will coordinate the consortium, drawing on our service as trusted
third-party organizers for many NIH consortia, through which we have developed a reputation for fairness and
impartiality in data sharing and publication and expertise in coordinating, generating consensus, publishing
results, and creating a resource with maximal
impact (Aim 5). Based on our strengths in biomedical data, metadata and ontologies, FAIR platforms, and
consortium leadership, we are confident that we will achieve all the goals of the SSPsyGene Consortium.