Pathogen Data Network - Summary
The Pathogen Data Network (PDN) will nucleate global networks around the mobilization of diverse biodata
types, including host and pathogen genomics, transcriptomics, proteins, pathways and networks, imaging and
cohorts. The integration and linkage of these data for the purposes of research and public health response will
be made possible through the development of a technical framework including specifications, standards,
interfaces and reference software implementations. All publicly available data, integrated with further
datasets and tools from a range of pathogen-related data resources, will be made accessible and reusable
through the Pathogens Portal (PP), hosted at EMBL-EBI, and through distributed Local Pathogen Portals (LPP),
hosted globally, which allow additional functionality to be customized for the specific local context. To foster
data mobilization, PDN will establish a network of FAIR Local Data Hubs (LDH) based on reference software
implementations to be deployed on locally controlled cloud infrastructure, ideally in countries that have signed the
Nagoya Protocol. To achieve trustworthy and equitable sharing of the mobilized data with international
repositories, in compliance with local regulations and in alignment with international policy, we will also
address in depth the issues of trust and ownership and work to provide data-sharing policy options for international
and national organizations, ensuring that the PDN project is aligned with international policy and widely adopted
by end users. This will include LDHs focusing on Low- and Middle-Income Countries. In parallel, we will establish
an LDH Managers' community of practice to exchange knowledge and expertise, with a focus on capacity
building, harmonization and preparedness. Centrally hosted Data Hubs (DH) will continue to exist in parallel,
supporting those who lack the capacity, resources or expertise to deploy and maintain an LDH. LDHs will be able
to host sensitive personal data that may not be openly shared but can be made discoverable and accessible
for research and public health under proper legal and ethical frameworks. The Pathogen Analysis System (PAS)
will be extended with additional workflows from selected use cases spanning global sewage data, foodborne
viruses and linked clinical-epidemiological data. PDN will enable its users to run analyses centrally for DHs, and
locally or remotely for LDHs. An extensive global support, outreach and training program will fuel the
adoption of the network concept, the connection of infrastructure to its interfaces, and the use of open pathogen
data by the global scientific community, both for research and for decision-making. In summary, PDN will
develop the networks and all the underpinning components at the technical, analytical, trust, policy, training
and outreach levels to truly enable data mobilization and sharing, and the subsequent integration and
linking of data into a global knowledgebase network. This network will serve research needs through open access
to data, and public health needs, including preparedness and response, through timely access to
controlled-access data and a dedicated surveillance and outbreak dashboard.