SUMMARY/ABSTRACT
A key deliverable of the 4DN consortium is the sharing of data that integrate chromatin conformation capture (3C)
information with the three-dimensional physical mapping of the corresponding sequences in nuclear space, as measured by
multiplexed FISH imaging experiments. To facilitate this integration, the 4DN Imaging Working Group (IWG) and the Data
Coordination and Integration Center (DCIC) [1] have developed a data format and metadata annotation specifications that
allow standardized exchange of multiplexed FISH/FISH Omics datasets [2,3]. This community standard, called the 4DN FISH
Omics Format (FOF) for Ball-and-Stick Chromatin Tracing (FOF-basCT) [3], supports the exchange of processed results
derived from the various imaging techniques used for basCT [4–11]. Since the release of FOF-basCT in early 2021 [3], over
70 multiplexed FISH datasets have been deposited in the 4DN Data Portal [2], the format has been adopted by public
repositories [12–16], and downstream processing and visualization pipelines have been developed [17–19].
Despite these advances, producing formatted datasets and uploading them to the 4DN Data Portal imposes a significant
curation and management burden that reduces productivity. With this supplement, we propose to address this challenge and
significantly improve the efficiency of data exchange by developing automated FISH Omics data deposition and sharing
pipelines that scientists can use both within and outside 4DN.
1. We will develop a data format and standardized annotation specifications to describe the results of volumetric Chromatin
Tracing (vCT) FISH Omics experiments. These specifications will extend the existing FOF-basCT, will be applicable to
technologies such as OligoSTORM and Oligo DNA-based Point Accumulation for Imaging in Nanoscale Topography
(OligoDNA-PAINT) [20–23], and will incorporate specifications for describing the Oligopaint probes used to image genomic
loci, so that probe information is compatible with existing genome browsers.
2. We will develop pipelines to ensure that laboratories within 4DN can produce both basCT and vCT datasets
(collectively, FOF-CT) and successfully transfer them to the DCIC for incorporation into the 4DN Data Portal. These
pipelines will include software tools that minimize the burden of manual curation and expedite data ingestion, quality
control (QC) procedures that promote the upload of high-quality datasets, and automated rating of datasets (minimal,
recommended, or ideal) based on standardized criteria such as the presence of quality metrics or of common coordinate
framework (CCF) nuclear mapping information.
3. We will develop prototype pipelines to establish the feasibility of sharing metadata-rich raw image data associated
with FOF-CT results deposited in the 4DN Data Portal [2] through general public repositories such as the OME-IDR [12] or
the BioImage Archive [2,24,25].
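To make the idea of automated tiered rating in Aim 2 more concrete, the following minimal Python sketch assigns a tier from a few yes/no checks on a submission. It is illustrative only: the `SubmissionSummary` fields, the `rate_dataset` function, the specific mapping of criteria to tiers, and the extra "incomplete" outcome for submissions lacking required metadata are all hypothetical assumptions, not the standardized criteria that this supplement will define.

```python
"""Hypothetical sketch of tiered (minimal / recommended / ideal) dataset rating.

All field names and rules below are illustrative assumptions, not 4DN criteria.
"""
from dataclasses import dataclass


@dataclass
class SubmissionSummary:
    # Hypothetical yes/no summary of a submitted FOF-CT dataset.
    has_required_fields: bool  # all mandatory metadata fields are present
    has_quality_metrics: bool  # quality metrics are reported
    has_ccf_mapping: bool      # CCF nuclear mapping information is provided


def rate_dataset(s: SubmissionSummary) -> str:
    """Return a tier label under the assumed rules.

    Assumed rules: missing required metadata -> "incomplete" (hypothetical);
    quality metrics plus CCF mapping -> "ideal"; quality metrics only ->
    "recommended"; otherwise -> "minimal".
    """
    if not s.has_required_fields:
        return "incomplete"
    if s.has_quality_metrics and s.has_ccf_mapping:
        return "ideal"
    if s.has_quality_metrics:
        return "recommended"
    return "minimal"


if __name__ == "__main__":
    # A dataset with quality metrics but no CCF mapping is rated "recommended".
    print(rate_dataset(SubmissionSummary(True, True, False)))
```

Encoding the criteria as explicit rules, rather than as manual curator judgment, is what would allow the rating to run automatically at upload time; the real rule set would be fixed by the standardized criteria developed under this aim.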
Overall, these deliverables will generate integrated public datasets that can be used to benchmark the development of tools
for different aspects of multiplexed FISH image processing (e.g., drift and chromatic correction, single-particle detection
and localization, spot fitting, filtering and calling, and feature segmentation) and of computational post-processing
pipelines (e.g., machine learning classification and integrative modeling).