A Robust, Secure Framework to Effortlessly Bind Distributed Databases and Analysis Tools into Tightly Integrated Translational Drug Discovery Computational Platforms - PROJECT SUMMARY
Collaborative Drug Discovery, Inc. (CDD) proposes to develop Cloud Workspaces for Drug Discovery, a novel
informatics framework that will enable scientists engaged in drug discovery and translation to effortlessly,
robustly, and securely integrate disparate databases and computational tools distributed across multiple
systems and vendors into highly efficient, custom-tailored computational workflows. Our innovative technology
will solve a critical problem that hinders drug discovery and translation: scientists in this field typically
need to combine chemical and biological data from several sources, run those data through multiple software
packages that specialize in different types of analysis and visualization, and then, ideally, store the results
together with the underlying experimental data. Today, this type of integration is difficult, expensive, and
typically fragile, creating a major barrier to (i) exploiting the rapidly growing number of high-quality
public-access data repositories and (ii) evaluating promising new analytical tools and strategies. Monolithic
platforms attempt to solve this problem by bringing everything together under one roof, but they are extremely
expensive and limit flexibility: no single platform can offer every capability. The alternative approach,
stringing together discrete resources, evolved during the era of desktop computing and does not translate well
to modern cloud-based workflows, in particular to computationally intensive operations that must combine large
datasets distributed across remote systems.
Cloud Workspaces (CW) aims to combine the strengths and avoid the weaknesses of these two extremes. In essence,
CW will allow users to easily create individualized, cloud-hosted solutions tailored to their specific
requirements and workflows. Our approach offers the performance, robustness, and ease of use of a monolithic
software solution without the associated inflexibility and vendor lock-in. It likewise offers the flexibility
and openness of combining discrete resources without the associated integration challenges and fragility.
Finally, it extends the pipelining approach to cloud-based models and to distributed data resources without
compromising performance or security. In Phase 1, we demonstrated that we could robustly and efficiently
synchronize biological and chemical data between the CW container environment and remote databases,
transferring only new or modified data while retaining the correct association of chemical identifiers; this
was a challenging but essential prerequisite for our concept. In Phase 2, we will complete development of CW
and demonstrate its effectiveness in multiple real-world applications together with software application
partners and beta-customer end users. The market for this technology ranges from academic researchers to
small and medium-sized companies and large pharmaceutical firms.
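To make the Phase 1 synchronization idea concrete (transferring only new or modified data while keeping each record tied to its chemical identifier), here is a minimal sketch of one common way to do delta synchronization. It is not CDD's implementation. It assumes that every remote record carries a stable identifier and detects changes by comparing content hashes. The names `LocalStore`, `fingerprint`, and `sync_delta`, along with the example fields `smiles` and `ic50_nM`, are hypothetical. Deletions are deliberately not handled.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(payload: dict) -> str:
    """Stable content hash of a record (keys sorted, so key order does not matter)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class LocalStore:
    # Hypothetical workspace-side store. Records are keyed by the remote chemical
    # identifier, so each record stays associated with the same compound across syncs.
    records: dict = field(default_factory=dict)  # identifier -> payload
    hashes: dict = field(default_factory=dict)   # identifier -> last-seen fingerprint


def sync_delta(store: LocalStore, remote: dict) -> dict:
    """Copy only new or modified records from `remote` (identifier -> payload).

    Returns counts of inserted, updated, and unchanged records.
    """
    stats = {"inserted": 0, "updated": 0, "unchanged": 0}
    for identifier, payload in remote.items():
        digest = fingerprint(payload)
        previous = store.hashes.get(identifier)
        if previous == digest:
            stats["unchanged"] += 1
            continue  # content identical to the last sync: nothing to transfer
        stats["inserted" if previous is None else "updated"] += 1
        store.records[identifier] = dict(payload)  # copy so later remote edits are detected
        store.hashes[identifier] = digest
    return stats


if __name__ == "__main__":
    store = LocalStore()
    remote = {
        "CPD-1": {"smiles": "CCO", "ic50_nM": 120},
        "CPD-2": {"smiles": "c1ccccc1", "ic50_nM": 45},
    }
    print(sync_delta(store, remote))  # first sync: both records are new
    remote["CPD-2"]["ic50_nM"] = 50
    print(sync_delta(store, remote))  # second sync: only CPD-2 is transferred
```

Hashing a canonical JSON form keeps the change check cheap and independent of key order. A production system would more likely rely on server-side modification timestamps or change logs, which avoid rehashing the full remote dataset on every sync.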