PROJECT SUMMARY
The storage, sharing, and analysis of individual-level genomic, environmental, and linked phenotypic and/or
health outcome data pose profound technical and logistical challenges for precision medicine research.
Accordingly, new cloud-based computing and storage platforms are being developed to support facile data
processing and cloud-based analysis commons. However, the implications of these mechanisms for responsible
data governance are currently unexamined. To promote responsible and trustworthy data governance (i.e.,
decision-making about how biomedical data are stored, accessed, and used by researchers, as well as
communicated to research participants), we will conduct an in-depth qualitative analysis of the policies,
practices, and procedures associated with three emerging cloud-based precision medicine platforms: the
BioData Catalyst or BDCatalyst (NHLBI), the Analysis, Visualization, and Informatics Lab-space or AnVIL
(NHGRI), and the Research Hub (All of Us Research Program). The immediate goal of this exploratory
investigation will be to examine how the control and management of genomic and linked clinical data stored on
these platforms differ from earlier data sharing efforts. We will also explore what research stakeholders,
including platform developers, investigators (data contributors as well as data users), institutional officials, and
funders, regard as the most relevant governance tradeoffs associated with the new approaches. In particular,
we will solicit views on mechanisms employed to protect participant data, ensure that research uses are
aligned with informed consent, and make well-validated results available to interested participants. Draft
recommendations based on these observations will be shared with the research community and form the basis
for subsequent research, which will introduce these new platforms and the research practices they enable to
diverse research participants for feedback and critical reflection. To achieve these research objectives, we will
pursue the following Aims: (1) Characterize current approaches to the storage, sharing, and analysis of
large-scale genomic and linked data enabled by emerging cloud-based analysis platforms; (2) Explore
stakeholder views on current and proposed approaches to the governance of genomic cloud-based analysis
platforms; and (3) Propose and vet recommendations for the trustworthy governance of genomic and linked
environmental and phenotypic data in the context of cloud-based platforms. The proposed investigation will
generate novel, timely, and detailed information about the storage, access, and intended use of large-scale
genomic and linked clinical data held in emerging cloud-based data storage and analysis platforms. These
data will provide a robust basis from which to identify best practices for trustworthy data governance and
provide enhanced transparency about a key precision medicine research tool.