Five years ago the AnVIL was founded with a vision of creating a federated data ecosystem. Its first phase
focused on building the foundational capabilities needed to bring together data, tools, and research
communities in a cloud-based environment. Now, in this second phase, the focus must be on scientific impact.
We will emphasize growing the AnVIL data corpus, going multi-cloud, creating analytical tools for flagship
NHGRI initiatives, and increasing the user base. We will accomplish this through the following Aims:
• Aim1 (Data Ingestion): Support the ingestion, curation, and management of diverse datasets, so
that they are accessible to the research community. In Phase I of the AnVIL, we ingested,
wrangled, and QC’d more than 5 PB of data from NHGRI consortia. In Phase II, we will continue this
track record of success in supporting consortia, and extend our efforts to support the long tail of
individual researchers with valuable data to contribute to the AnVIL.
• Aim2 (Software Infrastructure): Reduce barriers to entry by supporting multiple clouds and
improving cost control. While Phase I of the AnVIL focused on establishing foundational software
infrastructure, Phase II must be about scaling adoption of the AnVIL. We have a three-part strategy for
achieving this: (i) Becoming multi-cloud, so that we support Microsoft Azure, in addition to Google
Cloud; (ii) Creating “AnVIL lite,” a simplified and free tier of the AnVIL that lowers barriers to entry; (iii)
Exposing tools to improve billing visibility and prevent overspend.
• Aim3 (Scientific Services): Leverage the AnVIL’s datasets and platforms to accelerate scientific
research. In Phase II, we must prioritize the scientific impact of the AnVIL. Towards this end, we will
leverage: (i) an imputation service drawing on AnVIL datasets and other datasets of diverse ancestry;
(ii) a newly developed genomic variant store to support joint calling; (iii) an improved and expanded
capability for third-party deployment of tools and applications in the AnVIL.
• Aim4 (User Services): Support the growth and long-term success of the research community
through user support, training, and project management. The services that comprise the AnVIL are
not only web services, but also human services. Meeting the needs of researchers everywhere requires
security, user support, training, and project governance.
The guiding principle of our efforts is that progress in genomic data science will happen most rapidly if there is
a diversity of interoperable solutions created by a plurality of groups. Toward that end, we will ensure that
the AnVIL continues to drive towards interoperability and federation by participating in NIH-led and
international efforts focused on standard setting and data sharing.