
Why use AnVIL?

The NHGRI AnVIL (Genomic Data Science Analysis, Visualization, and Informatics Lab-space) is a project powered by Terra for biomedical researchers to access data, run analysis tools, and collaborate. Both biology researchers and educators can benefit from using AnVIL (anvil.terra.bio) for their research and in the classroom.

This guide answers the question "Why use AnVIL?" It discusses the research, classroom, and general benefits of using AnVIL and points to related resources throughout.

Figure: Why use AnVIL? The research, classroom, and general benefits are presented as sections within this guide.

Benefits of using AnVIL for research

Figure: The research benefits of using AnVIL include ease of platform access, a variety of analysis solutions, data options, data and analysis in the same place, scalability, renting needed resources, role-based permissions, shareable workspaces, and a repository compliant with the DMS Policy.

Ease of platform access

The primary means of accessing the AnVIL platform (anvil.terra.bio) is through a web browser; users do not need to download data or install software.

Variety of analysis solutions

Figure: Compute options available on AnVIL include Jupyter notebooks, Workflow Description Language (WDL) workflows, RStudio, and Galaxy.

AnVIL supports an assortment of frameworks and tools. Researchers can use their favorite tool to work with data interactively or through non-interactive batch processing. Due to this variety and interoperability with other platforms, researchers can stay within a single environment for their analysis without having to shift between platforms.

Data: yours or cloud-hosted open & controlled access

Figure: Data options available through AnVIL include several consortium datasets, such as GTEx.

AnVIL securely stores diverse cloud-hosted datasets, both open and controlled access, along with a browsable summary catalog that helps researchers identify relevant datasets, including those they may need to request access to.

For more information on the AnVIL project portal: Finding and accessing data

Data & analysis in same place

Figure: The home page of the AnVIL portal. The portal is an entry point for all parts of AnVIL, with documentation and announcements.

AnVIL is a unified computing environment for data storage, management, and analysis. The AnVIL portal serves as an entry point to access all parts of the AnVIL system as well as training materials and announcements.

Scalability

AnVIL supports analysis at massive scale as well as data exploration and training. Researchers get access to dedicated compute resources, avoiding the queue times and limited access found at some institutions. Researchers can also launch lightweight environments or run test analyses without incurring much cost or spending much time on configuration.

Rent needed resources

Since AnVIL handles the support and maintenance of the platform (including the hardware and software), you can focus on performing your work on AnVIL rather than setting up and maintaining the platform, freeing up effort for your science. This is immensely valuable for researchers who do not have deep institutional IT and system administrator support for research infrastructure.

Role-based permissions

Figure: An example workspace with the access level (role) highlighted. A workspace is the fundamental unit of activity in AnVIL and has role-based access.

Group management can be used to control who can access specific data, analysis workspaces, and your billing resources. Workspaces provide a collaborative environment with role-based permissions. These permissions include reading, writing, or owning, with additional permissions for running compute and sharing. This role-based group management structure is especially valuable when working with sensitive data or large amounts of data.

For more information on the AnVIL project portal: Terra Workspaces

Shareable workspaces

Figure: An example workspace with the options for sharing it through its permalink highlighted. A workspace can be a shareable record of analyses.

Workspaces can contain data, metadata, and analysis tools, as well as documentation and a history of workflow runs. They also display important information such as when the workspace was created and last modified. AnVIL workspaces on the web can serve as shareable, reproducible records of analyses. Research conducted on the AnVIL platform has contributed to over 115 scientific publications citing the AnVIL paper, demonstrating its role in advancing genomic and biomedical research.

Repository compliant with DMS Policy

The AnVIL serves as a cloud data repository compliant with the Data Management and Sharing (DMS) Policy. Data access controls can be specified to limit data access and use.

By submitting their data to AnVIL, researchers can meet the requirements of the DMS Policy and also contribute to the expanding network of NIH-funded data housed in AnVIL, furthering scientific discovery.

For more information on the AnVIL project portal: NIH's Data Management and Sharing policy requirements

Benefits of using AnVIL in the classroom

Figure: The classroom benefits of using AnVIL include a unified computing system, authentic experience with cloud computing, a variety of tools, relevant datasets, and prepared exercises.

AnVIL provides all the advantages of a cloud computing environment, such as version control and a unified computing system that does not require providing physical computers with particular specifications. Additionally, AnVIL gives students authentic experience working in the cloud, which is becoming common in today's research environment. Students can also gain experience with a variety of tools (e.g., Galaxy, RStudio, Jupyter notebooks, WDL workflows) all in one place while working with relevant datasets and prepared exercises.

Figure: Four example books and mini-CUREs. These include a demo on single cell with Bioconductor, a book on SARS with Galaxy on AnVIL, the BioDIGS R Data Package, and the RNA-Seq mini-CURE.

Overall benefits of AnVIL

Figure: The general benefits of using AnVIL. These include the ability to control costs, the ability to work with protected data safely, maintenance that is handled for you, available training and support, and a collaborative community.

Ability to control costs

Cloud computing is not free, and estimating costs may seem daunting to those considering use of AnVIL. However, Terra provides thorough, transparent documentation explaining data storage and cloud computing costs and has been working to improve transparency and management of costs for AnVIL users through cost reporting, cost controls and estimates, and cost optimizations. Additionally, to debug or benchmark your work, you can test analyses or workflows on smaller-scale test datasets or in lightweight environments without incurring much cost or spending much time configuring environments.

For more information on the AnVIL project portal: Controlling cloud costs

Work with protected data safely

Because AnVIL maintains compliance with FedRAMP policies, clinical data containing protected health information (PHI) and personally identifiable information (PII) can be safely and securely stored and analyzed on AnVIL. This includes the ability to export data from clinical data collection and management tools like REDCap and import it into AnVIL Terra Tables.

For more information on the AnVIL project portal: Platform security and NIH's Data Management and Sharing policy requirements

Maintenance is handled

Since AnVIL handles the support and maintenance of the platform (including the hardware and software), you can focus on performing your work on AnVIL rather than setting up and maintaining the platform, freeing up effort for your science. This is immensely valuable for researchers who do not have deep institutional IT and system administrator support for research infrastructure.

Training and support is available

To equip researchers and students to work on the AnVIL, the AnVIL team

Figure: Many AnVIL support options and books can be found in the AnVIL Collection. One such option is the AnVIL Getting Started book. Other options include the support forum and AnVIL Demos, which are held live monthly and are also recorded and posted to YouTube.

For more information on the AnVIL project portal: Getting Started Instructions and Videos & Tutorials

Collaborative community

Figure: Community examples and resources, such as the Getting Started book, the portal, and AnVIL on Terra.

AnVIL has begun hosting community conferences to collaboratively innovate during CoFests! and to discuss research performed with the platform. The community can work directly with the AnVIL team to understand current development, feature requests, and a roadmap or future directions for the platform.

Additionally, AnVIL values and routinely solicits user feedback to improve the user experience and provide the most beneficial features and enhancements for biomedical research. Feedback is gathered:

For more information on the AnVIL project portal: Events and News

Conclusion

Taken together, these benefits show how AnVIL provides secure, cost-effective genomic analysis at scale and serves as a useful cloud-based platform for training and research.

Figure: Summary of the advantages of cloud-based training and research with AnVIL. Cloud advantages include identical learner or analyst environments, pre-installed bioinformatic tools, and community support and training material. AnVIL advantages include bridging training to research on human genomic datasets, a secure platform for controlled-access data, the ability to install software with conda or Docker, scalability with greater resources or quicker runtimes, and collaboration tools.

