NHGRI AnVIL Open-Access Data Available in AWS Open Data Registry
We are pleased to announce that open-access data hosted by NHGRI's AnVIL Project is now available through both the Registry of Open Data on AWS and the AWS Marketplace, and can be downloaded free of charge from the AnVIL Data Explorer.
This integration provides researchers with flexible access to AnVIL datasets directly within the AWS compute environment, as well as egress-free downloads for use in other compute environments such as institutional HPC clusters. These datasets are also available in Google Cloud Platform (us-central1) as part of AnVIL Terra.
Available Datasets
All open-access AnVIL datasets are currently available through this program:
- AnVIL 1000G PRIMED Data Model
- AnVIL 1000G High Coverage 2019
- AnVIL GTEx Public Data
- AnVIL HPRC
- AnVIL NIA CARD Coriell Cell Lines Open
- AnVIL T2T
- AnVIL T2T ChrY
- AnVIL ENCORE 293T
- AnVIL ENCORE RS293
- AnVIL IGVF Mouse R1
- AnVIL MAGE
Additional datasets and updates to existing datasets will be made available as they are released by NHGRI's AnVIL Project.
How to Access the Data
Data can be accessed through the AnVIL Data Explorer using any of the following methods:
- Dataset Download via curl: Download a complete dataset using the `curl` command, or use `curl` to download files of selected types across multiple datasets simultaneously.
- TSV File Manifest: Export a manifest containing the S3 URI for each file. A new column, `files.azul_mirror_uri`, provides the S3 URI for each file in the S3 bucket. Note that the filename in this column is a hash; the human-readable filename can be found in the `files.file_name` column.
- Individual File Download: In the Explorer's Files tab, use the download icon to select individual files. A dark blue icon indicates the file is available for direct download; a light blue or grayed-out icon indicates the file is managed-access and currently not available to download via the AWS Open Data Registry.
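For the TSV File Manifest method, the hashed S3 object name has to be paired with the readable filename before downloading. The following is a minimal Python sketch of that pairing, not an official AnVIL tool. The column names `files.azul_mirror_uri` and `files.file_name` come from the manifest description above; the function name `read_mirror_entries`, the sample rows, and the bucket name `example-bucket` are hypothetical and used only for illustration. Skipping rows with an empty URI is also an assumption about how a real manifest might look.

```python
import csv
import io
from urllib.parse import urlparse


def read_mirror_entries(manifest_text):
    """Return (bucket, key, file_name) for each manifest row with an S3 URI."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    entries = []
    for row in reader:
        uri = (row.get("files.azul_mirror_uri") or "").strip()
        if not uri.startswith("s3://"):
            # Assumption: rows without an S3 URI are simply skipped.
            continue
        parsed = urlparse(uri)
        bucket = parsed.netloc            # bucket name from the URI
        key = parsed.path.lstrip("/")     # hashed object key
        entries.append((bucket, key, row["files.file_name"]))
    return entries


# Made-up manifest content; a real export has many more columns.
SAMPLE = (
    "files.file_name\tfiles.azul_mirror_uri\n"
    "sample1.vcf.gz\ts3://example-bucket/file/abc123\n"
    "restricted.bam\t\n"
)

for bucket, key, name in read_mirror_entries(SAMPLE):
    print(f"{name}: bucket={bucket} key={key}")
```

With the bucket and key in hand, any S3 client can fetch the object and save it locally under the name from `files.file_name`.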
Managed Access Data
Managed-access files remain available for download from Google Cloud Platform via Terra's requester-pays services.