Download data from aws s3 #245

Closed
pocession opened this issue Aug 19, 2022 · 12 comments

Comments

@pocession

Dear people in Tabula muris,

Thank you for your contribution in generating this beautiful data set.

I am trying to download data with the following command:
aws s3 cp s3://czbiohub-tabula-muris/TM_droplet_mat.h5ad .

However, I encounter the following error:
warning: Skipping file s3://czbiohub-tabula-muris/TM_droplet_mat.h5ad. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

I then tried to download data with the following command (plus --storage-class STANDARD --force-glacier-transfer):
aws s3 cp s3://czbiohub-tabula-muris/TM_droplet_mat.h5ad . --storage-class STANDARD --force-glacier-transfer

But I still have this error:
download failed: s3://czbiohub-tabula-muris/TM_droplet_mat.h5ad to ./TM_droplet_mat.h5ad An error occurred (InvalidObjectState) when calling the GetObject operation: The operation is not valid for the object's storage class

Could you provide the command lines for downloading data from your S3 buckets?

Best,

Tsunghan Hsieh (post doc in Riken)
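
For context on the two errors above: `--storage-class` sets the storage class of the destination when uploading or copying to S3, so it does not change how a download behaves, and `--force-glacier-transfer` only helps once an archived object has been restored. The sketch below checks an object's storage class and restore status before a download is attempted. The helper names `s3_bucket`, `s3_key`, and `show_storage_class` are hypothetical, the `head-object` call assumes credentials with read access to the object, and restoring an archived object generally needs permissions that only the bucket owner holds.

```shell
# Split an S3 URI of the form s3://bucket/key (string handling only, no network).
s3_bucket() ( rest="${1#s3://}"; printf '%s\n' "${rest%%/*}" )
s3_key() ( rest="${1#s3://}"; printf '%s\n' "${rest#*/}" )

# Print the storage class and restore status of one object.
# S3 omits StorageClass for STANDARD objects, so "None" here means STANDARD.
show_storage_class() {
  aws s3api head-object \
    --bucket "$(s3_bucket "$1")" \
    --key "$(s3_key "$1")" \
    --query '[StorageClass, Restore]' \
    --output text
}

# Example (requires network access and credentials):
# show_storage_class s3://czbiohub-tabula-muris/TM_droplet_mat.h5ad
```

If the first field is GLACIER or DEEP_ARCHIVE and the second is None, the object has not been restored and `aws s3 cp` will keep failing with InvalidObjectState.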

@aopisco
Contributor

aopisco commented Aug 21, 2022

Hi @pocession, the data is located here -- if you update your paths that should work!

@pocession
Author

Thanks. It works now.

@warrenalphonso

Hey, sorry if this is a silly question but isn't the link you posted to tabula-muris-senis and not tabula-muris data? I wanted to check out the data for the original tabula muris paper, but I'm getting this and some other S3 errors while attempting to download it. For example,

$ aws s3 cp s3://czb-tabula-muris/TM_droplet_metadata.csv .
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden

aws s3 ls works though. I'm not so familiar with S3, but it looks like some files have been archived to Glacier and others we just don't have access to.
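
A bucket can be listable while individual objects are archived in a Glacier storage class or otherwise unreadable. One way to tell these cases apart is to list every key together with its storage class. This is a minimal sketch: `list_with_class` and `archived_only` are hypothetical helper names, and it assumes the caller is allowed to list the bucket, as the working `aws s3 ls` suggests.

```shell
# Print "key<TAB>storage class" for every object in a bucket.
list_with_class() {
  aws s3api list-objects-v2 \
    --bucket "$1" \
    --query 'Contents[].[Key, StorageClass]' \
    --output text
}

# Keep only the keys whose storage class is an archive tier that needs a restore.
archived_only() {
  awk -F '\t' '$2 == "GLACIER" || $2 == "DEEP_ARCHIVE" { print $1 }'
}

# Example (requires network access and credentials):
# list_with_class czb-tabula-muris | archived_only
```

Keys that are not in an archive tier but still return 403 on download point to a permissions problem rather than a storage-class problem.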

@pocession
Author

Hi, you are having the same issue I had before. To get the files with the AWS CLI, you need an AWS account and its access keys. Follow these steps:

  1. Sign up for an AWS account (you need a credit card). You can follow Amazon S3's manual.
  2. Get your AWS access key ID and secret access key. You can follow AWS Account and Access Keys.
  3. Install the AWS command line interface.
  4. In your terminal, type aws configure.
  5. Enter your access key ID and secret access key.
  6. Go to Tabula muris S3 storage.
  7. Copy the S3 URL of the data you want.
  8. In your terminal, type aws s3 cp S3-URL your-destination-path

It seems more than three people (besides you and me) have been confused by the download URL and the usage of Amazon S3. You may want to open a new issue so that the Tabula Muris people notice this.

Best
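
The steps above can be sketched as a small script. It assumes the AWS CLI is installed and `aws configure` has already been run. `is_s3_uri` and `fetch_s3` are hypothetical helper names, and the URI in the example is a placeholder, not a confirmed location of the data.

```shell
# Succeed only for strings of the form s3://bucket/key.
is_s3_uri() {
  case "$1" in
    s3://?*/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Confirm that credentials are configured, then copy one object locally.
fetch_s3() {
  is_s3_uri "$1" || { echo "not an S3 URI: $1" >&2; return 2; }
  aws sts get-caller-identity > /dev/null || return 1
  aws s3 cp "$1" "${2:-.}"
}

# Example (placeholder URI; requires network access and credentials):
# fetch_s3 s3://your-bucket/path/to/file.h5ad ./data/
```

Checking the identity first separates a credentials problem from a 403 on the object itself, which are easy to mix up when only the `aws s3 cp` error is visible.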

@warrenalphonso

Thanks for the reply @pocession! You linked to tabula-muris-senis, which works for me. But tabula-muris doesn't work. I had configured the CLI already, like you mentioned. And the link I posted above is the S3 URI. For example, if you click on TM_droplet_metadata.csv here and then copy S3 URI, you'd get the same URI I posted above:

$ aws s3 cp s3://czb-tabula-muris/TM_droplet_metadata.csv .
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden

Could you please try this specific file and let me know if it works for you?

@warrenalphonso

Maybe @aopisco, can you help?

@pocession
Author

@warrenalphonso No, I also could not download from czb-tabula-muris. But I guess the czb-tabula-muris-senis is the official repository?

@warrenalphonso

Thanks for trying! I thought tabula-muris and tabula-muris-senis were two different datasets, since they're released in two separate papers around two years apart.

@pocession
Author

pocession commented Aug 30, 2022 via email

@aopisco
Contributor

aopisco commented Aug 31, 2022

Hi all, Tabula Muris Senis contains Tabula Muris. Both datasets are single cell. Tabula Muris is the 3-month time point. Tabula Muris Senis covers 1 month, 3 months (Tabula Muris), 18 months, 21 months, 24 months, and 30 months. I sent the instructions for Tabula Muris Senis because that dataset is more accessible and it contains Tabula Muris. I hope this helps.

@pocession
Author

pocession commented Aug 31, 2022 via email

@warrenalphonso

Thanks for clarifying!
