
aws s3 sync does not ignore storage-class of glacier by default #748

Closed
e0d opened this issue Apr 8, 2014 · 17 comments
Labels: bug, investigating, s3

Comments


e0d commented Apr 8, 2014

With:

awscli==1.3.6
botocore==0.40.0

Although the default --storage-class parameter for aws s3 sync is documented as STANDARD, files with a storage class of GLACIER are matched when the file list is generated, but downloading them fails with an error:

A client error (InvalidObjectState) occurred when calling the GetObject operation: The operation is not valid for the object's storage class
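
For context, the client-side workaround at the time was to filter Glacier-class objects out before calling GetObject. A minimal sketch using the modern boto3 client (bucket and directory names are placeholders, not from this issue):

```python
import os
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"    # placeholder
dest_dir = "local-dir"  # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        # ListObjects reports each key's storage class; GLACIER objects
        # fail GetObject with InvalidObjectState, so skip them up front.
        if obj.get("StorageClass") == "GLACIER":
            print(f"Skipping {obj['Key']} (storage class GLACIER)")
            continue
        if obj["Key"].endswith("/"):  # folder placeholder keys
            continue
        dest = os.path.join(dest_dir, obj["Key"])
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        s3.download_file(bucket, obj["Key"], dest)
```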
@jamesls jamesls added the s3 label Jul 16, 2014
jamesls (Member) commented Jul 28, 2014

Yeah, it seems like we should be handling the glacier storage class better. There are a few interesting edge cases we'll need to consider:

  • In the basic case of copying s3->local, we should just skip everything with a storage class of glacier.
  • When copying to s3 (via cp or cp --recursive, but not sync), what happens if the local file exists and the remote file exists but has a storage class of glacier?
  • When determining what files to sync (via aws s3 sync), what happens when we encounter a remote object with a storage class of glacier and a local file with a newer last modified time? Should we warn and give a non-zero RC (and keep going), or just ignore the file?

@kyleknap What are your thoughts?

kyleknap (Contributor) commented Aug 1, 2014

I tested some of these edge cases by creating some Glacier storage-class objects in an S3 bucket. Here is what I found, along with my thoughts:

  1. We should skip the file when trying to download a glacier object, since it is not possible to download it immediately, and possibly emit a warning letting the user know that the file is being skipped.

  2. When copying to s3 using cp or cp --recursive, if the local file exists and the remote file is a glacier object, the local file overwrites the glacier object and the newly uploaded object becomes a standard object.

  3. When using sync from a local directory to a remote bucket, if the local file is newer than the glacier object, the local file currently overwrites the glacier object and becomes a standard object. I say we keep this as is to avoid changing existing behavior, but we could add a --ignore-glacier argument that skips glacier objects during the sync (see the sketch after this list).
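
A sketch of what that sync decision could look like; the ignore_glacier flag here is hypothetical (mirroring the proposed --ignore-glacier), and the structure is illustrative rather than awscli's actual sync internals:

```python
import sys

def should_upload(local_mtime, remote_obj, ignore_glacier=True):
    """Decide whether a local file should replace its remote counterpart.

    remote_obj is one entry from a ListObjects response, e.g.
    {"Key": ..., "LastModified": datetime, "StorageClass": "GLACIER"}.
    ignore_glacier is hypothetical, mirroring the proposed --ignore-glacier.
    """
    if ignore_glacier and remote_obj.get("StorageClass") == "GLACIER":
        # Warn and leave the archived object untouched instead of
        # silently replacing it with a standard-class copy.
        print(f"warning: skipping {remote_obj['Key']}: storage class is GLACIER",
              file=sys.stderr)
        return False
    # Default sync rule: upload only if the local file is newer.
    return local_mtime > remote_obj["LastModified"].timestamp()
```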

robbintt commented Aug 5, 2014

Thanks for testing this and putting the results here.

Regarding case #3 (local->S3), I believe glacier files are also affected by whether versioning is enabled on the bucket.

benishs commented Aug 8, 2014

This is probably not the best place to ask this (sorry), but I'm not sure where else to ask…

I'm trying to do a sync from one bucket to another (moving things to a different region). The bucket has both regular S3 stuff and glacier stuff. The S3 stuff seems to have synced as expected, but for all the Glacier stuff I'm getting the same error mentioned by @e0d at the start of this thread ("A client error (InvalidObjectState) occurred when calling the GetObject operation…").

Is the solution as simple as re-running the sync command with the --storage-class REDUCED_REDUNDANCY flag? Or would that try to move all my regular S3 stuff to Glacier?

Or does the sync command not really work with the Glacier class storage at the moment?

Apologies for what are probably ignorant questions – I'm a CLI newbie and this is the only page that turned up in a search that seemed remotely relevant.

kyleknap (Contributor) commented Aug 8, 2014

No worries. The sync command does not currently work with Glacier storage-class objects. You are getting the error because Glacier objects stored in s3 cannot be downloaded or copied without first being restored to standard objects. So no matter what, to transfer the glacier objects you will need to restore them to standard objects first. The s3 commands do not currently have a feature for restoring glacier objects. One of the future goals is to handle Glacier objects better: a flag to ignore them, a warning when they are ignored, and/or the ability to restore them to standard objects.
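
For reference, the underlying RestoreObject API does exist even though the s3 commands don't expose it; a rough boto3 sketch (bucket and key are placeholders, and the restore itself typically takes hours):

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "archived/file.bin"  # placeholders

# Start a restore: S3 makes a temporary standard-class copy of the
# Glacier object available for the requested number of days.
s3.restore_object(Bucket=bucket, Key=key, RestoreRequest={"Days": 7})

# The restore is asynchronous. HeadObject's Restore field tracks it;
# once it reads ongoing-request="false", GetObject will succeed.
status = s3.head_object(Bucket=bucket, Key=key).get("Restore", "")
print(status)  # e.g. 'ongoing-request="true"'
```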

Julian commented Oct 19, 2014

+1 -- same issue with s3 mv --recursive.

@mvandiest

Guys, this is a pretty big issue when using S3/Glacier as a backup/restore solution. Any info on a patch?

@gideononline

This (InvalidObjectState) error also makes the sync process extremely slow. We have millions of files in an S3 bucket and millions more in glacier. So although the sync eventually completes successfully, it takes many hours longer than it should because of the millions of exceptions. Any ETA on the fix?

joehoyle commented Feb 2, 2015

+1 with this issue, I can't perform a sync while using Glacier

@caedmonjudd

+1 on this issue as well.

viyh commented Apr 20, 2015

+1, also being able to exclude or include Glacier object types for "ls" would be useful.

@thomascate

+1, has anyone figured out a workaround for this?

@cake-icing

+1

3 similar comments
@andacata

+1

@huevos-y-bacon

+1

Argoday commented Sep 10, 2015

+1

@robbintt

It would be so nice if the API provided an MD5 or any equivalent for files in glacier. Is there any sort of unique ID, like an object ID or date + object ID, that could be used to allow s3 sync to work?
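
For what it's worth, ListObjects already returns an ETag and LastModified for Glacier-class objects (the object metadata stays in S3 even when the data is archived), so a comparison pass needs no GetObject. One caveat: the ETag is a true MD5 only for single-part, non-KMS uploads. A minimal sketch:

```python
import boto3

s3 = boto3.client("s3")

def remote_index(bucket, prefix=""):
    """Map key -> (ETag, LastModified, Size), Glacier objects included.

    Only list calls are made, so InvalidObjectState never comes up.
    """
    index = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Caveat: for multipart uploads the ETag is a hash of part
            # hashes (with a "-N" suffix), not the MD5 of the content.
            index[obj["Key"]] = (obj["ETag"], obj["LastModified"], obj["Size"])
    return index
```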
