
aws s3 sync does not ignore storage-class of glacier by default #748

Closed
e0d opened this issue Apr 8, 2014 · 17 comments
Labels: bug, investigating, s3

Comments


e0d commented Apr 8, 2014

With:

awscli==1.3.6
botocore==0.40.0

Although the default --storage-class parameter for aws s3 sync is documented as STANDARD, files with a storage class of GLACIER are matched when the file list is generated, but downloading them fails with an error:

A client error (InvalidObjectState) occurred when calling the GetObject operation: The operation is not valid for the object's storage class
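
For context, the client-side workaround at the time was to filter Glacier-class objects out before calling GetObject. A minimal sketch using the modern boto3 client (bucket and directory names are placeholders, not from this issue):

```python
import os
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"    # placeholder
dest_dir = "local-dir"  # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        # ListObjects reports each key's storage class; GLACIER objects
        # fail GetObject with InvalidObjectState, so skip them up front.
        if obj.get("StorageClass") == "GLACIER":
            print(f"Skipping {obj['Key']} (storage class GLACIER)")
            continue
        if obj["Key"].endswith("/"):  # folder placeholder keys
            continue
        dest = os.path.join(dest_dir, obj["Key"])
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        s3.download_file(bucket, obj["Key"], dest)
```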
@jamesls jamesls added the s3 label Jul 16, 2014
jamesls (Member) commented Jul 28, 2014

Yeah, it seems like we should be handling the glacier storage class better. There are a few interesting edge cases we'll need to consider:

  • In the basic case of copying s3->local, we should just skip everything with a storage class of glacier.
  • When copying to s3 (via cp or cp --recursive, but not sync), what happens if the local file exists and the remote file exists but has a storage class of glacier?
  • When determining what files to sync (via aws s3 sync), what happens when we encounter a remote object with a storage class of glacier and a local file with a newer last modified time? Should we warn and give a non-zero RC (and keep going), or just ignore the file?

@kyleknap What are your thoughts?

kyleknap (Contributor) commented Aug 1, 2014

I tested some of these edge cases by creating some Glacier storage-class objects in an S3 bucket. Here is what I found, along with my thoughts:

  1. We should skip the file when trying to download a glacier object, since it is not possible to download it immediately, and possibly emit a warning letting the user know that the file is being skipped.

  2. When copying to s3 using cp or cp --recursive, if the local file exists and the remote file is a glacier object, the local file overwrites the glacier object and the newly uploaded object becomes a standard object.

  3. When using sync from a local directory to a remote bucket, if the local file is newer than the glacier object, the local file currently overwrites the glacier object and becomes a standard object. I say we keep this as is to avoid changing existing behavior, but we could add a --ignore-glacier argument that skips glacier objects during the sync (see the sketch after this list).
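
A sketch of what that sync decision could look like; the ignore_glacier flag here is hypothetical (mirroring the proposed --ignore-glacier), and the structure is illustrative rather than awscli's actual sync internals:

```python
import sys

def should_upload(local_mtime, remote_obj, ignore_glacier=True):
    """Decide whether a local file should replace its remote counterpart.

    remote_obj is one entry from a ListObjects response, e.g.
    {"Key": ..., "LastModified": datetime, "StorageClass": "GLACIER"}.
    ignore_glacier is hypothetical, mirroring the proposed --ignore-glacier.
    """
    if ignore_glacier and remote_obj.get("StorageClass") == "GLACIER":
        # Warn and leave the archived object untouched instead of
        # silently replacing it with a standard-class copy.
        print(f"warning: skipping {remote_obj['Key']}: storage class is GLACIER",
              file=sys.stderr)
        return False
    # Default sync rule: upload only if the local file is newer.
    return local_mtime > remote_obj["LastModified"].timestamp()
```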

robbintt commented Aug 5, 2014

Thanks for testing this and putting the results here.

Regarding case #3 (local->S3), I believe glacier files are also affected by whether versioning is enabled on the bucket.

benishs commented Aug 8, 2014

This is probably not the best place to ask this (sorry), but I'm not sure where else to ask…

I'm trying to do a sync from one bucket to another (moving things to a different region). The bucket has both regular S3 stuff and glacier stuff. The S3 stuff seems to have synced as expected, but for all the Glacier stuff I'm getting the same error mentioned by @e0d at the start of this thread ("A client error (InvalidObjectState) occurred when calling the GetObject operation…").

Is the solution as simple as re-running the sync command with the --storage-class REDUCED_REDUNDANCY flag? Or would that try to move all my regular S3 stuff to Glacier?

Or does the sync command not really work with the Glacier class storage at the moment?

Apologies for what are probably ignorant questions – I'm a CLI newbie and this is the only page that turned up in a search that seemed remotely relevant.

kyleknap (Contributor) commented Aug 8, 2014

No worries. The sync command does not currently work with Glacier storage-class objects. You are getting the error because Glacier objects stored in s3 cannot be downloaded or copied without first being restored to standard objects. So no matter what, to transfer the glacier objects you will need to restore them to standard objects first. The s3 commands do not currently have a feature for restoring glacier objects. One of the future goals is to handle Glacier objects better: a flag to ignore them, a warning when they are ignored, and/or the ability to restore them to standard objects.
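
For reference, the underlying RestoreObject API does exist even though the s3 commands don't expose it; a rough boto3 sketch (bucket and key are placeholders, and the restore itself typically takes hours):

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "archived/file.bin"  # placeholders

# Start a restore: S3 makes a temporary standard-class copy of the
# Glacier object available for the requested number of days.
s3.restore_object(Bucket=bucket, Key=key, RestoreRequest={"Days": 7})

# The restore is asynchronous. HeadObject's Restore field tracks it;
# once it reads ongoing-request="false", GetObject will succeed.
status = s3.head_object(Bucket=bucket, Key=key).get("Restore", "")
print(status)  # e.g. 'ongoing-request="true"'
```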

Julian commented Oct 19, 2014

+1 -- same issue with s3 mv --recursive.

@mvandiest

Guys, this is a pretty big issue when using S3/Glacier as a backup/restore solution. Any info on a patch?

@gideononline

This (InvalidObjectState) error also makes the sync process extremely slow. We have millions of files in an S3 bucket and millions more in glacier. So although the sync eventually completes successfully, it takes many hours longer than it should because of the millions of exceptions. Any ETA on the fix?

joehoyle commented Feb 2, 2015

+1 with this issue, I can't perform a sync while using Glacier

@caedmonjudd

+1 on this issue as well.

viyh commented Apr 20, 2015

+1, also being able to exclude or include Glacier object types for "ls" would be useful.

@thomascate

+1, has anyone figured out a workaround for this?

@cake-icing

+1

3 similar comments
@andacata

+1

@huevos-y-bacon

+1

Argoday commented Sep 10, 2015

+1

@robbintt

It would be so nice if the API provided an MD5 or any equivalent for files in glacier. Is there any sort of unique ID, like an object ID or date + object ID, that could be used to allow s3 sync to work?
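
For what it's worth, ListObjects already returns an ETag and LastModified for Glacier-class objects (the object metadata stays in S3 even when the data is archived), so a comparison pass needs no GetObject. One caveat: the ETag is a true MD5 only for single-part, non-KMS uploads. A minimal sketch:

```python
import boto3

s3 = boto3.client("s3")

def remote_index(bucket, prefix=""):
    """Map key -> (ETag, LastModified, Size), Glacier objects included.

    Only list calls are made, so InvalidObjectState never comes up.
    """
    index = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Caveat: for multipart uploads the ETag is a hash of part
            # hashes (with a "-N" suffix), not the MD5 of the content.
            index[obj["Key"]] = (obj["ETag"], obj["LastModified"], obj["Size"])
    return index
```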
