Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshots continuously failing with blob [pending-index-293] already exists, cannot overwrite #21462

Closed
alexbrasetvik opened this issue Nov 10, 2016 · 1 comment
Assignees
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs

Comments

@alexbrasetvik
Copy link
Contributor

alexbrasetvik commented Nov 10, 2016

Got a 5.0.0-cluster logging this and failing to take snapshots. It should probably overwrite or gracefully handle the fact that he file exists.

[2016-11-10T13:04:33,885][WARN ][org.elasticsearch.snapshots.SnapshotsService] [found-snapshots:scheduled-1478783056-instance-0000000006/BWJzkt8QS6WPFBT8D5Ox4w] failed to finalize snapshot
org.elasticsearch.repositories.RepositoryException: [found-snapshots] failed to update snapshot in repository
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.finalizeSnapshot(BlobStoreRepository.java:544) ~[elasticsearch-5.0.0.jar:5.0.0]
	at org.elasticsearch.snapshots.SnapshotsService$5.run(SnapshotsService.java:878) [elasticsearch-5.0.0.jar:5.0.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:444) [elasticsearch-5.0.0.jar:5.0.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_72]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_72]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_72]
Caused by: java.nio.file.FileAlreadyExistsException: blob [pending-index-293] already exists, cannot overwrite
	at org.elasticsearch.cloud.aws.blobstore.S3BlobContainer.writeBlob(S3BlobContainer.java:105) ~[?:?]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.writeAtomic(BlobStoreRepository.java:872) ~[elasticsearch-5.0.0.jar:5.0.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.writeIndexGen(BlobStoreRepository.java:782) ~[elasticsearch-5.0.0.jar:5.0.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.finalizeSnapshot(BlobStoreRepository.java:540) ~[elasticsearch-5.0.0.jar:5.0.0]
	... 5 more
@abeyad
Copy link

abeyad commented Nov 10, 2016

@alexbrasetvik do you have additional logs you can share from the node? this looks like a genuine issue - we purposely prevented overwriting of blobs and reworked snapshots to use UUIDs so we don't have to worry about overwrites, so this is likely a bug related to that.

abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Nov 10, 2016
index-* blobs are generational files that maintain the snapshots
in the repository.  To write these atomically, we first write a
`pending-index-*` blob, then move it to `index-*`, which also deletes
`pending-index-*` in case its not a file-system level move (e.g.
S3 repositories) .  For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
`pending-index-5` and then move `pending-index-5` to `index-5`.  It is
possible that we fail after writing `pending-index-5`, but before
moving it to `index-5` or deleting `pending-index-5`.  In this case,
we will have a dangling `pending-index-5` blob laying around.  Since
snapshot elastic#5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to `index-5`, which first tries to
write to `pending-index-5` before moving the blob to `index-5`.  Since
`pending-index-5` is leftover from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by first, adding a UUID to the
`pending-index-*` blobs, and secondly, strengthen the logic around
failure to write the `index-*` generational blob to ensure pending
files are deleted on cleanup.

Closes elastic#21462
abeyad pushed a commit that referenced this issue Nov 11, 2016
…otting (#21469)

Ensures pending index-* blobs are deleted when snapshotting.  The
index-* blobs are generational files that maintain the snapshots
in the repository.  To write these atomically, we first write a
`pending-index-*` blob, then move it to `index-*`, which also deletes
`pending-index-*` in case its not a file-system level move (e.g.
S3 repositories) .  For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
`pending-index-5` and then move `pending-index-5` to `index-5`.  It is
possible that we fail after writing `pending-index-5`, but before
moving it to `index-5` or deleting `pending-index-5`.  In this case,
we will have a dangling `pending-index-5` blob laying around.  Since
snapshot #5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to `index-5`, which first tries to
write to `pending-index-5` before moving the blob to `index-5`.  Since
`pending-index-5` is leftover from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by first, adding a UUID to the
`pending-index-*` blobs, and secondly, strengthen the logic around
failure to write the `index-*` generational blob to ensure pending
files are deleted on cleanup.

Closes #21462
abeyad pushed a commit that referenced this issue Nov 11, 2016
…otting (#21469)

Ensures pending index-* blobs are deleted when snapshotting.  The
index-* blobs are generational files that maintain the snapshots
in the repository.  To write these atomically, we first write a
`pending-index-*` blob, then move it to `index-*`, which also deletes
`pending-index-*` in case its not a file-system level move (e.g.
S3 repositories) .  For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
`pending-index-5` and then move `pending-index-5` to `index-5`.  It is
possible that we fail after writing `pending-index-5`, but before
moving it to `index-5` or deleting `pending-index-5`.  In this case,
we will have a dangling `pending-index-5` blob laying around.  Since
snapshot #5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to `index-5`, which first tries to
write to `pending-index-5` before moving the blob to `index-5`.  Since
`pending-index-5` is leftover from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by first, adding a UUID to the
`pending-index-*` blobs, and secondly, strengthen the logic around
failure to write the `index-*` generational blob to ensure pending
files are deleted on cleanup.

Closes #21462
abeyad pushed a commit that referenced this issue Nov 11, 2016
…otting (#21469)

Ensures pending index-* blobs are deleted when snapshotting.  The
index-* blobs are generational files that maintain the snapshots
in the repository.  To write these atomically, we first write a
`pending-index-*` blob, then move it to `index-*`, which also deletes
`pending-index-*` in case its not a file-system level move (e.g.
S3 repositories) .  For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
`pending-index-5` and then move `pending-index-5` to `index-5`.  It is
possible that we fail after writing `pending-index-5`, but before
moving it to `index-5` or deleting `pending-index-5`.  In this case,
we will have a dangling `pending-index-5` blob laying around.  Since
snapshot #5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to `index-5`, which first tries to
write to `pending-index-5` before moving the blob to `index-5`.  Since
`pending-index-5` is leftover from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by first, adding a UUID to the
`pending-index-*` blobs, and secondly, strengthen the logic around
failure to write the `index-*` generational blob to ensure pending
files are deleted on cleanup.

Closes #21462
@clintongormley clintongormley added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Repository S3 labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs
Projects
None yet
Development

No branches or pull requests

3 participants