SNAPSHOT: More Resilient Writes to Blob Stores #36927
Conversation
* Fix a potential failure to write the index generation files on master failover by making the writes to the blob store idempotent as long as the snapshot generation does not change
* This moves the burden of consistency into the ES cluster state handling instead of leaving it in part with the blob store, which doesn't have any relevant consistency guarantees in most cases
* This PR does not change behaviour for S3 at all since there is no file existence check for that store anyway
* closes elastic#25281
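The retry scenario the description targets can be sketched as follows. This is a hypothetical in-memory stand-in (the class and field names are illustrative, not the real Elasticsearch code), showing why keying blobs by generation makes a retried write after master failover harmless, so that failIfAlreadyExists can be false:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative in-memory blob store: writing the same generation-keyed
// blob twice with identical content is a no-op, so a new master retrying
// a predecessor's write does not need an existence check to be safe.
public class GenerationKeyedWrites {
    static final String INDEX_FILE_PREFIX = "index-"; // matches the real prefix

    final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    // Sketch of writeBlob with failIfAlreadyExists=false: overwrite quietly.
    void writeBlob(String name, byte[] data, boolean failIfAlreadyExists) {
        if (failIfAlreadyExists && blobs.containsKey(name)) {
            throw new IllegalStateException("blob [" + name + "] already exists");
        }
        blobs.put(name, data);
    }

    public static void main(String[] args) {
        GenerationKeyedWrites store = new GenerationKeyedWrites();
        byte[] repoData = "snapshots-for-gen-7".getBytes();
        String indexBlob = INDEX_FILE_PREFIX + Long.toString(7);

        // First master writes the generation blob, then fails over before
        // finishing; the new master retries the exact same write.
        store.writeBlob(indexBlob, repoData, false);
        store.writeBlob(indexBlob, repoData, false); // retry: harmless overwrite

        System.out.println(new String(store.blobs.get(indexBlob)));
    }
}
```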
Pinging @elastic/es-distributed
@@ -738,7 +731,7 @@ protected void writeIndexGen(final RepositoryData repositoryData, final long rep
         // write the index file
         final String indexBlob = INDEX_FILE_PREFIX + Long.toString(newGen);
         logger.debug("Repository [{}] writing new index generational blob [{}]", metadata.name(), indexBlob);
-        writeAtomic(indexBlob, snapshotsBytes, true);
+        writeAtomic(indexBlob, snapshotsBytes, false);
This should be idempotent anyway, all we're doing here is potentially preventing deleting the old index file or updating the latest blob file in subsequent steps if either of those failed before a master failover.
@@ -150,7 +150,7 @@ public void write(T obj, BlobContainer blobContainer, String name) throws IOExce
         final String blobName = blobName(name);
         writeTo(obj, blobName, bytesArray -> {
             try (InputStream stream = bytesArray.streamInput()) {
-                blobContainer.writeBlob(blobName, stream, bytesArray.length(), true);
+                blobContainer.writeBlob(blobName, stream, bytesArray.length(), false);
This is idempotent as far as I can see. It seems easier to simply skip an existing file quietly than to suppress the file-already-exists exception and keep going with writing the index gen files upstream, since S3 doesn't have this check anyway. The only downside I see is that we incur extra traffic for blob stores that do have a consistent existence check here; I'm not sure that's worth the extra effort, though.
The more I think about it, the more I dislike that we even have the failIfAlreadyExists flag on the BlobContainer interface. The parameter is simply ignored for S3 and won't work reliably on some NFS implementations.
So in the end, our tests aren't really valid for S3 etc. in failover scenarios, and any fixes we make to failover issues that rely on this check working will not be valid for S3 either.
=> I'm still a fan of this change. IMO it makes more sense to fix any unforeseen issues with overwriting these files than to continue testing and fixing behavior that doesn't apply to all our blob stores.
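The inconsistency described above can be made concrete with a small sketch. Both container classes below are hypothetical illustrations (not the real FsBlobContainer or S3BlobContainer): one honors failIfAlreadyExists, the other silently ignores it the way S3 does, so a test that relies on the flag only exercises some stores:

```java
import java.util.HashMap;
import java.util.Map;

public class FlagInconsistency {
    interface BlobContainer {
        void writeBlob(String name, byte[] data, boolean failIfAlreadyExists);
    }

    // FS-like store: the flag is honored, duplicate writes fail.
    static class FsLikeContainer implements BlobContainer {
        final Map<String, byte[]> blobs = new HashMap<>();
        public void writeBlob(String name, byte[] data, boolean failIfAlreadyExists) {
            if (failIfAlreadyExists && blobs.containsKey(name)) {
                throw new IllegalStateException("already exists: " + name);
            }
            blobs.put(name, data);
        }
    }

    // S3-like store: the flag is silently ignored, duplicates overwrite.
    static class S3LikeContainer implements BlobContainer {
        final Map<String, byte[]> blobs = new HashMap<>();
        public void writeBlob(String name, byte[] data, boolean failIfAlreadyExists) {
            blobs.put(name, data);
        }
    }

    // Does a second write with failIfAlreadyExists=true actually fail?
    static boolean overwriteFails(BlobContainer c) {
        c.writeBlob("index-1", new byte[] {1}, true);
        try {
            c.writeBlob("index-1", new byte[] {2}, true);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fs=" + overwriteFails(new FsLikeContainer()));
        System.out.println("s3=" + overwriteFails(new S3LikeContainer()));
    }
}
```

The same assertion passes on one store and fails on the other, which is the reviewer's point about tests not being valid for S3.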
I'm not convinced that this is making things more resilient. In particular, the added leniency in writeIndexGen concerns me. I think we'll have to rethink more fundamentally the flow for writing files to the repository when dealing with failovers (whether of master or data nodes).
Are we really worse off in any real-world situation with this, though? I couldn't come up with any except for the master failover I mentioned in the PR description, and as I mentioned above we're in the same spot for S3 that we currently are, so with any other blob store we are no worse off than we are for S3 either.
I would argue that, since blob stores have no reliable guarantees across implementations, moving the logic towards using the cluster state for consistency and making writes to the blob stores immutable for an unchanged cluster state, like I did here, is at least in theory the way to go, isn't it? The cluster state seems to be our only reliable source of consistency here.
closing this in favor of the more general suggestion on this in #42886
index.latest file with an incorrect generation in the unlikely case of a master becoming stuck for a long time before writing that file out (which is incredibly unlikely in practice, if not impossible, because we verify the index generation before writing in org.elasticsearch.repositories.blobstore.BlobStoreRepository#writeIndexGen, so the blob store would need to be inconsistent for a timespan longer than it takes the master to fail over)
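The generation verification mentioned above can be sketched roughly as follows. This is an assumed simplification of the guard in BlobStoreRepository#writeIndexGen (names and structure are illustrative): a new index generation is only written if the repository's current generation still matches what the writing master read, which bounds the damage a stale master can do:

```java
import java.util.HashMap;
import java.util.Map;

public class WriteIndexGenSketch {
    long currentGen = 6;                       // generation the repository is at
    final Map<String, String> blobs = new HashMap<>();

    // Hypothetical analogue of the expectedGen check: reject writes from a
    // master whose view of the repository generation is out of date.
    void writeIndexGen(String repositoryData, long expectedGen) {
        if (currentGen != expectedGen) {
            throw new IllegalStateException(
                "expected gen [" + expectedGen + "] but repository is at [" + currentGen + "]");
        }
        long newGen = expectedGen + 1;
        blobs.put("index-" + newGen, repositoryData);      // new generation blob
        blobs.put("index.latest", Long.toString(newGen));  // pointer blob
        currentGen = newGen;
    }

    public static void main(String[] args) {
        WriteIndexGenSketch repo = new WriteIndexGenSketch();
        repo.writeIndexGen("snapshots", 6);                // succeeds, gen -> 7

        try {
            repo.writeIndexGen("stale-master-data", 6);    // stale master: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("latest=" + repo.blobs.get("index.latest"));
    }
}
```

For the stale master to corrupt index.latest despite this check, the blob store would need to stay inconsistent for longer than the failover window, matching the "incredibly unlikely in practice" assessment above.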