Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not check for S3 blob to exist before writing #31128

Merged
merged 1 commit into from
Jun 6, 2018

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jun 6, 2018

Amazon S3 provides a weak consistency model (see https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html). From the docs (emphasis mine):

Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all regions.
Updates to a single key are atomic. For example, if you PUT to an existing key, a subsequent read might return the old data or the updated data, but it will never write corrupted or partial data.

In #19749 an extra check was added before writing each blob to ensure that we would not be overriding an existing blob. Due to S3's weak consistency model, this check was best effort. To make matters worse, however, this resulted in a HEAD request to be done before every PUT, in particular also when PUTTING a new object. The approach taken in #19749 worsened our consistency guarantees for follow-up snapshot actions, as it made it less likely for new files that had been written to be available for reads.

My suggestion is therefore to remove this extra check. Due to the weak consistency model, this check was a best effort thing anyway, and there's currently no way to prevent accidental overrides on S3.

@ywelsch ywelsch added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v7.0.0 v6.4.0 labels Jun 6, 2018
@ywelsch ywelsch requested a review from tlrx June 6, 2018 08:20
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @ywelsch, I like these improvements.

@ywelsch ywelsch merged commit 515a233 into elastic:master Jun 6, 2018
ywelsch added a commit that referenced this pull request Jun 6, 2018
In #19749 an extra check was added before writing each blob to ensure that we would not be
overriding an existing blob. Due to S3's weak consistency model, this check was best effort. To
make matters worse, however, this resulted in a HEAD request to be done before every PUT, in
particular also when PUTTING a new object. The approach taken in #19749 worsened our
consistency guarantees for follow-up snapshot actions, as it made it less likely for new files that
had been written to be available for reads.

This commit therefore removes this extra check. Due to the weak consistency model, this check
was a best effort thing anyway, and there's currently no way to prevent accidental overrides on S3.
infa-dsokolov added a commit to infa-dsokolov/elasticsearch that referenced this pull request Oct 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants