
command: retry upload on s3.NoSuchUpload #470

Merged 22 commits on Aug 2, 2022
Conversation


@kucukaslan kucukaslan commented Jul 21, 2022

With this change, the doUpload method will retry the multipart upload on a NoSuchUpload error if the no-such-upload-retry-count flag is used with cp or sync and the value of the special metadata in the remote file does not match the one placed in the upload request (or does not exist at all). Otherwise (if it matches), it will assume that the upload was successful and ignore the error.

The retry logic is placed into s3.Put, since there it is possible both to restart the upload operation and to write a unit test.

For the retry condition, it will use a user-defined metadata field. If no-such-upload-retry-count is given (with a positive value), it will put a random string as user-defined metadata. Then, upon the s3.NoSuchUpload error, it will check whether that field matches the one it placed. If it does not match (or does not exist at all), the upload will be retried, roughly as sketched below.
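
A minimal sketch of that flow. The names here (retryIDKey, putWithRetry, uploadOnce) are hypothetical stand-ins, not the PR's actual identifiers; the real implementation lives in s3.Put:

```go
package example

import (
	"crypto/rand"
	"encoding/hex"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/s3"
)

// retryIDKey is a hypothetical name for the user-defined metadata
// field used as an upload fingerprint.
const retryIDKey = "s5cmd-upload-retry-id"

func randomFingerprint() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err) // crypto/rand failing is unrecoverable here
	}
	return hex.EncodeToString(b)
}

// putWithRetry drives uploadOnce (a stand-in for the multipart upload)
// up to retryCount extra times on s3.ErrCodeNoSuchUpload, unless the
// remote object already carries the fingerprint we attached, in which
// case the upload actually succeeded and the error is ignored.
func putWithRetry(client *s3.S3, bucket, key string, retryCount int,
	uploadOnce func(metadata map[string]*string) error) error {

	fp := randomFingerprint()
	metadata := map[string]*string{retryIDKey: aws.String(fp)}

	var err error
	for attempt := 0; attempt <= retryCount; attempt++ {
		if err = uploadOnce(metadata); err == nil {
			return nil
		}
		if aerr, ok := err.(awserr.Error); !ok || aerr.Code() != s3.ErrCodeNoSuchUpload {
			return err // some other failure; not this workaround's business
		}
		head, herr := client.HeadObject(&s3.HeadObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(key),
		})
		if herr == nil {
			// Compare keys case-insensitively; the SDK canonicalizes
			// metadata key casing in responses.
			for k, v := range head.Metadata {
				if strings.EqualFold(k, retryIDKey) && aws.StringValue(v) == fp {
					return nil // our own upload landed; ignore the error
				}
			}
		}
		// Fingerprint missing or different: restart the whole upload.
	}
	return err
}
```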

Fixes #450

With this change, the doUpload method will retry the multipart upload on a NoSuchUpload error if the "retry-on-no-such-upload" flag is used and the remote file has not been modified since the beginning of the execution (or does not exist at all). Otherwise (if it has been modified), it will assume that the upload was successful and ignore the error.

Beware that this workaround approach has some drawbacks to consider:
1) If the upload really failed but a third party successfully modified the remote file (dsturl), then this code will incorrectly report "success".
2) It would be better to use a checksum rather than the last modification date, but unfortunately that is not supported in aws-sdk-go v1.*, and it is a bit tricky to use ETags as a checksum replacement for multipart uploads. Refer to https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html#large-object-checksums and, as an example of using the ETag as a checksum, https://github.com/peak/s3hash (a sketch of that computation follows below).
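
For reference (not part of this PR's code), the trick s3hash implements: S3's ETag for a multipart upload is the MD5 of the concatenated raw per-part MD5 digests, suffixed with "-<part count>", so it can only be reproduced when the exact part size used for the upload is known. A minimal sketch:

```go
package example

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// multipartETag recomputes the ETag S3 reports for a multipart upload
// done with a fixed part size: MD5 over the concatenated per-part MD5
// digests, plus a "-<parts>" suffix. partSize must match the size
// actually used for the upload, which is the tricky bit.
func multipartETag(path string, partSize int64) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	var digests []byte
	parts := 0
	for {
		h := md5.New()
		n, err := io.CopyN(h, f, partSize)
		if n > 0 {
			digests = append(digests, h.Sum(nil)...)
			parts++
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
	}
	sum := md5.Sum(digests)
	return fmt.Sprintf("%s-%d", hex.EncodeToString(sum[:]), parts), nil
}
```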

Updates #450

kucukaslan commented Jul 22, 2022

A few updates:
I naively thought that we could somehow embed this retry logic into storage.customRetryer. It would be very nice to be able to do that, since it would also let us write a unit test*. However, after a few (tbh, a lot of) attempts, I realized that it wouldn't work. The customRetryer retries the failed request, so if we retried there, the same CompleteMultipartUpload request would be re-sent with the same non-existent UploadId. What we actually want is to repeat the multipart upload for that file, so the solution is to restart all of the multipart upload operations (a sketch of why the retryer can't do this follows below the footnote).

On the other hand if we leave the retry logic in its current place ((Copy).doUpload) then it would be too high level to manipulate the mockAPI handlers/response to simulate the error for tests.

*Or even better, simply add another test case to storage.TestS3Retry.
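
For context, an aws-sdk-go v1 retryer can only decide whether the same request gets re-sent; it has no way to restart the surrounding operation. A sketch of the dead end, assuming a retryer shaped like storage.customRetryer (embedding client.DefaultRetryer):

```go
package example

import (
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/client"
	"github.com/aws/aws-sdk-go/aws/request"
	"github.com/aws/aws-sdk-go/service/s3"
)

type customRetryer struct {
	client.DefaultRetryer
}

// ShouldRetry only controls re-sending the request in hand. For
// NoSuchUpload that would re-send CompleteMultipartUpload with the
// same, now non-existent UploadId, which can never succeed; the whole
// multipart upload has to be restarted at a higher level instead.
func (c *customRetryer) ShouldRetry(req *request.Request) bool {
	if aerr, ok := req.Error.(awserr.Error); ok && aerr.Code() == s3.ErrCodeNoSuchUpload {
		return false // futile: same UploadId, same error
	}
	return c.DefaultRetryer.ShouldRetry(req)
}
```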

Update:
With 1a058fc I've moved the logic into the (s3).Put method, so that we can retry as we wish and write a simple test.

@kucukaslan kucukaslan changed the title command: retry upload on NoSuchUploadError command: retry upload on NoSuchUpload Jul 22, 2022
Write simple test.

The test does not check whether the logic works as expected for the various last-modification dates of the remote object; it assumes that there is no (such) object in the remote, since the response is cleared and replaced with a NoSuchUpload error (roughly as sketched below).
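
The response-clearing trick looks roughly like this with aws-sdk-go v1 request handlers (a sketch, not the actual test code):

```go
package example

import (
	"net/http"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/request"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// newNoSuchUploadClient returns an s3 client whose Send handler is
// replaced, so every call fails with NoSuchUpload without any real
// network traffic.
func newNoSuchUploadClient() *s3.S3 {
	sess := session.Must(session.NewSession(aws.NewConfig().
		WithRegion("us-east-1").
		WithCredentials(credentials.NewStaticCredentials("id", "secret", ""))))

	client := s3.New(sess)
	client.Handlers.Send.Clear() // drop the real HTTP send
	client.Handlers.Send.PushBack(func(r *request.Request) {
		r.HTTPResponse = &http.Response{StatusCode: http.StatusNotFound, Body: http.NoBody}
		r.Error = awserr.New(s3.ErrCodeNoSuchUpload, "The specified upload does not exist", nil)
	})
	return client
}
```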
@kucukaslan kucukaslan marked this pull request as ready for review July 22, 2022 16:10
@kucukaslan kucukaslan requested a review from a team as a code owner July 22, 2022 16:10
@kucukaslan kucukaslan requested review from aykutfarsak and seruman and removed request for a team July 22, 2022 16:10
@kucukaslan kucukaslan changed the title command: retry upload on NoSuchUpload command: retry upload on s3.NoSuchUpload Jul 26, 2022

kucukaslan commented Jul 26, 2022

Yet another update:
I think it is ready for review.

I've settled on placing the retry logic into s3.Put, which allows both restarting the upload operation and writing a unit test.
For the retry condition, it uses a special metadata field, as suggested. If the no-such-upload-retry-count flag is given (with a positive argument), it puts a random string as user-defined metadata. Then, upon the s3.NoSuchUpload error, it checks whether that field matches the one it placed. If it does not match (or does not exist at all), the upload is retried.

p.s. I updated the first comment accordingly and deleted the discussion about using the modification date. You can use the edit history in case you need it.

Remove unused "StartTime" variable.
Change flag name to "no-such-upload-retry-count" from "retry-on-no-such-upload". Also change the corresponding fields' names to [N]oSuchUploadRetryCount.
Move code generation to an external method.
... even though it is a hidden flag.
It will use the debug level to log retry attempts. Debug level is also the level at which s3.customRetryer writes logs; see also
https://github.com/peak/s5cmd/blob/822043839c0e581f9b481c9ef30dd88a04463a09/storage/s3.go#L932-L934

@seruman seruman left a comment


Also, to make sure the default behaviour is not broken, a couple of e2e tests with the retry flag set would be good, even though they would not actually retry.


kucukaslan commented Aug 2, 2022

I've tested the read-after-write consistency of object metadata in another branch. The storage.Stat request (after upload) returned the updated metadata for all of the 1000+ objects; the per-object check is sketched below.
Example command for the experiment (with a binary compiled from that branch):
s5cmd --stat cp --no-such-upload-retry-count 1 vendor/ s3://BUCKET/KEY_PREFIX/
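
The per-object verification amounts to a HeadObject right after each upload; a sketch (the metadata key name here is hypothetical):

```go
package example

import (
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// metadataIsVisible stats a freshly uploaded object and reports
// whether the user-defined metadata attached during upload is already
// readable, i.e. whether read-after-write consistency held.
func metadataIsVisible(client *s3.S3, bucket, key, metaKey, want string) (bool, error) {
	out, err := client.HeadObject(&s3.HeadObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return false, fmt.Errorf("stat s3://%s/%s: %w", bucket, key, err)
	}
	// Compare keys case-insensitively; the SDK canonicalizes
	// metadata key casing in responses.
	for k, v := range out.Metadata {
		if strings.EqualFold(k, metaKey) && aws.StringValue(v) == want {
			return true, nil
		}
	}
	return false, nil
}
```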
