S3: Add MultiPartUpload and ListParts APIs #2730
Conversation
@@ -114,6 +115,244 @@ object MultipartUploadResult {
  )
}

final class AWSIdentity private (val id: String, val displayName: String) {
Not sure if you have better suggestions for `AWSIdentity`; basically, this same data structure is used in many places where you want to identify who or what initiated the request and who owns the entity in question.
Works for me.
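For context on the pattern in the diff: Alpakka model classes typically pair a private constructor like this with `apply`/`create` factories for the Scala and Java APIs. A minimal sketch of that convention; the field names match the diff, but the factory methods are assumptions, not quotes from the PR:

```scala
// Sketch of the usual Alpakka model-class convention applied to AWSIdentity;
// the factory methods below are assumed, not part of this diff.
final class AWSIdentity private (val id: String, val displayName: String) {
  override def toString: String =
    s"AWSIdentity(id=$id, displayName=$displayName)"
}

object AWSIdentity {
  /** Scala API */
  def apply(id: String, displayName: String): AWSIdentity =
    new AWSIdentity(id, displayName)

  /** Java API */
  def create(id: String, displayName: String): AWSIdentity =
    new AWSIdentity(id, displayName)
}
```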
case object Starting extends ListBucketState
final case class Starting[T]() extends S3PaginationState[T]
Ideally both `Starting` and `Finished` should be case objects; however, I didn't manage to get type inference to work nicely with `case object Starting extends S3PaginationState[Nothing]`, so I ended up converting them to case classes with zero parameters, since that's the only way to propagate the type `T` across. Maybe there is a nicer solution for this?
Let's not fight for it. It is internal API anyway.
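For illustration, a minimal self-contained sketch of the trade-off discussed in this thread; the `Running`/`Finished` shapes are assumptions based on the surrounding diff, not the exact PR code:

```scala
sealed trait S3PaginationState[T]

// Zero-parameter case classes: T is spelled out at each use site, so the
// compiler can unify Starting[String]() with an S3PaginationState[String].
final case class Starting[T]() extends S3PaginationState[T]
final case class Running[T](continuationToken: T) extends S3PaginationState[T]
final case class Finished[T]() extends S3PaginationState[T]

// The case-object alternative only type-checks where a concrete
// S3PaginationState[T] is expected if the trait is covariant:
//   sealed trait S3PaginationState[+T]
//   case object Starting extends S3PaginationState[Nothing]
// i.e. it forces a variance change on the whole hierarchy for two states.
```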
Review thread on `s3/src/test/scala/akka/stream/alpakka/s3/scaladsl/S3IntegrationSpec.scala` (outdated, resolved).
Force-pushed from `8afc2b5` to `b11e714`.
So one thing that I have noticed is that the multipart upload tests I added pass locally when I run them against a real S3 instance; however, Travis CI in the GitHub repo fails them. I think this may be because Travis CI runs the tests against MinIO, and I suppose MinIO doesn't have the same behavior as real S3 in this regard?
That sounds likely. Did you try to use MinIO locally?
Force-pushed from `78366e0` to `df122c6`.
That is next on the list. I have just finished updating the initial test (good news: I can confirm that aborting a multipart upload, and the result of listing the currently aborted multipart uploads, behaves 100% as I expect against actual S3), so I am going to look into why MinIO isn't working. Apart from this I need to implement the transparent resuming capability, and then it should be good to go.
Force-pushed from `9bf205c` to `05e0026`.
@ennru I just ran the MinIO tests locally and can confirm they also fail in exactly the same way as on Travis CI, so it appears that MinIO doesn't have the same behavior for listing aborted multipart uploads. I will have a look whether there is an upstream issue on this; I also need to figure out how to disable this test only when running against MinIO.
Force-pushed from `05e0026` to `1866209`.
So I just checked upstream with MinIO about this, and it seems this behavior is intentional; however, I don't agree with the reasoning (see minio/minio#5613 (comment)). I have created a new issue at minio/minio#13246, but for now I have disabled this test for MinIO specifically.
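For reference, ScalaTest's `assume` is one common way to skip a test conditionally (it cancels the test rather than failing it). A sketch, where the `s3BackendIsMinio` flag and the suite name are hypothetical, not the actual `S3IntegrationSpec` wiring:

```scala
import org.scalatest.flatspec.AnyFlatSpec

class MultipartUploadSpec extends AnyFlatSpec {

  // Hypothetical switch; the real suite decides this from its configuration.
  val s3BackendIsMinio: Boolean = sys.env.get("S3_BACKEND").contains("minio")

  "S3" should "list aborted multipart uploads" in {
    // assume() cancels (rather than fails) the test when running against
    // MinIO, which intentionally keeps no state for aborted uploads.
    assume(!s3BackendIsMinio, "MinIO does not list aborted multipart uploads")
    // ... test body that exercises the real S3 behavior ...
  }
}
```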
Okay, an update coming from minio/minio#13246 (comment): basically, it is not intended that you use the state on the S3 server when completing previously aborted multipart uploads; instead, that state is designed to be maintained locally on the client (which is how typical S3 clients, including Alpakka, work). I didn't get a completely clear answer about what the actual issues are with using the list multipart uploads route; from what I gathered there are concurrency issues (which I assume occur when you are doing concurrent uploads to a single bucket/key), which means that we shouldn't be using this method of resuming a multipart upload. Given this, the plan is to create a public API around

alpakka/s3/src/main/scala/akka/stream/alpakka/s3/impl/S3Stream.scala Lines 524 to 529 in ceaa570

and to extend

alpakka/s3/src/main/scala/akka/stream/alpakka/s3/scaladsl/S3.scala Lines 266 to 291 in 5c66bf0

to accept `partNumber`/`etag` pairs along with an `uploadId`, which will allow you to resume an already existing upload that has been somehow paused. This means that it's up to the S3 user how they handle uploads; we give them the tools, and we can even document that it's not recommended to use parts retrieved from the list multipart uploads API. @ennru Do you agree with this?
Thank you for checking the underlying recommendations. Sounds good to me. Instead of making …
Indeed, that is what I meant; thanks for confirming. I will proceed with the changes.
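To make the proposed contract concrete, here is a rough sketch of the two method shapes being discussed; the names `resumeMultipartUpload`, `completeMultipartUpload`, and the `Part` model follow this thread, but their exact signatures are assumptions, not the final code:

```scala
import akka.stream.scaladsl.Sink
import akka.util.ByteString
import scala.concurrent.Future

// Assumed per-part state the client records itself (e.g. in a database)
// instead of asking S3 for it via ListMultipartUploads.
final case class Part(eTag: String, partNumber: Int)

// Placeholder result type; the real API would reuse the
// MultipartUploadResult model from this PR.
final case class UploadResult(bucket: String, key: String, eTag: String)

trait ProposedS3Api {
  // Continue an interrupted upload: the caller supplies the uploadId plus
  // the (partNumber, eTag) pairs of the parts that were already uploaded.
  def resumeMultipartUpload(bucket: String,
                            key: String,
                            uploadId: String,
                            previousParts: Seq[Part]): Sink[ByteString, Future[UploadResult]]

  // Made public so a client can finish an upload whose parts already live
  // on S3, without streaming any more data.
  def completeMultipartUpload(bucket: String,
                              key: String,
                              uploadId: String,
                              parts: Seq[Part]): Future[UploadResult]
}
```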
Force-pushed from `88e4095` to `e7cccd7`.
@ennru The PR is now ready. Some additional notes:

I added a test which covers the whole scenario, i.e. creating a multipart upload, aborting it with an exception (via a KillSwitch), and using …. I have also disabled the tests when they are run against MinIO. Let me know if anything else is needed!
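For readers unfamiliar with the technique, this is roughly how a KillSwitch can abort a running multipart upload stream with an exception; the bucket/key names are placeholders, the rest is the standard Akka Streams and Alpakka S3 API:

```scala
import akka.actor.ActorSystem
import akka.stream.KillSwitches
import akka.stream.alpakka.s3.scaladsl.S3
import akka.stream.scaladsl.{Keep, Source}
import akka.util.ByteString

implicit val system: ActorSystem = ActorSystem("killswitch-example")

// Wire a KillSwitch between the data source and the multipart upload sink.
val (killSwitch, uploadResult) =
  Source.repeat(ByteString("some chunk of data "))
    .viaMat(KillSwitches.single)(Keep.right)
    .toMat(S3.multipartUpload("my-bucket", "my-key"))(Keep.both)
    .run()

// Failing the stream mid-upload leaves the multipart upload incomplete on S3
// (already-uploaded parts stay there until completed or aborted), which is
// exactly the state the resume test starts from.
killSwitch.abort(new RuntimeException("simulated failure"))
```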
Force-pushed from `e7cccd7` to `e5acfa3`.
I actually ended up adding the …
Force-pushed from `e5acfa3` to `88f3fdf`.
Quite a chunk of great work @mdedetrich!
LGTM.
This PR adds https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListMultipartUploads.html and https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListParts.html to the S3 API, with the ultimate aim of being able to automatically resume a previously aborted S3 multipart upload with the same given bucket/key, in the same way that `GCStorage.resumableUpload` works.

The implementation of `S3.resumableUpload` still needs to be done using the rough logic outlined in https://stackoverflow.com/questions/53764876/resume-s3-multipart-upload-partetag. The basic implementation of `S3.resumableUpload` is as follows (a sketch is given after this list):

1. Check whether a multipart upload already exists for the given bucket/key using `S3.listMultipartUpload`. If not, then just call the already existing `S3.multipartUpload`; otherwise
2. from the upload returned by `S3.listMultipartUpload`, call `S3.listParts` in order to retrieve the latest `eTag`/`partNumber`
3. call `S3.multipartUpload` with that given `eTag`/`partNumber`
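A minimal sketch of that control flow, assuming the listing source added in this PR emits elements carrying the upload's `key`, and leaving the resume branch abstract since that API does not exist yet:

```scala
import akka.actor.ActorSystem
import akka.stream.alpakka.s3.scaladsl.S3
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString
import scala.concurrent.{ExecutionContext, Future}

def resumableUploadSketch(bucket: String, key: String)(
    implicit system: ActorSystem, ec: ExecutionContext): Future[Sink[ByteString, _]] =
  // Step 1: look for an existing multipart upload for this bucket/key.
  S3.listMultipartUpload(bucket, Some(key))
    .runWith(Sink.seq)
    .map { uploads =>
      uploads.find(_.key == key) match {
        case None =>
          // No previous upload: fall back to the existing S3.multipartUpload.
          S3.multipartUpload(bucket, key)
        case Some(_) =>
          // Steps 2-3: S3.listParts would yield the latest eTag/partNumber,
          // which a (not yet written) multipartUpload overload would take.
          ??? // resume path, to be implemented
      }
    }
```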
As an additional note, I have generalized `ListBucketState` to work with any arbitrary type rather than just `String`, by renaming it to `S3PaginationState` and allowing it to accept a type parameter; this change was put into its own commit so that it's clear. This is necessary because the added API calls either have a different type for the `continuationToken` (i.e. `Int` instead of `String`), or the `continuationToken` requires multiple tokens rather than just a single one.

@ennru I am creating this PR prematurely as a draft since it ended up being quite big, and I want someone else to have a look at it to see if I am on the right track. I have commented on specific parts of the PR just to clarify things. Here is a checklist of the things that need to be done:
- [ ] Add a `S3.resumableUpload` function to the S3 API
- [ ] Add a test for the `S3.resumableUpload`
- [ ] Make `S3Stream.completeMultipartUpload` public with a proper Scala/Java API, to allow users to manually complete a multipart upload at will
- [ ] Adjust `S3.resumeMultipartUpload` so that it accepts an optional sequence of `partNumber`/`etag` along with an `uploadId` parameter that lets you resume an upload from a given arbitrary part. This will allow you to manually resume a multipart upload; it's up to you to retrieve these partNumbers/etags. (See the usage sketch after this list.)
- [ ] … `S3.multipartUpload`
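As a usage illustration for the manual-resume checklist items, reusing the hypothetical `ProposedS3Api`/`Part` shapes sketched earlier in this thread (everything here is assumed, not the merged API):

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.Source
import akka.util.ByteString

implicit val system: ActorSystem = ActorSystem("resume-example")
val s3: ProposedS3Api = ??? // some implementation of the sketched API

// State recorded locally when the original upload was interrupted; it is
// NOT fetched from the list multipart uploads API.
val uploadId = "upload-id-from-the-interrupted-run"
val storedParts = List(Part("etag-1", 1), Part("etag-2", 2))

// Resume: stream only the remaining bytes, reusing parts 1 and 2 as-is.
Source.single(ByteString("the remaining bytes"))
  .runWith(s3.resumeMultipartUpload("my-bucket", "my-key", uploadId, storedParts))

// Or, if every part is already on S3, just complete the upload directly.
s3.completeMultipartUpload("my-bucket", "my-key", uploadId, storedParts)
```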