-
Notifications
You must be signed in to change notification settings - Fork 863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Discussion] Add initial design tenets/goals for S3 TransferManager #1120
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1120 +/- ##
============================================
- Coverage 58.65% 58.63% -0.02%
+ Complexity 4532 4531 -1
============================================
Files 738 738
Lines 22751 22751
Branches 1699 1699
============================================
- Hits 13344 13341 -3
- Misses 8717 8719 +2
- Partials 690 691 +1
Continue to review full report at Codecov.
|
than using the S3 client. In the majority of situations, it is more | ||
performant. | ||
1. TransferManager provides a truly asynchronous API that conforms to the norms | ||
present in the rest of the SDK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking, I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch. I'll update the wording
|
||
* TransferManager supports progress listeners that are easier to use. | ||
|
||
Ref: https://github.com/aws/aws-sdk-java-v2/issues/37#issuecomment-316218667 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a meta-question: do we know yet whether these will be S3-specific, or built into the core?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense for them to be same thing. Or at least, TransferManager supports both the core and an "enhanced" TransferManager specific version if it makes sense to have one.
simultaneously, not just those uploaded using the Multipart API. | ||
|
||
* TransferManager has the ability to upload to and download from a pre-signed | ||
URL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this include parallel downloads (see previous goal)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For downloads definitely. I wasn't sure if it's possible for uploads so I didn't want to specify parallel in case it's not. I'll confirm and then update the wording if necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Parallel downloads are possible, but there might be edge cases where we can't compute object length and hence can't do parallel downloads
- Multi-part uploads are possible but its complex and requires customers to provide multiple presigned urls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I'd probably just leave it as is for now as is for now, or add a caveat that parallel parallel downloads where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add in a statement on checksums as well. Those can be done multipart now.
|
||
# Non Project Goals | ||
|
||
1. Ability to use the blocking, synchronous client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure where you separate client vs http client but I think it should be a primary goal to support sync http clients like Apache. Each HTTP client has different levels of configuration and allowing customers to use their preferred one should be a requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess both as in the blocking service client and HTTP clients. I can't see supporting a sync HTTP client as a goal can coexist with the async/non-blocking, efficiency, and performance goals. We already saw a lot of this when we explored whether we could expose sync HTTP clients as async and vice versa.
To me, a big part of the original TransferManager was to provide the async/non-blocking experience so by focusing on using the async HTTP clients, we further enhance that experience.
If it's a matter of configuration I would think supporting a config present in Apache but not Netty would be easier to achieve than supporting Apache/a non blocking HTTP client in TransferManager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just limiting it to non-blocking clients is fine, as long as we have support for HTTP proxies in Netty. Sync clients and async abstractions like TransferManager don't mix well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So no shared connection pool if customers are using the sync s3 client? Which is made worse given that you have to use both the low-level client and transfer manager.
No lightweight JDK client?
No GAE support?
A worse debugging experience?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my view, TransferManager is an enhancement of the async client, where this is all true right now as well.
I think it's important that customers who can't use the async clients for the reasons given above are not left without an option, but I think that means we provide a similar, but separate library for non-async clients.
|
||
This project provides a much improved experience for S3 customers needing to | ||
easily perform uploads and downloads of objects to and from S3 by providing | ||
the S3 `TransferManager`, a high level library built on the S3 client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S3TransferManager
? 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 sounds good
|
||
Ref: https://github.com/aws/aws-sdk-java/issues/1207 | ||
|
||
* Trailing checksums for parallel uploads using the Multipart API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And downloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
1. TransferManager provides a truly asynchronous, non-blocking API that conforms | ||
to the norms present in the rest of the SDK. | ||
1. TransferManager makes efficient use of system resources. | ||
1. TransferManager performs well in cold JVMs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this actually needs to be a project goal. Once you get to the point where using Transfer Manager is more performant, a slightly increased cold start won't likely be a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to further clarify, I don't think this is actually going to be an issue in Transfer Manager but if for some weird reason we needed to use something like ObjectMapper I don't think that'd be a huge deal for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that sounds fine to me. As you said, I don't foresee the implementation doing anything that would increase cold start over the baseline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than your project goals section all being numbered 1, LGTM
@spfink The rendered view is numbered correctly :). Saves having to fix the numbers as you add/remove things https://github.com/aws/aws-sdk-java-v2/blob/5b26383246987213a35dd3fc459d3f336d15db5a/docs/design/services/s3/transfermanager/README.md |
6beab70
to
784e278
Compare
…2c565149c Pull request: release <- staging/ad442211-3f8c-489d-af73-bc72c565149c
Initial goals/tenets document for S3 TransferManager. Feedback requested!