Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Add initial design tenets/goals for S3 TransferManager #1120

Merged
merged 1 commit into from
Mar 16, 2019

Conversation

dagnir
Copy link
Contributor

@dagnir dagnir commented Mar 8, 2019

Initial goals/tenets document for S3 TransferManager. Feedback requested!

@codecov-io
Copy link

codecov-io commented Mar 8, 2019

Codecov Report

Merging #1120 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1120      +/-   ##
============================================
- Coverage     58.65%   58.63%   -0.02%     
+ Complexity     4532     4531       -1     
============================================
  Files           738      738              
  Lines         22751    22751              
  Branches       1699     1699              
============================================
- Hits          13344    13341       -3     
- Misses         8717     8719       +2     
- Partials        690      691       +1
Impacted Files Coverage Δ Complexity Δ
...nio/netty/internal/OldConnectionReaperHandler.java 81.81% <0%> (-9.1%) 14% <0%> (-1%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d17d35f...784e278. Read the comment docs.

than using the S3 client. In the majority of situations, it is more
performant.
1. TransferManager provides a truly asynchronous API that conforms to the norms
present in the rest of the SDK.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking, I assume?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. I'll update the wording


* TransferManager supports progress listeners that are easier to use.

Ref: https://github.com/aws/aws-sdk-java-v2/issues/37#issuecomment-316218667
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a meta-question: do we know yet whether these will be S3-specific, or built into the core?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense for them to be same thing. Or at least, TransferManager supports both the core and an "enhanced" TransferManager specific version if it makes sense to have one.

simultaneously, not just those uploaded using the Multipart API.

* TransferManager has the ability to upload to and download from a pre-signed
URL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this include parallel downloads (see previous goal)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For downloads definitely. I wasn't sure if it's possible for uploads so I didn't want to specify parallel in case it's not. I'll confirm and then update the wording if necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Parallel downloads are possible, but there might be edge cases where we can't compute object length and hence can't do parallel downloads
  • Multi-part uploads are possible but its complex and requires customers to provide multiple presigned urls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I'd probably just leave it as is for now as is for now, or add a caveat that parallel parallel downloads where possible.

Copy link
Contributor

@spfink spfink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add in a statement on checksums as well. Those can be done multipart now.


# Non Project Goals

1. Ability to use the blocking, synchronous client.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where you separate client vs http client but I think it should be a primary goal to support sync http clients like Apache. Each HTTP client has different levels of configuration and allowing customers to use their preferred one should be a requirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess both as in the blocking service client and HTTP clients. I can't see supporting a sync HTTP client as a goal can coexist with the async/non-blocking, efficiency, and performance goals. We already saw a lot of this when we explored whether we could expose sync HTTP clients as async and vice versa.

To me, a big part of the original TransferManager was to provide the async/non-blocking experience so by focusing on using the async HTTP clients, we further enhance that experience.

If it's a matter of configuration I would think supporting a config present in Apache but not Netty would be easier to achieve than supporting Apache/a non blocking HTTP client in TransferManager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just limiting it to non-blocking clients is fine, as long as we have support for HTTP proxies in Netty. Sync clients and async abstractions like TransferManager don't mix well.

Copy link
Contributor

@spfink spfink Mar 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So no shared connection pool if customers are using the sync s3 client? Which is made worse given that you have to use both the low-level client and transfer manager.

No lightweight JDK client?

No GAE support?

A worse debugging experience?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view, TransferManager is an enhancement of the async client, where this is all true right now as well.

I think it's important that customers who can't use the async clients for the reasons given above are not left without an option, but I think that means we provide a similar, but separate library for non-async clients.


This project provides a much improved experience for S3 customers needing to
easily perform uploads and downloads of objects to and from S3 by providing
the S3 `TransferManager`, a high level library built on the S3 client.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3TransferManager? 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 sounds good


Ref: https://github.com/aws/aws-sdk-java/issues/1207

* Trailing checksums for parallel uploads using the Multipart API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And downloads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

1. TransferManager provides a truly asynchronous, non-blocking API that conforms
to the norms present in the rest of the SDK.
1. TransferManager makes efficient use of system resources.
1. TransferManager performs well in cold JVMs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this actually needs to be a project goal. Once you get to the point where using Transfer Manager is more performant, a slightly increased cold start won't likely be a big deal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to further clarify, I don't think this is actually going to be an issue in Transfer Manager but if for some weird reason we needed to use something like ObjectMapper I don't think that'd be a huge deal for this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds fine to me. As you said, I don't foresee the implementation doing anything that would increase cold start over the baseline.

Copy link
Contributor

@spfink spfink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than your project goals section all being numbered 1, LGTM

@dagnir
Copy link
Contributor Author

dagnir commented Mar 15, 2019

@spfink The rendered view is numbered correctly :). Saves having to fix the numbers as you add/remove things https://github.com/aws/aws-sdk-java-v2/blob/5b26383246987213a35dd3fc459d3f336d15db5a/docs/design/services/s3/transfermanager/README.md

@dagnir dagnir force-pushed the transfermanager-goals branch from 6beab70 to 784e278 Compare March 16, 2019 00:02
@dagnir dagnir merged commit aedf9cb into aws:master Mar 16, 2019
aws-sdk-java-automation pushed a commit that referenced this pull request Dec 21, 2020
…2c565149c

Pull request: release <- staging/ad442211-3f8c-489d-af73-bc72c565149c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants