Transfer Manager #37

Closed
millems opened this issue Jul 3, 2017 · 47 comments
Labels
1.x Parity, feature-request (A feature should be added or improved.), p1 (This is a high priority issue), transfer-manager

Comments

@millems
Contributor

millems commented Jul 3, 2017

Review the inherited state of the V1 transfer manager and determine which changes are necessary for V2.

(Feel free to comment on this issue with desired changes).

@abrooksv
Contributor

Upload and download return a Transfer which has getProgress(). This gives you a simple way to get the percent complete, but it requires busy loops to update things like UIs.

On the other hand, ProgressListener only reports the bytes transferred, so you have to compute the percentage yourself, which is not easy because the total size is not exposed.

Better feature parity on the ProgressListener would be nice, since that plays better with async.
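
To illustrate the kind of push-based percentage reporting being asked for, here is a minimal sketch in plain Java. It assumes the caller can somehow learn the total size up front; `PercentProgressTracker` and its methods are hypothetical names for illustration and are not part of any SDK.

```java
import java.util.function.DoubleConsumer;

/** Hypothetical helper, not an SDK class: turns byte-count callbacks into percentages. */
final class PercentProgressTracker {
    private final long totalBytes;
    private final DoubleConsumer onPercent;
    private long transferredBytes;

    PercentProgressTracker(long totalBytes, DoubleConsumer onPercent) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        this.totalBytes = totalBytes;
        this.onPercent = onPercent;
    }

    /** Call from a byte-count progress callback with the number of bytes just transferred. */
    synchronized void bytesTransferred(long delta) {
        // Clamp so that retried bytes can never push the value past 100%.
        transferredBytes = Math.min(totalBytes, transferredBytes + delta);
        onPercent.accept(100.0 * transferredBytes / totalBytes);
    }
}
```

A UI could pass a callback that updates a progress bar, so no polling loop is needed.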

@millems
Contributor Author

millems commented Jul 21, 2017

+1 on parity with progress listener. There should probably just be one system between transfer manager and the rest of the SDK for monitoring progress.

@millems
Contributor Author

millems commented Aug 10, 2017

Use DirectoryStream for loading files from a directory to avoid having to load all file names into memory. See aws/aws-sdk-java#1271.
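
For reference, a minimal sketch of lazy directory iteration with `java.nio.file.DirectoryStream`. The `LazyDirectoryLister` class and the `onFile` callback are hypothetical stand-ins for "submit this file for upload"; only the `Files.newDirectoryStream` call is standard library API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

final class LazyDirectoryLister {
    /**
     * Visits each regular file directly under dir, one entry at a time,
     * without first collecting all file names into a list.
     * Returns the number of files visited.
     */
    static int forEachRegularFile(Path dir, Consumer<Path> onFile) {
        int count = 0;
        // try-with-resources closes the underlying directory handle.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    onFile.accept(entry);
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```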

@kiiadi
Contributor

kiiadi commented Aug 16, 2017

It is unreasonable to require an unbounded thread pool in order to avoid deadlocks; see aws/aws-sdk-java#939.

@spfink
Contributor

spfink commented Sep 13, 2017

TransferManager feature requests from v1:

aws/aws-sdk-java#117
aws/aws-sdk-java#284
aws/aws-sdk-java#474
aws/aws-sdk-java#645
aws/aws-sdk-java#893
aws/aws-sdk-java#964
aws/aws-sdk-java#988
aws/aws-sdk-java#1215
aws/aws-sdk-java#1207
aws/aws-sdk-java#1103

@millems
Contributor Author

millems commented Sep 29, 2017

aws/aws-sdk-java#1321

@millems
Contributor Author

millems commented Jan 2, 2018

Allow using a finite number of threads for background processing. Currently, 1.11.x's TransferManager is reported to require an unbounded thread pool to prevent deadlocks.
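
One general way to stay deadlock-free on a bounded pool is to never block a pool thread while it waits for another task on the same pool, and to combine results with callbacks instead. The sketch below shows that idea with plain `CompletableFuture`; `BoundedPoolTransfer` and the simulated "part upload" are hypothetical, and this is not a description of how either SDK version is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

final class BoundedPoolTransfer {
    /**
     * Runs partCount simulated part uploads on the given pool and returns a future
     * of the total bytes "uploaded". No pool thread ever blocks waiting on another
     * task, so this completes even on a single-thread pool.
     */
    static CompletableFuture<Long> uploadParts(int partCount, long partSize, ExecutorService pool) {
        List<CompletableFuture<Long>> parts = new ArrayList<>();
        for (int i = 0; i < partCount; i++) {
            // Stand-in for a real part upload; returns the bytes sent.
            parts.add(CompletableFuture.supplyAsync(() -> partSize, pool));
        }
        // Combine results in a callback instead of calling get()/join() inside the pool.
        // By the time the callback runs, every part is complete, so join() does not block.
        return CompletableFuture.allOf(parts.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> parts.stream().mapToLong(CompletableFuture::join).sum());
    }
}
```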

@erikedlund

+1 for aws/aws-sdk-java#1103, a request for the ability to limit bandwidth for S3 uploads/downloads. See also the recently closed issue from the aws-cli repo:
aws/aws-cli#1090

This same feature would be similarly useful in the Java SDK to help avoid fees from ISPs for excessive bandwidth usage, or to prevent a single application from overwhelming a network's capacity.
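
Until something like this exists in the SDK, one possible workaround is to throttle the stream being read. The following is a rough sketch under that assumption; `ThrottledInputStream` is a hypothetical class, not an SDK API, and it only caps the average rate since the stream was created.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;

/** Hypothetical wrapper, not an SDK class: caps the average read rate of a stream. */
final class ThrottledInputStream extends FilterInputStream {
    private final long bytesPerSecond;
    private final long startNanos = System.nanoTime();
    private long bytesRead;

    ThrottledInputStream(InputStream in, long bytesPerSecond) {
        super(in);
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            throttle(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            throttle(n);
        }
        return n;
    }

    /** Sleeps until the bytes read so far fit within the configured average rate. */
    private void throttle(int justRead) throws IOException {
        bytesRead += justRead;
        // Earliest elapsed time at which this many bytes are allowed.
        long allowedAtNanos = (long) (bytesRead * 1e9 / bytesPerSecond);
        long sleepNanos = allowedAtNanos - (System.nanoTime() - startNanos);
        if (sleepNanos > 0) {
            try {
                Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
    }
}
```

Because the wrapper blocks the reading thread, it fits blocking I/O better than a non-blocking client.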

@zhiqiangZHAO

+1 for aws/aws-sdk-java#1103. Downloading data too fast will saturate the network.

@sql4bucks

+1 for aws/aws-sdk-java#474.

It is very inefficient to write to the file system and then upload from the file when I have an object in memory that I can serialize directly to a stream. It seems counterintuitive to provide the total content length up front when providing a stream as input - I have to work around it instead of just using it.
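
A rough sketch of how a stream of unknown length could be cut into fixed-size parts, which is the general shape a multipart upload needs. `StreamPartitioner` and `PartHandler` are hypothetical names; the handler stands in for "upload part N" and no SDK call is made here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class StreamPartitioner {
    /** Receives each part; stands in for "upload part N" (hypothetical). */
    interface PartHandler {
        void onPart(int partNumber, byte[] data) throws IOException;
    }

    /**
     * Reads the stream in partSize chunks and hands each one to the handler.
     * Only one part buffer is filled at a time, and the total length is never
     * needed up front. Returns the number of parts produced.
     */
    static int forEachPart(InputStream in, int partSize, PartHandler handler) throws IOException {
        byte[] buffer = new byte[partSize];
        int partNumber = 0;
        while (true) {
            // readNBytes (Java 9+) blocks until the buffer is full or the stream ends.
            int n = in.readNBytes(buffer, 0, partSize);
            if (n == 0) {
                break;
            }
            handler.onPart(++partNumber, Arrays.copyOf(buffer, n));
            if (n < partSize) {
                break; // a short read means end of stream
            }
        }
        return partNumber;
    }
}
```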

@josephsmithiv

+1 for aws/aws-sdk-java#893

The primitive AmazonS3 client is capable of uploading and downloading to and from a stream, as well as from a file. The TransferManager can also upload from either a stream or a file, but can only download to a file - not a stream. Symmetry in the interface would be nice. For large files - the kind for which multi-part uploads are most valuable - I can understand that attempting to buffer contents in memory is unwise. However, for small files, I question the value of having to write and read from disk. The download/upload interface is pleasingly abstract, relative to the interface of the primitive client, and I'd like to favor it no matter the size of my files.

@TeresaP

TeresaP commented Nov 29, 2018

+1 for aws/aws-sdk-java#1207. I need to be able to upload a lot of files all at once while specifying the ACL. Our customers will be using the CLI to upload files in parallel and I need to closely match the performance in simulating file uploads. If I don't specify the ACL flag, our service cannot read those files and my tests are useless.

@bisoldi

bisoldi commented Dec 18, 2018

+1 for aws/aws-sdk-java#893

My current use-case for this is that I have 100MB+ compressed (GZIP) files on S3 that I need to download and perform some further conversion on.

It would be great to take advantage of multi-part download and have that stream through Java's GZIPInputStream so that I don't need to download and then uncompress separately.
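
To make the use case concrete, this is roughly what consuming a download as a stream could look like once a download-to-stream API exists. The sketch only uses standard library classes; `GzipStreamingReader` is a hypothetical name, and the `compressed` argument stands in for whatever stream the SDK would hand back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

final class GzipStreamingReader {
    /** Counts lines in a gzip-compressed stream, decompressing on the fly. */
    static long countLines(InputStream compressed) throws IOException {
        // GZIPInputStream inflates as bytes arrive, so nothing is written to disk first.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }
}
```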

@chrisvire

Is it possible to support the AWS CLI style of "sync", where TransferManager decides which files are different and uploads only those?
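
As a sketch of what such a "sync" decision could look like, the code below picks files that are missing remotely, differ in size, or are newer locally. This mirrors my understanding of the CLI's default comparison and is an assumption; `SyncPlanner` and `FileInfo` are hypothetical, and the two maps stand in for a local directory walk and a remote listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SyncPlanner {
    /** Minimal metadata used for the comparison; the choice of fields is an assumption. */
    static final class FileInfo {
        final long size;
        final long lastModifiedMillis;

        FileInfo(long size, long lastModifiedMillis) {
            this.size = size;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Returns the keys that are missing remotely, differ in size, or are newer locally. */
    static List<String> filesToUpload(Map<String, FileInfo> local, Map<String, FileInfo> remote) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, FileInfo> entry : local.entrySet()) {
            FileInfo mine = entry.getValue();
            FileInfo theirs = remote.get(entry.getKey());
            if (theirs == null
                    || theirs.size != mine.size
                    || mine.lastModifiedMillis > theirs.lastModifiedMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```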

@alexmojaki

Reminder that for aws/aws-sdk-java#474, I have written a library using the SDK v1 which allows streaming data to S3 without knowing the size beforehand and without keeping it all in memory or writing to disk. You may find the source code helpful for implementing the feature in v2. I am not planning on porting the library to use v2. Implementing the feature in v2 may have advantages over my library, e.g. by using async non-blocking I/O instead of many threads.

@sql4bucks see the library if you haven't already, you may find it useful.

justnance added the feature-request label and removed the Feature Request label on Apr 19, 2019
@dagnir
Contributor

dagnir commented Apr 19, 2019

Thanks @alexmojaki. We will keep this in mind when investigating how to address aws/aws-sdk-java#474.

@dagnir
Contributor

dagnir commented Apr 23, 2019

Hi all,

For anyone interested, here is the current design for TransferManager: https://github.com/aws/aws-sdk-java-v2/tree/master/docs/design/services/s3/transfermanager. Feel free to leave feedback and comments!

The README will be updated soon to go into depth on the current prototype.

@abhimanyu4211

Hi all,

For anyone interested, here is the current design for TransferManager: https://github.com/aws/aws-sdk-java-v2/tree/master/docs/design/services/s3/transfermanager. Feel free to leave feedback and comments!

The README will be updated soon to go into depth on the current prototype.

When can this be expected for use? And which version?

millems changed the title from "Refactor: Transfer Manager" to "Transfer Manager" on Jul 8, 2019
@ashishdhingra
Contributor

#2731

Bennett-Lynch pushed a commit to Bennett-Lynch/aws-sdk-java-v2 that referenced this issue Oct 21, 2021
This adds initial support for S3TransferManager TransferListeners. The
motivation and design is consistent as outlined in
aws#2729. It also addresses
some customer asks as mentioned in
aws#37.

Every @SdkPublicApi has been thoroughly documented with its description
and usage instructions where applicable.
Bennett-Lynch added a commit that referenced this issue Oct 21, 2021
Add support for S3TransferManager TransferListeners

This adds initial support for S3TransferManager TransferListeners. The
motivation and design is consistent as outlined in
#2729. It also addresses
some customer asks as mentioned in
#37. For more context,
see the discussion in #2770.

Every @SdkPublicApi has been thoroughly documented with its description
and usage instructions where applicable.
@exoego

exoego commented Nov 7, 2021

downloadDirectory is missing.

@mjdinsmore

mjdinsmore commented Nov 15, 2021

When using the TransferManager API (see https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/transfer-manager.html), I'm experiencing an exception like this:
Caused by: java.util.concurrent.CompletionException: software.amazon.awssdk.crt.s3.CrtS3RuntimeException: Retry cannot be attempted because the maximum number of retries has been exceeded. AWS_IO_MAX_RETRIES_EXCEEDED(1069).
Note: this exception occurs less than a second after running the unit test code.

Is there any way to ensure the underlying S3Client is okay when using the TransferManager? How can you validate that it is properly configured? Setting it up is straightforward:

        S3TransferManager transferManager = S3TransferManager.builder()
            .s3ClientConfiguration(b -> b.credentialsProvider(AwsUtils.awsCredentialsProvider())
                                         .region(Region.US_EAST_1)
                                         .targetThroughputInGbps(1.0)
                                         .minimumPartSizeInBytes(FileUtils.ONE_MB)
                                         .maxConcurrency(4))
            .build();

but there's no way that I'm aware of to see whether things are properly configured. It might be nice to be able to expose (or specify) the underlying HTTP client for this sort of validation testing.

@Zhenye-Na

aws/aws-sdk-java#1572

@adrian-skybaker

Do you have a target in mind for moving to a non-preview release?

@millems
Contributor Author

millems commented Apr 7, 2022

We do have a target in mind, but unfortunately we can't share dates. We're making steady progress, though, and there aren't many features left in the backlog before GA! Sorry, I know that dates would be really helpful for planning purposes.

@djchapm

djchapm commented Jun 21, 2022

Can you confirm if #474 (highest upvoted above) will get prioritized? (and if not - why?)

@zoewangg
Contributor

Hi all, to provide an update, below are the features that will be included in the GA release. The only remaining feature that we are currently working on is item 7 in the list. Feedback on the APIs is welcome!

  1. download an S3 object to a file or any destination
  2. upload a single object of any content (if it's not a file, content-length must be supplied)
  3. download all objects in a bucket to a local directory
  4. upload all files in a directory recursively to an S3 bucket
  5. copy data from one Amazon S3 location to another Amazon S3 location.
  6. pause an ongoing single file download and resume it at a later time
  7. pause an ongoing single file upload and resume it at a later time

We will use separate issues to track features that are not in GA scope (they will get prioritized based on the number of 👍🏼s).

@zoewangg
Contributor

@djchapm as mentioned in #2714 (comment), this feature is in our backlog, and we will track it in #139

@kingan379

@zoewangg what about aborting the upload? Is there a plan to add a method for that in TransferManager v2 (as it was in v1)?

@zoewangg
Contributor

zoewangg commented Aug 3, 2022

@kingan379 you can cancel the transfer by cancelling the future in the returned upload, CompletableFuture#cancel. Note that the current implementation is to stop scheduling new multipart uploads, wait for all existing uploads to finish and then invoke abortMultipart API, so it may not abort the transfer immediately. See #3274 (comment) for more details.
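
A small sketch of what cancelling a future looks like from the caller's side, using a plain `CompletableFuture` as a stand-in for the transfer's completion future. `CancelDemo` is a hypothetical helper; note that for `CompletableFuture` the `mayInterruptIfRunning` argument has no effect, so cancellation only marks the future as cancelled and any clean-up is up to the producer.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

final class CancelDemo {
    /**
     * Cancels the future and reports whether a waiter then observes the cancellation.
     * Returns false if the future had already completed normally before cancel().
     */
    static boolean cancelAndObserve(CompletableFuture<?> completionFuture) {
        completionFuture.cancel(true);
        try {
            completionFuture.join();
            return false; // completed normally before the cancel took effect
        } catch (CancellationException e) {
            return true; // join() on a cancelled future throws CancellationException
        }
    }
}
```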

@kingan379

Is there a plan to support InputStream in S3TransferManager?

@AlexOkayJ

Hello!
Are there any plans to support parallel upload/download with the AmazonS3Encryption client?

yasminetalby added the p1 label on Nov 12, 2022
@zoewangg
Contributor

Hi all, we are pleased to announce the general availability of the S3 Transfer Manager 🎉. Check out the following links to get started.
Blog post
Dev Guide
Javadoc

As always, we welcome feature requests, bug reports and feedback.🙂 I'm going to close this issue. Feel free to create new GH issues.

@zoewangg
Contributor

Are there any plans to support parallel upload/download with the AmazonS3Encryption client?

Hi @AlexOkayJ, client encryption support is tracked in #34

@zoewangg
Contributor

Is there a plan to support InputStream in S3TransferManager?

Hi @kingan379, it's supported now. You can use AsyncRequestBody#fromInputStream and pass it to S3TransferManager#upload.

        S3TransferManager transferManager = S3TransferManager.create();

        UploadRequest uploadRequest = UploadRequest.builder()
                                                   .requestBody(AsyncRequestBody.fromInputStream(inputStream))
                                                   .putObjectRequest(req -> req.bucket("bucket").key("key"))
                                                   .build();

        Upload upload = transferManager.upload(uploadRequest);

@daniel-teodoro

Hi!

I'm getting this error with the sample code in Eclipse:

"The method s3ClientConfiguration(( cfg) -> {}) is undefined for the type S3TransferManager.Builder"

How can I fix it, please?

Thanks

@daniel-teodoro

I commented out the s3ClientConfiguration call, but now the join() method is throwing this error:
"Unable to execute HTTP request: SSLEngine closed already".
Would anyone have a clue?
Thanks again!

        S3TransferManager transferManager = S3TransferManager.create();

        GetObjectRequest getObjectRequest = GetObjectRequest.builder()
                //.overrideConfiguration(SdkHttpConfigurationOption.TRUST_ALL_CERTIFICATES)
                .bucket(AWS_S3_BUCKET_NAME)
                .key(dirS3)
                .build();

        DownloadFileRequest downloadFileRequest = DownloadFileRequest.builder()
                .getObjectRequest(getObjectRequest)
                .destination(Paths.get(dirLocalArqDownload))
                .addTransferListener(LoggingTransferListener.create())
                .build();

        FileDownload download = transferManager.downloadFile(downloadFileRequest);
        download.completionFuture().join();

Exception in thread "main" java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: SSLEngine closed already
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:65)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeAttemptExecute(AsyncRetryableStage.java:103)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:184)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152)
at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:103)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:165)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
