Minio S3 ListMultipartUploads #5613
I'm seeing something very similar using duplicity, which uses boto 2.48.0 under the hood. Duplicity believes the chunks have been uploaded successfully even though nothing has actually been uploaded. I'm using the following version:
@rvolykh The following script works fine with the latest release:

@benagricola Can you do:
Thanks for the responses. Updates:

minio/data directory:
Trace log from my side (note: the boto version for this log is 2.42.0, but the same occurs with 2.48.0). It looks very similar to the output from @rvolykh:
@rvolykh can you give me a sample script to reproduce the problem? The script you gave me (pasted below) works fine for me:
@benagricola if you look at the trace log, there are no "put object part" requests, hence mp.get_all_parts() returns 0 entries. Can you give me instructions on what commands to run to reproduce the duplicity error? (I have not used duplicity.)
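For reference, a minimal boto 2 sketch of the relationship being described, with placeholder endpoint, credentials, and names (not the original script from the thread): mp.get_all_parts() only returns entries once the corresponding "put object part" requests have actually been sent.

```python
# Minimal boto 2 sketch (placeholders throughout, not the original script).
from io import BytesIO

import boto
from boto.s3.connection import OrdinaryCallingFormat

conn = boto.connect_s3('minio-access-key', 'minio-secret-key',
                       host='localhost', port=9000, is_secure=False,
                       calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket('test_bucket')

mp = bucket.initiate_multipart_upload('file1')

# These two calls are what show up as "put object part" requests in the
# trace; without them, ListParts has nothing to return.
mp.upload_part_from_file(BytesIO(b'a' * (5 * 1024 * 1024)), 1)
mp.upload_part_from_file(BytesIO(b'b' * 1024), 2)

print([(p.part_number, p.size) for p in mp.get_all_parts()])
```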
@krishnasrinivas replace in your script:
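The exact lines being suggested are not preserved in the thread; judging by the following comment, the idea is roughly to re-discover the upload via ListMultipartUploads instead of using the handle returned by initiate_multipart_upload() directly. A hypothetical sketch, assuming a `bucket` object from a connection like the one above:

```python
# Hypothetical reconstruction of the suggested change, not the original diff.
def start_and_lookup(bucket, key_name):
    """Initiate an upload, then re-find it via ListMultipartUploads."""
    started = bucket.initiate_multipart_upload(key_name)
    for candidate in bucket.get_all_multipart_uploads():
        if candidate.id == started.id:
            return candidate
    # Against Minio the listing comes back empty, so nothing is found.
    return None
```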
Hi @krishnasrinivas, correct - there are no PUT object part requests, because Duplicity uses list_multipart_uploads to find the upload rather than using the handle returned by initiate_multipart_upload. So the process is that each worker looks up the upload id via list_multipart_uploads and then uploads its chunk.

But because list_multipart_uploads returns nothing, each worker simply returns and says the chunk has been uploaded successfully even though nothing has happened. You can see duplicity's boto-based multipart code, with the relevant worker code, here: https://github.com/henrysher/duplicity/blob/master/duplicity/backends/_boto_multi.py#L203
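A rough paraphrase of that worker behaviour (not the actual duplicity code; the function name and return value are illustrative):

```python
# Paraphrase of the worker pattern described above, not duplicity's code.
from io import BytesIO


def worker_upload_chunk(bucket, upload_id, part_num, data):
    # Each worker re-discovers the upload by id via ListMultipartUploads.
    mp = next((u for u in bucket.get_all_multipart_uploads()
               if u.id == upload_id), None)
    if mp is None:
        # Empty listing: the worker gives up here, yet the chunk is still
        # treated as uploaded successfully.
        return True
    mp.upload_part_from_file(BytesIO(data), part_num)
    return True
```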
Oops, sorry about that. Yes, I see you try to do list_multipart_uploads.

@krishnasrinivas oh I see, a bit complicated, but Minio rocks anyway :) ok great 👍
@benagricola we made a change to simplify our backend format and also the code. As a result we no longer support ListMultipartUploads when the prefix is anything other than the full object name. The reason for ListMultipartUploads to exist in AWS S3 is for clients to list multipart uploads and remove them so that AWS does not bill for them. In our case we auto-purge old multipart uploads and do not support ListMultipartUploads.

In duplicity, there is no advantage in doing list_multipart_uploads() to check for the id, because even if you did not do it, the subsequent putObjectPart() would have failed if the upload id did not exist. The right way for the application to behave is:
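The concrete steps were not preserved above; a minimal sketch of the behaviour being described, assuming a `bucket` object from an existing boto 2 connection (all names are illustrative): keep the handle from initiate_multipart_upload and let a stale upload id surface as an error from the part upload itself.

```python
# Sketch of the suggested flow (illustrative): no ListMultipartUploads
# check; a stale upload id simply makes the part upload fail.
from io import BytesIO

from boto.exception import S3ResponseError


def upload_object(bucket, key_name, chunks):
    mp = bucket.initiate_multipart_upload(key_name)
    try:
        for part_num, data in enumerate(chunks, start=1):
            # Raises (e.g. NoSuchUpload) if the upload id is no longer valid.
            mp.upload_part_from_file(BytesIO(data), part_num)
        mp.complete_upload()
    except S3ResponseError:
        mp.cancel_upload()
        raise
```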
@benagricola because of this behavior in duplicity it won't work with minio. Have you been using an old minio version for your deployment?
@krishnasrinivas yep, have been using duplicity with an older version of minio but recently upgraded.
@benagricola github.com/minio/mc now supports encryption: https://docs.minio.io/docs/minio-client-complete-guide. https://github.com/restic/restic is another alternative.
@krishnasrinivas should we close this?
Closing this issue as we are not going to bring back the older behavior of ListMultipartUploads for now.
@krishnasrinivas @harshavardhana you mentioned in a previous post that minio will auto-purge old multipart uploads. I looked through some of the documentation but did not see a reference to this behavior. How often are they purged?

Where I work, we are using minio for some internal S3-compatible testing, and it would have made things much easier if listMultipartUploads with a less restrictive prefix (e.g. "folder1" instead of "folder1/file") were supported. We are uploading multi-terabyte files, and when simulating failures, the .minio.sys directory fills up the disk quite quickly.

Additionally, we found that when uploading a 5 TB file, minio requires twice the file size in free space to concatenate the file while it is being uploaded. Not really a problem, but again, I did not see that in the documentation.
@meinemitternacht multipart uploads older than 2 weeks get purged.

Can you give more info on what kind of failures are being simulated? On any failure we clean up the tmp files, hence the .minio.sys directory should not get filled up.
We are using libs3 to upload files by piping data to the program. If the upload is aborted for some reason (or there is a network connectivity issue), libs3 does not issue an "abortMultipartUpload" API call. This is not the fault of minio, as the client should be issuing that command. It is just rather annoying for us to have to perform "listMultipartUploads" for each key that was uploaded in order to delete the applicable temporary files. One call with a common prefix would be much more efficient.

I certainly don't want to persuade you to change this behavior, just wanted to convey our particular use case. Though, seeing as other S3-compatible providers do support listing by a prefix, this could positively improve compatibility across the board. Minio is just one provider that we were testing, so it isn't critical that these issues are addressed.
You can listMultipartUploads if you know which object it failed for; we simply don't support hierarchical listing. Historically we supported all of this in various combinations, but we moved to a simpler, bug-free backend implementation that disallows lesser-used features. If you look at the aws-sdk upload managers, they abort by default upon error, so libs3 should do the same here. Why not use aws-sdk-c++?
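For illustration, the per-object cleanup being described, sketched in boto 2 rather than libs3 (the equivalent libs3 calls are listMultipartUploads and abortMultipartUpload; names are placeholders):

```python
# Per-object cleanup sketch in boto 2; names are placeholders.
def abort_uploads_for(bucket, key_name):
    """Abort every outstanding multipart upload for one known object key."""
    for upload in bucket.get_all_multipart_uploads():
        if upload.key_name == key_name:
            bucket.cancel_multipart_upload(upload.key_name, upload.id)
```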
That's perfectly fine; I was mainly curious about how long it would take for temp files to be deleted. And it seems that we should be handling the deletion of those files ourselves, since they could potentially be up to ~10 TB per object uploaded (counting the temporary concatenation file). Having that data hang around for two weeks is wasteful.

Indeed, I agree that behavior should be present in libs3. We looked at using aws-sdk-c++, but it was much simpler to adapt libs3 for our environment (embedded device, controlling program written in C) than it was to integrate the necessary utilities for building that SDK. In the future we may be able to integrate it.
That may be true, but we are sort of expecting that you don't have lots of timeouts when uploading large objects. But again, if you know the object key, then we do list the upload ids and you can forcibly abort them.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hello, I'm trying to list multipart uploads with the Python boto library (boto==2.48.0) and I always get a response without any multipart uploads, while in Minio's storage directory I can see my incomplete uploads (under .minio.sys/multipart/test_bucket/file1// there are fs.json, object1 and object2, since I uploaded two parts). The same behavior occurs with Postman.
Expected Behavior
As described in https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadListMPUpload.html, the response lists the upload with the file1 key.
Current Behavior
No Upload tags in the response at all.
Steps to Reproduce (for bugs)
CompleteMultipartUpload was not called.
Boto example:
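The original example is not preserved here; below is a minimal boto 2 sketch matching the description above (two parts uploaded to the file1 key in test_bucket, CompleteMultipartUpload never called, then a listing). The endpoint and credentials are placeholders.

```python
# Minimal boto 2 reconstruction of the scenario described above; endpoint
# and credentials are placeholders.
from io import BytesIO

import boto
from boto.s3.connection import OrdinaryCallingFormat

conn = boto.connect_s3('minio-access-key', 'minio-secret-key',
                       host='localhost', port=9000, is_secure=False,
                       calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket('test_bucket')

# Start an upload for the file1 key, send two parts, and do NOT complete it.
mp = bucket.initiate_multipart_upload('file1')
mp.upload_part_from_file(BytesIO(b'a' * (5 * 1024 * 1024)), 1)
mp.upload_part_from_file(BytesIO(b'b' * 1024), 2)

# Expected: the in-progress upload for file1 is listed.
# Observed against Minio: an empty result.
print([(u.key_name, u.id) for u in bucket.get_all_multipart_uploads()])
```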
Context
I've got integration tests which fail when run against Minio. This can probably also be an issue for various S3 browsers.
Your Environment
minio version: minio.RELEASE.2018-01-18T20-33-21Z
uname -a: Darwin data_race 17.4.0 Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64