Parallel uploads for multipart #1020

Open
smallhive opened this issue Nov 2, 2024 · 4 comments
Comments

@smallhive
Contributor

Multipart uploads don't work in the general case.

Current Behavior

The AWS SDK uploads multipart parts in 5 parallel threads, while the gate expects parts to arrive sequentially, one by one.

Expected Behavior

It should be possible to upload parts in any order.

Possible Solution

Collect the final object hash in a different way

Steps to Reproduce

  • Create a multipart upload with any available tool
  • Upload parts out of order and get an OperationAborted: 409 error (a reproduction sketch follows this list)
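
For reference, a minimal reproduction sketch using the AWS SDK for Go v2 upload manager; the bucket name, key, and file are placeholders, and credentials plus the endpoint pointing at the gate are assumed to be configured in the environment. The manager uploads parts with 5 concurrent workers by default, which is enough to hit the sequential-part expectation:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// Credentials and the gate endpoint are taken from the environment.
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	f, err := os.Open("large-file.bin") // any file spanning several parts
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	uploader := manager.NewUploader(s3.NewFromConfig(cfg), func(u *manager.Uploader) {
		u.PartSize = 8 * 1024 * 1024 // 8 MiB parts
		u.Concurrency = 5            // the SDK default; parts complete out of order
	})

	_, err = uploader.Upload(ctx, &s3.PutObjectInput{
		Bucket: aws.String("test-bucket"),
		Key:    aws.String("multipart-object"),
		Body:   f,
	})
	if err != nil {
		log.Fatal(err) // currently fails with OperationAborted (409) against the gate
	}
}
```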

Context

Related to #1016

Your Environment

@smallhive smallhive added bug Something isn't working U2 Seriously planned labels Nov 2, 2024
@roman-khimov roman-khimov added S2 Regular significance I3 Minimal impact labels Nov 2, 2024
@roman-khimov roman-khimov added this to the v0.32.1 milestone Nov 2, 2024
@roman-khimov
Member

The problem is not the hash, really. The problem is the split chain itself, and v1/v2 are the same here; pre-#957 code didn't support this either. Chaining is the "previous" reference we have: it's designed for streams and it's really good for that use case, but S3 multipart is not about streaming one part after another. It treats a multipart upload as a number of independent slots and, most importantly, real applications do use this property.

Potential solutions:

  • upload to temporary objects, reassemble at CompleteMultipart
  • change split scheme to support "slots" and (optionally) not rely on chain

Temporary objects require additional logic on the S3 side and can leave garbage that is harder to trace (additional attributes?). They can be optional (we can try pushing the next split chunk when possible and resort to additional objects only when a part is out of sequence). But they will seriously affect multipart completion: reslicing everything will take quite some time (hashing alone would be much easier, but that's not the problem we have).

Supporting "slots" can be some additional "part number" attribute that is used instead of "previous". It completely breaks backward walking assembly logic and makes link objects more important, but it's still a possibility and we still can find all related objects this way. It can also simplify part reuploading. At the same time it's a protocol change. Can this be useful for standalone NeoFS? Not sure.

@carpawell?

@carpawell
Member

From the NeoFS side I see some questions, and the main one is: if we can solve them successfully, why have we needed this backward-chained logic from the beginning, and for so long, if we can accept a simpler scheme (one based on some agreements that have to be taken as truth)?

  1. "makes link objects more important": who builds this object? It seems like uploading parts to different nodes is an obvious application, so who completes the link object and when?
  2. "It can also simplify part reuploading.": how do we "update" a link object then? Also reupload it? So it should be dynamic then and the chain objects should not fix their relation to a link object?
  3. "it treats multipart upload as a number of independent slots": how real is this case? Is it so much required to change storage nodes but not gateways?

I don't mind considering protocol changes, but for now this feels more like playing against NeoFS and working around it with kludges.

@roman-khimov
Member

Chained objects are more robust, and they're very good for streams of data. The typical NeoFS slicing pattern is exactly that: you know the previous object's hash, you know all the hashes, you can build the link and index objects effectively, and you can always follow the chain exactly.

A slot-like structure is more fragile, and it's not simpler: without an index object it requires searches to find the other parts. Also, regarding its use for S3, one thing to keep in mind is that we probably can't ensure a 1:1 slot mapping between NeoFS and S3, since parts there are 5 MB to 5 GB, and 5 GB is a big (split) object in NeoFS. Split hierarchies are something we've long tried to avoid, and I'd still try to.

Unfortunately, it looks like this limits us to some S3-specific scheme with regular objects that are then reassembled upon upload completion, which totally destroys the optimization we have now (an almost free multipart upload completion). I'm all ears for other ideas.

@roman-khimov
Member

The proposal here is:

  • start the upload the same way as before
  • push successive objects as they are pushed now if possible (sequential parts); recommend this mode of operation to users
  • if there is an attempt to upload a non-consecutive part, create a new object that is not a part of the split
  • it must have no attributes other than "S3MultipartUpload=upload_id" and "S3MultipartNumber=X"
  • return this object's ID in the ETag of the part upload reply (remember, "This ETag is not necessarily an MD5 hash of the object data")
  • figure out everything else during CompleteMultipart: you get the part number and ETag there, and you already know which parts are properly sliced; if there were no concurrent part uploads, finish the object the way it's finished now; if there are some standalone parts, read them and append them to the object (deleting them afterwards), then complete the split (see the sketch after this list)
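
A minimal, self-contained sketch of the per-part decision described above; the uploadState type, attribute names, and behavior are assumptions based on this proposal, not existing gate code:

```go
package main

import "fmt"

type uploadState struct {
	uploadID     string
	nextExpected int // the next part number the split chain can accept
}

// attributesFor returns the attributes a standalone (out-of-order) part
// object would carry so it can be found again at CompleteMultipart.
func attributesFor(uploadID string, partNumber int) map[string]string {
	return map[string]string{
		"S3MultipartUpload": uploadID,
		"S3MultipartNumber": fmt.Sprint(partNumber),
	}
}

func (s *uploadState) handlePart(partNumber int) {
	if partNumber == s.nextExpected {
		// Fast path: push the part as the next chunk of the split chain.
		fmt.Printf("part %d appended to the split chain\n", partNumber)
		s.nextExpected++
		return
	}
	// Out-of-order part: store it as an independent object carrying only the
	// two attributes; its object ID is returned to the client as the ETag.
	fmt.Printf("part %d stored standalone with %v\n",
		partNumber, attributesFor(s.uploadID, partNumber))
}

func main() {
	s := &uploadState{uploadID: "upload-123", nextExpected: 1}
	for _, n := range []int{1, 3, 2, 4, 5} { // parts arriving out of order
		s.handlePart(n)
	}
}
```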

Intermediate objects can be found via attributes if we need to respond to ListParts (see the sketch below). Unfortunately, we can't easily expire them: the default S3 behavior is to keep a multipart upload open for as long as needed, even though lifecycle policies are recommended in practice. We will try to minimize the reslicing overhead as much as possible while making it possible to use S3 multiparts the way they were designed.
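
A sketch of how the standalone part objects could be located for ListParts, built with neofs-sdk-go search filters; the attribute keys follow the proposal above and are not implemented anywhere yet, and the client call that actually executes the search is omitted:

```go
package listparts

import "github.com/nspcc-dev/neofs-sdk-go/object"

// partSearchFilters builds filters selecting all standalone parts of one
// multipart upload; each match's S3MultipartNumber attribute gives the part
// number, and its object ID doubles as the part's ETag.
func partSearchFilters(uploadID string) object.SearchFilters {
	var fs object.SearchFilters
	fs.AddFilter("S3MultipartUpload", uploadID, object.MatchStringEqual)
	return fs
}
```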

@smallhive smallhive self-assigned this Nov 22, 2024