Parallel uploads for multipart #1020

Open
smallhive opened this issue Nov 2, 2024 · 4 comments
Comments

@smallhive
Contributor

Multipart uploads don't work in the general case.

Current Behavior

The AWS SDK uploads multipart parts in 5 parallel threads, while the gate expects parts to arrive sequentially, one by one.

Expected Behavior

It should be possible to upload parts in any order.

Possible Solution

Collect the final object hash in a different way

Steps to Reproduce

  • Create a multipart upload with any available tool
  • Upload parts out of order and get an OperationAborted: 409 error (a reproduction sketch follows this list)
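
For reference, a minimal reproduction sketch using the AWS SDK for Go v2 upload manager; the bucket name, key, and file are placeholders, and credentials plus the endpoint pointing at the gate are assumed to be configured in the environment. The manager uploads parts with 5 concurrent workers by default, which is enough to hit the sequential-part expectation:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// Credentials and the gate endpoint are taken from the environment.
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	f, err := os.Open("large-file.bin") // any file spanning several parts
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	uploader := manager.NewUploader(s3.NewFromConfig(cfg), func(u *manager.Uploader) {
		u.PartSize = 8 * 1024 * 1024 // 8 MiB parts
		u.Concurrency = 5            // the SDK default; parts complete out of order
	})

	_, err = uploader.Upload(ctx, &s3.PutObjectInput{
		Bucket: aws.String("test-bucket"),
		Key:    aws.String("multipart-object"),
		Body:   f,
	})
	if err != nil {
		log.Fatal(err) // currently fails with OperationAborted (409) against the gate
	}
}
```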

Context

Related to #1016

Your Environment

@smallhive smallhive added bug Something isn't working U2 Seriously planned labels Nov 2, 2024
@roman-khimov roman-khimov added S2 Regular significance I3 Minimal impact labels Nov 2, 2024
@roman-khimov roman-khimov added this to the v0.32.1 milestone Nov 2, 2024
@roman-khimov
Member

The problem is not the hash, really. The problem is the split chain itself, and v1/v2 are the same here; pre-#957 code didn't support this either. Chaining is the "previous" reference we have: it's designed for streams and it's really good for that use case, but S3 multipart is not about streaming one part after another. It treats a multipart upload as a number of independent slots and, most importantly, real applications do use this property.

Potential solutions:

  • upload to temporary objects, reassemble at CompleteMultipart
  • change split scheme to support "slots" and (optionally) not rely on chain

Temporary objects require additional logic on the S3 side and can leave garbage that is harder to trace (additional attributes?). They can be optional (we can try pushing the next split chunk when possible and resort to additional objects only when a part is out of sequence). But they will seriously affect multipart completion: reslicing everything will take quite some time (hashing alone would be much easier, but that's not the problem we have).

Supporting "slots" can be some additional "part number" attribute that is used instead of "previous". It completely breaks backward walking assembly logic and makes link objects more important, but it's still a possibility and we still can find all related objects this way. It can also simplify part reuploading. At the same time it's a protocol change. Can this be useful for standalone NeoFS? Not sure.

@carpawell?

@carpawell
Member

From the NeoFS side I see some questions, and the main one is: if we can solve them successfully, why have we needed this backward-chained logic from the beginning, and for so long, if we can accept a simpler scheme (one based on some agreements that have to be taken as truth)?

  1. "makes link objects more important": who builds this object? It seems like uploading parts to different nodes is an obvious application, so who completes the link object and when?
  2. "It can also simplify part reuploading.": how do we "update" a link object then? Also reupload it? So it should be dynamic then and the chain objects should not fix their relation to a link object?
  3. "it treats multipart upload as a number of independent slots": how real is this case? Is it so much required to change storage nodes but not gateways?

I don't mind considering protocol changes, but for now this feels more like playing against NeoFS and working around it with kludges.

@roman-khimov
Member

Chained objects are more robust, and they're very good for streams of data. The typical NeoFS slicing pattern is exactly that: you know the previous object's hash, you know all the hashes, you can build the link and index objects effectively, and you can always follow the chain exactly.

A slot-like structure is more fragile, and it's not simpler: without an index object it requires searches to find the other parts. Also, regarding its use for S3, one thing to keep in mind is that we probably can't ensure a 1:1 slot mapping between NeoFS and S3, since parts there are 5 MB to 5 GB, and 5 GB is a big (split) object in NeoFS. Split hierarchies are something we've long tried to avoid, and I'd still try to.

Unfortunately, it looks like this limits us to some S3-specific scheme with regular objects that are then reassembled upon upload completion, which totally destroys the optimization we have now (an almost free multipart upload completion). I'm all ears for other ideas.

@roman-khimov
Member

The proposal here is:

  • start the upload the same way as before
  • push successive objects as they are pushed now if possible (sequential parts); recommend this mode of operation to users
  • if there is an attempt to upload a non-consecutive part, create a new object that is not a part of the split
  • it must have no attributes other than "S3MultipartUpload=upload_id" and "S3MultipartNumber=X"
  • return this object's ID in the ETag of the part upload reply (remember, "This ETag is not necessarily an MD5 hash of the object data")
  • figure out everything else during CompleteMultipart: you get the part number and ETag there, and you already know which parts are properly sliced; if there were no concurrent part uploads, finish the object the way it's finished now; if there are some standalone parts, read them and append them to the object (deleting them afterwards), then complete the split (see the sketch after this list)
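
A minimal, self-contained sketch of the per-part decision described above; the uploadState type, attribute names, and behavior are assumptions based on this proposal, not existing gate code:

```go
package main

import "fmt"

type uploadState struct {
	uploadID     string
	nextExpected int // the next part number the split chain can accept
}

// attributesFor returns the attributes a standalone (out-of-order) part
// object would carry so it can be found again at CompleteMultipart.
func attributesFor(uploadID string, partNumber int) map[string]string {
	return map[string]string{
		"S3MultipartUpload": uploadID,
		"S3MultipartNumber": fmt.Sprint(partNumber),
	}
}

func (s *uploadState) handlePart(partNumber int) {
	if partNumber == s.nextExpected {
		// Fast path: push the part as the next chunk of the split chain.
		fmt.Printf("part %d appended to the split chain\n", partNumber)
		s.nextExpected++
		return
	}
	// Out-of-order part: store it as an independent object carrying only the
	// two attributes; its object ID is returned to the client as the ETag.
	fmt.Printf("part %d stored standalone with %v\n",
		partNumber, attributesFor(s.uploadID, partNumber))
}

func main() {
	s := &uploadState{uploadID: "upload-123", nextExpected: 1}
	for _, n := range []int{1, 3, 2, 4, 5} { // parts arriving out of order
		s.handlePart(n)
	}
}
```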

Intermediate objects can be found via attributes if we need to respond to ListParts (see the sketch below). Unfortunately, we can't easily expire them: the default S3 behavior is to keep a multipart upload open for as long as needed, even though lifecycle policies are recommended in practice. We will try to minimize the reslicing overhead as much as possible while making it possible to use S3 multiparts the way they were designed.
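
A sketch of how the standalone part objects could be located for ListParts, built with neofs-sdk-go search filters; the attribute keys follow the proposal above and are not implemented anywhere yet, and the client call that actually executes the search is omitted:

```go
package listparts

import "github.com/nspcc-dev/neofs-sdk-go/object"

// partSearchFilters builds filters selecting all standalone parts of one
// multipart upload; each match's S3MultipartNumber attribute gives the part
// number, and its object ID doubles as the part's ETag.
func partSearchFilters(uploadID string) object.SearchFilters {
	var fs object.SearchFilters
	fs.AddFilter("S3MultipartUpload", uploadID, object.MatchStringEqual)
	return fs
}
```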

@smallhive smallhive self-assigned this Nov 22, 2024