Support zstd for GPU shuffle compression #10824
Signed-off-by: Firestarman <[email protected]>
This says it fixes #10790, but I don't see how adding support for a new compression codec on the GPU means we now have support for Celeborn. Can you elaborate? How does compression on the GPU avoid redundant compression by Celeborn? Do docs need to be updated on how to configure this new functionality when using Celeborn?
As the comment at the top of this file states, this is automatically generated by flatbuffers, yet this PR doesn't have any changes to flatbuffer schemas. This file needs to be generated from the updated flatbuffer schema, not manually edited.
Thx for the info. Updated
    .internal()
    .startupOnly()
    .stringConf
    .createWithDefault("none")

val SHUFFLE_COMPRESSION_LZ4_CHUNK_SIZE = conf("spark.rapids.shuffle.compression.lz4.chunkSize")
Do we want to combine these? If we do, we cannot set separate defaults for LZ4 and ZSTD if it turns out that different values work better for one codec than the other. Similarly, consider what happens if we support a compression level. Compression level 4 might mean something very different for one codec vs. another, or not apply at all despite the config implying it does.
I'm not sure myself which way to go on this, just musing on the potential confusion in the future if we try to combine.
Yeah, you are right. Better to use separate configs for LZ4 and ZSTD. Updated
Sorry for the confusion. I changed the description. This PR only contributes the ZSTD codec part of that issue. Actually, we do not need to do anything special to support Celeborn. Celeborn acts as a normal shuffle manager, and the plugin always works well with it. The goal of enabling GPU compression is not to avoid Celeborn compression, but to reduce the workload of copying data between the host and the device during shuffle serialization and deserialization.
build
build
That's what I thought, yet #10790 implies there's Celeborn-specific work to do. Can someone clarify in that issue what needs to be done for Celeborn support, or just close it as unplanned if there's actually nothing to do there?
Normally it takes the GPU longer to compress or decompress the data than it takes to copy it across the PCI bus, so trying to use a compression codec is always a loss from the perspective of saving data copying time. The only exception might be when the copy goes to pageable instead of pinned memory, but then we're burning a lot of GPU cycles when it would be better to just allocate more pinned memory (if possible). What compression/decompression rates are you seeing, and how do they compare to the copy times? Any nsys traces or similar you can share?
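For reference, the trade-off in the comment above can be written as a rough break-even condition. This is only a sketch under stated assumptions, not a measurement: $S$ is the uncompressed payload size, $B$ the host-device copy bandwidth, $R$ the GPU compression throughput measured on uncompressed input, and $\rho > 1$ the compression ratio (uncompressed size over compressed size). All four symbols are introduced here for illustration, and the model assumes compression and copy run back to back with no overlap.

$$
T_{\text{plain}} = \frac{S}{B}, \qquad T_{\text{comp}} = \frac{S}{R} + \frac{S}{\rho B}
$$

$$
T_{\text{comp}} < T_{\text{plain}} \iff \frac{1}{R} < \frac{1}{B}\left(1 - \frac{1}{\rho}\right) \iff R > \frac{\rho}{\rho - 1}\, B
$$

Under this model, a codec that halves the data ($\rho = 2$) only saves copy time if it compresses at more than twice the copy bandwidth, which is why measured compression rates and copy times are needed to settle the question.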
@winningsix will help update it.
I don't have such metrics yet, only an e2e comparison on a customer query, which shows a 2x speedup with zstd + GPU serde.
Contributes to #10790
This PR adds the ZSTD codec for GPU shuffle compression.