
Support zstd for GPU shuffle compression #10824

Merged
merged 3 commits into NVIDIA:branch-24.06 on May 20, 2024

Conversation

@firestarman (Collaborator) commented May 16, 2024

Contributes to #10790

This PR is to add the ZSTD codec for GPU shuffle compression.
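For readers following along, resolving the new codec should look roughly like the sketch below. The config key and the "none"/"lz4"/"zstd" values mirror what the RapidsConf diff in this PR shows (default "none"); the `resolve` helper itself is a hypothetical stand-in for the plugin's real config handling, not its actual API.

```scala
// Sketch only: how a shuffle codec choice might be validated.
// The key and value set follow this PR's diff; the helper is hypothetical.
object ShuffleCodecConf {
  val codecKey = "spark.rapids.shuffle.compression.codec"
  val supported = Set("none", "lz4", "zstd")

  def resolve(conf: Map[String, String]): String = {
    val codec = conf.getOrElse(codecKey, "none").toLowerCase
    require(supported.contains(codec), s"unsupported shuffle codec: $codec")
    codec
  }
}
```

So with this PR, setting `spark.rapids.shuffle.compression.codec=zstd` selects the new codec, while the default remains "none".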

@firestarman (Collaborator, Author) commented May 16, 2024

Premerge is pending because the JNI jar including ZSTD support is not yet available in the repo.

Update: the JNI jar is now ready.

@jlowe (Member) left a comment

This says it fixes #10790, but I don't see how adding support for a new compression codec on the GPU means we now have support for Celeborn. Can you elaborate? How does compression on the GPU avoid redundant compression by Celeborn? Do docs need to be updated on how to configure this new functionality when using Celeborn?

@jlowe (Member) commented on the diff

As the comment at the top of this file states, this is automatically generated by flatbuffers, yet this PR doesn't have any changes to flatbuffer schemas. This file needs to be generated from the updated flatbuffer schema, not manually edited.

@firestarman (Collaborator, Author) replied

Thanks for the info. Updated.

.internal()
.startupOnly()
.stringConf
.createWithDefault("none")

val SHUFFLE_COMPRESSION_LZ4_CHUNK_SIZE = conf("spark.rapids.shuffle.compression.lz4.chunkSize")
@jlowe (Member) commented on the diff

Do we want to combine these? If we do, we cannot create separate defaults between LZ4 and ZSTD if it's determined different values are better defaults for one codec vs another. Similarly, consider if we supported a compression level. Compression level 4 might mean something very different for one codec vs. another, or not apply at all despite the config implying it does.

I'm not sure myself which way to go on this, just musing about the potential confusion in the future if we try to combine them.

@firestarman (Collaborator, Author) replied

Yeah, you are right. Better to use separate configs for LZ4 and ZSTD. Updated
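A minimal sketch of the separate-configs approach settled on here. The LZ4 key appears in the diff above; the ZSTD key name and both default values are illustrative assumptions, not the plugin's actual settings. The point of keeping the keys separate is that each codec's chunk size (or, later, compression level) can be tuned and defaulted independently.

```scala
// Sketch of per-codec chunk-size configs with independent defaults.
// lz4Key matches this PR's diff; zstdKey and the defaults are assumed.
object ShuffleChunkSizes {
  val lz4Key  = "spark.rapids.shuffle.compression.lz4.chunkSize"
  val zstdKey = "spark.rapids.shuffle.compression.zstd.chunkSize"

  val lz4Default: Long  = 64L * 1024  // assumed default, per-codec tunable
  val zstdDefault: Long = 64L * 1024

  def chunkSize(conf: Map[String, String], codec: String): Long = codec match {
    case "lz4"  => conf.get(lz4Key).map(_.toLong).getOrElse(lz4Default)
    case "zstd" => conf.get(zstdKey).map(_.toLong).getOrElse(zstdDefault)
    case other  => throw new IllegalArgumentException(s"no chunked codec: $other")
  }
}
```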

Signed-off-by: Firestarman <[email protected]>
@firestarman (Collaborator, Author) commented

> This says it fixes #10790, but I don't see how adding support for a new compression codec on the GPU means we now have support for Celeborn. Can you elaborate? How does compression on the GPU avoid redundant compression by Celeborn? Do docs need to be updated on how to configure this new functionality when using Celeborn?

Sorry for the confusion. I changed the description. This PR is only a contribution to that issue, covering the ZSTD codec part.

Actually we do not need to do anything special to support Celeborn. Celeborn acts as a normal shuffle manager, and the plugin always works well with it. We just want to enable GPU partitioning, compression/decompression, and slicing for the normal shuffle path to see if we can get any perf benefit when working with Celeborn.

The goal of enabling GPU compression is not to avoid Celeborn compression, but to reduce the workload of copying data between the host and the device during shuffle serialization and deserialization.
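To illustrate the copy-reduction point, here is a toy model of the write path just described. Every name here is hypothetical, nothing is the plugin's real API: the batch is partitioned and each partition compressed on the device, so only the compressed bytes cross the PCIe bus to the host before being handed to the shuffle manager.

```scala
// Toy model only: device buffers are represented by their size.
object GpuShuffleWriteSketch {
  final case class DevBuf(sizeBytes: Long)

  // Hypothetical stage 1: split a batch into equal partitions on the GPU.
  def partition(batch: DevBuf, numParts: Int): Seq[DevBuf] =
    Seq.fill(numParts)(DevBuf(batch.sizeBytes / numParts))

  // Hypothetical stage 2: compress a partition on the GPU at `ratio`.
  def compress(p: DevBuf, ratio: Double): DevBuf =
    DevBuf((p.sizeBytes / ratio).toLong)

  // Total bytes that must be copied device -> host after compression.
  def bytesCopiedToHost(batch: DevBuf, numParts: Int, ratio: Double): Long =
    partition(batch, numParts).map(compress(_, ratio).sizeBytes).sum
}
```

With a 2x compression ratio the model halves the bytes copied to host, which is the saving being claimed here.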

@firestarman (Collaborator, Author) commented

build

Signed-off-by: Firestarman <[email protected]>
@firestarman (Collaborator, Author) commented

build

@firestarman firestarman requested a review from jlowe May 17, 2024 04:44
@jlowe (Member) commented May 17, 2024

> Actually we do not need to do anything special to support Celeborn. Celeborn acts as a normal shuffle manager, and the plugin always works well with it.

That's what I thought, and yet #10790 implies there's Celeborn-specific work to do. Can someone clarify in that issue what needs to be done for Celeborn support, or just close it as unplanned if there's actually nothing to do there?

> The goal of enabling GPU compression is not to avoid Celeborn compression, but to reduce the workload of copying data between the host and the device during shuffle serialization and deserialization.

Normally it takes the GPU longer to compress or decompress the data than it takes to copy it across the PCIe bus, so using a compression codec is always a loss from the perspective of saving data-copying time. The only case where that might not hold is when the copy goes to pageable instead of pinned memory, but then we're burning a lot of GPU cycles when it would be better to just allocate more pinned memory (if possible). What compression/decompression rates are you seeing, and how do they compare to the copy times? Any nsys traces or similar you can share?
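This argument can be put into a back-of-envelope model. All rates below are illustrative assumptions, not measurements from this PR: compressing first only wins if compression throughput is high enough that the compress time plus the copy of the smaller buffer beats a plain copy at PCIe bandwidth.

```scala
// Back-of-envelope model of the compress-vs-copy tradeoff; all rates
// passed in are assumed, not measured.
object CopyVsCompress {
  // Seconds to move `bytes` at `gbPerSec` GB/s.
  def copySeconds(bytes: Double, gbPerSec: Double): Double =
    bytes / (gbPerSec * 1e9)

  // True if compress-then-copy beats a plain copy for the given rates.
  def compressionWins(bytes: Double, ratio: Double,
                      compressGBps: Double, pcieGBps: Double): Boolean = {
    val plain = copySeconds(bytes, pcieGBps)
    val withCompression =
      bytes / (compressGBps * 1e9) + copySeconds(bytes / ratio, pcieGBps)
    withCompression < plain
  }
}
```

For example, at an assumed 10 GB/s compression throughput against a 25 GB/s pinned-copy rate, compression loses even at a 2x ratio; it would take several times that throughput for the copy saving to pay for the compression time.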

@firestarman (Collaborator, Author) commented May 20, 2024

> Can someone clarify in that issue what needs to be done for Celeborn support, or just close it as unplanned if there's actually nothing to do there?

@winningsix will help update it.

> What compression/decompression rates are you seeing, and how do they compare to the copy times? Any nsys traces or similar you can share?

I don't have such metrics yet, but an e2e comparison of a customer query shows a 2x speedup with zstd + GPU serde.
I would be happy to collect some nsys traces if you want to see more.

@firestarman firestarman merged commit 1d63a49 into NVIDIA:branch-24.06 May 20, 2024
43 of 44 checks passed
@firestarman firestarman deleted the comp-zstd branch May 20, 2024 01:50
@sameerz sameerz added the task Work required that improves the product but is not user facing label May 21, 2024