Introduce the zstd codec for native spill #656

zuston · 2024-11-24T13:44:00Z

Is your feature request related to a problem? Please describe.

In the current codebase, the native spill uses the lz4 codec. zstd should be supported as well.

Additional context

I will work on this if the project owners have no objection.

richox · 2024-11-25T09:04:55Z

I suggest using a property other than spark.io.compression.codec, since that one is used for broadcast/shuffle, where data goes through the network. For local spilling we would prefer a lightweight compression algorithm like lz4/snappy.
I would prefer a property like blaze.spill.compression.codec. What do you think?
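
To make the proposal concrete, here is a minimal sketch of how a dedicated spill property could be resolved independently of spark.io.compression.codec. The `SpillCodec` enum, the `resolve_spill_codec` function, and the lz4 default are assumptions for illustration only, and blaze.spill.compression.codec is just the name proposed above, not an existing setting.

```rust
use std::collections::HashMap;

/// Codecs mentioned in this thread. The enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpillCodec {
    Lz4,
    Snappy,
    Zstd,
}

/// Property name proposed in this thread; not confirmed to exist in Blaze.
pub const SPILL_CODEC_KEY: &str = "blaze.spill.compression.codec";

/// Resolve the spill codec from a configuration map.
/// Only the dedicated spill property is consulted, so the value of
/// spark.io.compression.codec has no effect on local spilling.
pub fn resolve_spill_codec(conf: &HashMap<String, String>) -> Result<SpillCodec, String> {
    match conf.get(SPILL_CODEC_KEY).map(|v| v.trim().to_ascii_lowercase()) {
        // Assumed default: keep the current lz4 behaviour when the property is unset.
        None => Ok(SpillCodec::Lz4),
        Some(v) => match v.as_str() {
            "lz4" => Ok(SpillCodec::Lz4),
            "snappy" => Ok(SpillCodec::Snappy),
            "zstd" => Ok(SpillCodec::Zstd),
            other => Err(format!("unsupported spill codec: {}", other)),
        },
    }
}
```

With this shape, a job could set spark.io.compression.codec to zstd for shuffle while spilling stays on lz4 unless the spill property is set explicitly.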

richox · 2024-11-25T09:07:30Z

Also, have you benchmarked spilling with zstd? If I understand correctly, it will perform worse than lz4/snappy.

zuston · 2024-11-25T09:14:41Z

> I suggest using a property other than spark.io.compression.codec, since that one is used for broadcast/shuffle, where data goes through the network. For local spilling we would prefer a lightweight compression algorithm like lz4/snappy. I would prefer a property like blaze.spill.compression.codec. What do you think?

Using a separate property is fine with me.

> Also, have you benchmarked spilling with zstd? If I understand correctly, it will perform worse than lz4/snappy.

Not yet. I'm still reading this part of the code.

zuston · 2024-11-25T09:15:52Z

I also think we can still reuse IoCompressionReader/Writer. WDYT? @richox

richox · 2024-12-03T03:38:54Z

> I also think we can still reuse IoCompressionReader/Writer. WDYT? @richox

I'm afraid not. Some spilled data is not in record batch format. For example, AggExec spills row-based grouping keys, while IoCompressionWriter only accepts record batches.

zuston · 2024-12-06T14:48:39Z

From what I can see, IoCompressionWriter also accepts plain byte buffers.
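
As an illustration of what "accepting plain byte buffers" could look like, here is a rough sketch of a byte-oriented block writer that works for both serialized record batches and row-based keys. `BlockWriter`, `read_block`, and the length-prefixed framing are hypothetical and are not Blaze's actual IoCompressionWriter API. The codec is passed in as a closure, and the usage note below uses an identity function as a stand-in for lz4 or zstd.

```rust
use std::io::{self, Read, Write};

/// Hypothetical writer that frames each buffer as
/// [u32 little-endian payload length][payload].
pub struct BlockWriter<W: Write, F: Fn(&[u8]) -> Vec<u8>> {
    inner: W,
    compress: F,
}

impl<W: Write, F: Fn(&[u8]) -> Vec<u8>> BlockWriter<W, F> {
    pub fn new(inner: W, compress: F) -> Self {
        Self { inner, compress }
    }

    /// Compress one raw buffer and append it as a framed block.
    pub fn write_block(&mut self, buf: &[u8]) -> io::Result<()> {
        let payload = (self.compress)(buf);
        if payload.len() > u32::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "block too large"));
        }
        self.inner.write_all(&(payload.len() as u32).to_le_bytes())?;
        self.inner.write_all(&payload)
    }

    /// Give back the underlying sink, e.g. the spill file.
    pub fn into_inner(self) -> W {
        self.inner
    }
}

/// Read one framed block and decompress it with the matching codec.
pub fn read_block<R: Read>(
    reader: &mut R,
    decompress: impl Fn(&[u8]) -> Vec<u8>,
) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 4];
    reader.read_exact(&mut len_bytes)?;
    let mut payload = vec![0u8; u32::from_le_bytes(len_bytes) as usize];
    reader.read_exact(&mut payload)?;
    Ok(decompress(&payload))
}
```

For example, `BlockWriter::new(Vec::new(), |b: &[u8]| b.to_vec())` writes uncompressed blocks into memory. A real codec would replace the closure, and whether the existing writer can be adapted this way depends on code not shown in this thread.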

zuston linked a pull request Nov 25, 2024 that will close this issue

feat(spill): Align with the multi IO compression codec in spill #657

Open
