zstd: Reuse single encoder/decoder with many dictionaries #953

coxley · 2024-04-18T16:17:16Z

Problem

I have a multi-tenant use case where we will keep hundreds (~500-1000) of dictionaries in memory at any given time. These are used to compress data before writing it to storage, sending it over gRPC, etc., and to decompress it on the way back.

The current API assumes that you know the entire set of dictionaries you'll use at setup time. There's no way to pass a dictionary to w.EncodeAll or r.DecodeAll, or even to register dictionaries on an existing *zstd.Decoder.

Ideally, I would manage the lifetime of my own dictionaries: when to refresh, prefetch, locally cache, and so on. At compression and decompression time, I can provide the correct []byte to use. Building a dictionary with WithEncoderDict or WithDecoderDicts also seems to copy all of the data given to it, which adds overhead.

Upstream zstd has a function ZSTD_createCDict_byReference which avoids copying the input. Doing something similar would be a very nice addition as well.

Any thoughts?

klauspost (Maintainer) · 2024-04-19T08:06:02Z

I could see it being feasible for EncodeAll/DecodeAll, but some amount of plumbing will be needed.

klauspost (Maintainer) · 2024-04-19T14:45:16Z

btw, did you notice you can add several dicts to a decoder: https://pkg.go.dev/github.com/klauspost/compress/zstd#WithDecoderDicts

coxley (Author) · May 26, 2024

I did, but dictionary rotation happens non-uniformly at runtime.

For example, say there are 1000 dictionaries loaded on a hot system and a new version of one of them has been uploaded centrally and is propagating through. Now we have to swap out the old one or add the new one alongside it.

Or, as another example, we are successfully decompressing data in the hot path but suddenly need to read some older compressed blobs from storage.

coxley (Author) · May 28, 2024

The crux of it is that the current library works well for a relatively static set of dictionaries, whereas we're doing continuous training in a number of areas.

dangermike Jul 11, 2024

This would be a useful feature for long-running event-handling workers that might want to use different dictionaries for different event types (or for different tenants in a multi-tenant system).
