Multiple calls to Write() has unexpected overheads? #22
Possibly related? |
Ok, the best thing to do if one cannot ensure good-sized calls to Write is to use a buffered writer:

func main() {
	b := &bytes.Buffer{}
	for i := 0; i < 500; i++ {
		b.Write([]byte("Hello World! "))
	}
	data1 := b.Bytes()
	fmt.Println("data len", len(data1))

	// Compress 1: a single call to Write with all of the data.
	buffer1 := &bytes.Buffer{}
	w1 := zstd.NewWriterLevel(buffer1, CompressionLevel)
	w1.Write(data1)
	w1.Close()
	fmt.Println("Buffer1 len", buffer1.Len())

	// Compress 2: many small writes, coalesced by a bufio.Writer.
	buffer2 := &bytes.Buffer{}
	w2 := zstd.NewWriterLevel(buffer2, CompressionLevel)
	bw := bufio.NewWriter(w2) // default buffer size = 4k
	// bw := bufio.NewWriterSize(w2, 8192) // buffer size = 8k
	for i := 0; i < 500; i++ {
		bw.Write([]byte("Hello World! "))
	}
	bw.Flush() // flush the buffer before closing the compressor
	w2.Close()
	fmt.Println("Buffer2 len", buffer2.Len())
}

Output:

data len 6500
Buffer1 len 33
Buffer2 len 44

It's not so elegant, but ¯\_(ツ)_/¯. Hope that helps someone! |
@jimsmart, try gozstd.Writer. It uses a different underlying zstd API, which should have lower overhead. |
The zstd bug referenced above (facebook/zstd#206) has been closed. Is this issue still ongoing? If so, the need to wrap the writer with a buffer should be documented, as this is fairly subtle usage advice. |
Hi @rgeronimi, I checked the previous results again, and indeed you would currently get the same results.
Hope this helps! |
The following limitations for
As I understand, they mean two things:
cc'ing @Cyan4973 for further clarification. |
@Viq111's explanations are correct.
When in doubt, prefer using |
This depends on what the Go wrapper code does. I just checked it, and it passes the user-provided buffer directly as a C pointer. If I understand what @valyala wrote, this could be a critical bug, as the zstd C code expects this buffer to remain accessible after the function returns. If true, this has the potential to cause data corruption, process crashes, and hard-to-reproduce failures.
|
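The hazard described in the comment above, a consumer that keeps using the caller's buffer after the call returns, can be illustrated in pure Go. This is a sketch of the general class of bug, not the wrapper's actual cgo code: `retainingSink` and `copyingSink` are hypothetical types introduced for illustration. Note that the `io.Writer` documentation states that implementations must not retain `p`, which is exactly the contract the first type violates.

```go
package main

import "fmt"

// retainingSink (hypothetical) models a consumer that keeps a reference to
// the caller's slice after Write returns, instead of copying it.
type retainingSink struct {
	pending [][]byte
}

func (s *retainingSink) Write(p []byte) (int, error) {
	s.pending = append(s.pending, p) // aliasing: p is still owned by the caller
	return len(p), nil
}

// copyingSink (hypothetical) takes a private copy, so later changes made by
// the caller cannot affect the data it holds.
type copyingSink struct {
	pending [][]byte
}

func (s *copyingSink) Write(p []byte) (int, error) {
	s.pending = append(s.pending, append([]byte(nil), p...))
	return len(p), nil
}

func main() {
	scratch := []byte("first")
	bad, good := &retainingSink{}, &copyingSink{}
	bad.Write(scratch)
	good.Write(scratch)

	copy(scratch, "XXXXX") // the caller reuses its buffer after Write returned

	fmt.Printf("retained: %s, copied: %s\n", bad.pending[0], good.pending[0])
	// prints "retained: XXXXX, copied: first"
}
```

The copy costs an allocation per call, which is the usual trade-off for staying safe when the consumer needs the data beyond the lifetime of the call.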
Looking back at the code, we started implementing the Go wrapper at zstd v0.5, which only had the
It may actually also be the issue for #39.
If anyone could put up a PR for migrating to
Otherwise, I can also look into it, as it seems it could bite a couple of people using the streaming interface. |
We don't have the skillset to zoom into that soon. For storage tasks (e.g., blob storage in DB or compressed custom backup) this bug is a showstopper unfortunately. |
Until DataDog#22 is fixed, this library is using zstd in a way that can cause data corruption, as confirmed by the zstd maintainer himself. I think this is critical enough that it should be mentioned at the top of the README, and the Go community should be alerted until the bug is fixed.
If I make multiple calls to Write, the resulting compressed data stream is much longer than if I buffer the data and make a single call to Write.
Is this expected behaviour? The reason I ask is that, if chained with other writers, one cannot make any guarantees as to how they may parcel up their data.
Code (error handling omitted):
Output:
Regards