erratic compression rate #4236
This file is synthetic and describes an unlikely scenario that is implausible in real-world cases. This is evident from the compression ratio, which can reach an extraordinary x10000, far exceeding expected parameters. It is not surprising that behavior becomes unpredictable under these conditions.

The file itself consists largely of the characters `a` and `b`. If the content were random, we would expect a ratio of about x8. The significantly better ratio, however, suggests that these seemingly nonsensical sequences are actually repeated, with large segments essentially copied and pasted. With so many matches available, searching becomes very challenging, and finding the "best match" becomes an implausibly costly task. This situation favors approaches based on probabilistic methods (i.e., random chance), which are more commonly used at lower compression levels. Given that this file defies so many expectations, it is not surprising that the match-finding algorithms are pushed to their limits, resulting in a scenario where chance plays a disproportionately large role.
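The two regimes described above can be sketched with a small experiment. This uses Python's `zlib` as a stand-in (zstd bindings are not in the standard library, and the exact numbers are only illustrative): random two-symbol content is capped near the x8 entropy bound, while content built from copied segments compresses far beyond it.

```python
import random
import zlib

random.seed(0)
SIZE = 1_000_000

# Random 'a'/'b' bytes: 1 bit of entropy per 8-bit byte,
# so no compressor can do much better than a x8 ratio.
random_data = bytes(random.choice(b"ab") for _ in range(SIZE))

# The tm29-like regime: a short 'a'/'b' segment copied over and over,
# so long matches let the compressor go far beyond the x8 bound.
segment = bytes(random.choice(b"ab") for _ in range(64))
repetitive_data = segment * (SIZE // len(segment))

ratio_random = SIZE / len(zlib.compress(random_data, 9))
ratio_repeat = len(repetitive_data) / len(zlib.compress(repetitive_data, 9))

print(f"random 'ab' content:     x{ratio_random:.1f}")  # capped near the entropy bound
print(f"repetitive 'ab' content: x{ratio_repeat:.1f}")  # far beyond x8
```

In the repetitive case nearly every position offers many equally long matches, which is why level-to-level behavior on such a file ends up dominated by which match each search strategy happens to pick, rather than by how hard it searches.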
I understand, thank you for the detailed response and explanation.
One can always find data like that.
Maybe not that extreme, but I didn't make my files artificially redundant at mid/long range, which would make them 'behave' even more 'erratically'. They also have something in common: they are not real-world data, only data that is 'real' within a game. The only similar-looking thing would be DNA sequences, and though those show 'anomalies', they are nowhere near this extreme.
Your case is extreme, as explained by @Cyan4973.
Describe the bug
For this specific file, tm29.zip (zipped, deflate), the compression rate, compression speed, and decompression speed all change erratically across levels 1-22.
To Reproduce
Steps to reproduce the behavior, on the extracted file tm29:

```shell
zstd tm29 -f -b1 -e22 -i5
```
Expected behavior
Compression rate, compression speed, and decompression speed should all have changed (approximately) monotonically from highest to lowest according to the level used (-1 through -22).
Observed behavior
Compression rate: level 5 (huge regression), level 11 (huge improvement), level 13 (regression), level 21 (improvement)
Compression speed: level 5 (huge regression), level 11 (improvement), level 13 (regression)
Decompression speed: level 5 (regression), level 10 (improvement), level 11 (improvement)
Desktop (please complete the following information):
Zstd version:
Relevant information:
The given file tm29 is a compressor benchmark file with hugely repetitive `ab` patterns and does not relate to real data.