Decompression could be optimized (would be 5x to 20x faster) #9
Comments
Feel free to send a PR.
I did some preliminary investigation; this is a bit trickier for the general case than for the GIF case (because a code may refer to a position that is no longer in the current buffer, whereas for a GIF image you can keep the whole image in memory), but I think it can be done.
I've investigated this question. I wasn't quite able to follow the idea as originally written up, but here is my understanding.
We are able to find the previous code for the current code, and we are able to find the last symbol for the current code. This solution targets the least possible amount of memory; it could be called a "reversed trie on a singly linked list". It requires you to write the data in the opposite direction, of course, and it needs only a prefix pointer and a last symbol per code (see the sketch below). If we want to write the output in the same direction, we have to implement a "trie on a doubly linked list", a "trie on a bidirectional sparse array", or a "bidirectional hash map". That is possible, but I don't think it would speed up decompression: I think the penalty of maintaining any bidirectional structure would be much greater than some buffer remapping.
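For illustration, a minimal C sketch of that "reversed trie on a singly linked list": each code stores only its prefix code and its last symbol, and expansion walks the chain backwards, so the bytes come out last-to-first. All names are illustrative and not taken from this repository.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CODES (1 << 16)  /* codes are at most 16 bits wide */

static uint16_t prefix[MAX_CODES];   /* previous code in the chain */
static uint8_t  last_sym[MAX_CODES]; /* byte this code appends     */
static uint8_t  scratch[MAX_CODES];  /* longest possible string    */

/* Expand `code` by filling `scratch` from the end, since the prefix
 * chain yields bytes in reverse order.  Returns a pointer to the
 * first byte of the string; *len receives its length.  Codes below
 * 256 are treated as plain literals with no prefix. */
static const uint8_t *expand(uint16_t code, size_t *len)
{
    uint8_t *p = scratch + sizeof scratch;
    while (code >= 256) {        /* walk the linked list backwards */
        *--p = last_sym[code];
        code = prefix[code];
    }
    *--p = (uint8_t)code;        /* the literal rooting the chain  */
    *len = (size_t)(scratch + sizeof scratch - p);
    return p;
}
```

Filling the scratch buffer from its tail is what lets the caller copy the string out in forward order without a separate reversal pass.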
I don't think it's doable here, because the whole decoded buffer isn't kept in memory. #27 has a decoder that basically stores each string in a flat array in its full form. That is doable, but my problem with it is that the worst-case memory usage is unknown. It has to be bounded, since codes are limited to 16 bits, but I'm not sure what that bound is, and it may turn out to be quite high. A mixed approach could use bounded memory while still being faster than the current dict, where strings are represented as prefix-code/last-symbol pairs.
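A hedged sketch of the flat-array idea attributed to #27: every dictionary string is kept in full, back to back, in one arena, so a code expands with a single copy. The names here are hypothetical rather than taken from the actual PR, and the arena is assumed to be pre-allocated (its worst-case size is the subject of the next comment).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CODES (1 << 16)

static uint8_t *arena;              /* assumed allocated up front    */
static size_t   arena_used;
static size_t   str_off[MAX_CODES]; /* where each code's bytes begin */
static size_t   str_len[MAX_CODES];

/* Define a new code as (existing prefix code) + one extra byte by
 * copying the prefix string to the end of the arena and appending. */
static void add_code(uint16_t code, uint16_t prefix_code, uint8_t byte)
{
    size_t n = str_len[prefix_code];
    memcpy(arena + arena_used, arena + str_off[prefix_code], n);
    arena[arena_used + n] = byte;
    str_off[code] = arena_used;
    str_len[code] = n + 1;
    arena_used  += n + 1;  /* total growth is the open question above */
}

/* Emitting a code is then a single bounded memcpy: */
static size_t emit(uint8_t *out, uint16_t code)
{
    memcpy(out, arena + str_off[code], str_len[code]);
    return str_len[code];
}
```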
I did a bit of math today, and the TL;DR is that the worst case is roughly 2 GiB.

Longer explanation: the worst case would always make sure the new code's string is longer than the previous one, and the only way to do that is to reuse the last code as the prefix plus a new byte. In other words, a large run of the same byte will do the trick. So what we have here is basically an arithmetic progression where each term increases by 1. With codes bounded to 16 bits there are at most n = 2^16 codes, so the sum formula gives 1 + 2 + … + n = n(n + 1)/2 ≈ 2^31 bytes, i.e. roughly 2 GiB of stored strings.

I also tested the theory, and the result came out as expected, just short of 2 GiB:

```
$ </dev/zero head -c 4000000000 | ./compress -C | ./compress -d >/dev/null
Dict str usage: 1.984 GiB
```

So yeah, the strategy presented in #27 is not really usable unless you want the worst-case decompression memory usage to go up to 2 GiB.
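As a quick sanity check of that bound, a few lines of C reproduce the arithmetic (the 2^16 limit comes from the 16-bit code width mentioned above):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t n = 1u << 16;            /* at most 2^16 distinct codes */
    uint64_t bytes = n * (n + 1) / 2; /* 1 + 2 + ... + n             */
    printf("worst case: %llu bytes (%.3f GiB)\n",
           (unsigned long long)bytes,
           bytes / (1024.0 * 1024.0 * 1024.0));
    return 0;  /* prints 2147516416 bytes (2.000 GiB) */
}
```

The measured 1.984 GiB sits just under the bound, which is consistent: the first 256 codes are single-byte literals and never reach the full adversarial length.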
Hi,
LZW decompression could be optimized to be somewhat faster by not using the classic build-a-stack-then-reverse-it technique but rather using references in the read buffer.
I implemented this in a GIF reader a while ago. If that's of interest, I could try working on a patch.
Cheers,
Robert.
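For readers unfamiliar with the technique described above: under the GIF-style assumption that the whole decoded output stays in memory, each code can be recorded as an (offset, length) reference into the decoded buffer itself, so expansion is one forward copy and no stack or reversal is needed. A minimal sketch with illustrative names follows.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CODES (1 << 16)

static size_t out_off[MAX_CODES]; /* where each code's string begins */
static size_t out_len[MAX_CODES]; /* and how long it is              */

/* Append the expansion of `code` at out + *pos.  A plain forward
 * loop instead of memcpy: in the classic KwKwK case the source and
 * destination regions overlap by one byte, and a forward copy still
 * produces the right result. */
static void emit(uint8_t *out, size_t *pos, uint16_t code)
{
    size_t src = out_off[code], n = out_len[code];
    for (size_t i = 0; i < n; i++)
        out[*pos + i] = out[src + i];
    *pos += n;
}

/* A new code is `prev` plus the first byte of the next expansion,
 * and since that byte lands right after prev's bytes in the output,
 * the new string is just prev's span extended by one. */
static void define(uint16_t code, uint16_t prev, size_t prev_pos)
{
    out_off[code] = prev_pos;
    out_len[code] = out_len[prev] + 1;
}
```

The trick is that the first byte of the next expansion does not need to be known when the code is defined; it is guaranteed to be written at exactly the right position by the following emit.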