
Decompression could be optimized (would be 5x to 20x faster) #9

Open
rflauer opened this issue Oct 11, 2016 · 5 comments

Comments

@rflauer commented Oct 11, 2016

Hi,

LZW decompression could be optimized to be somewhat faster by not using the classic build-a-stack-then-reverse-it technique, but rather by using references into the read buffer.
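For context, the classic technique looks roughly like this (a minimal sketch; the table and macro names here are illustrative, not the actual ncompress identifiers):

```c
#include <stddef.h>
#include <stdint.h>

#define MAXBITS 16
#define FIRST   257  /* first dynamically assigned code (0-255 literals, 256 CLEAR) */

typedef uint16_t code_t;
static code_t  prefixes[1 << MAXBITS];     /* code -> prefix code      */
static uint8_t terminators[1 << MAXBITS];  /* code -> last byte        */
static uint8_t de_stack[1 << MAXBITS];     /* scratch stack for decode */

/* Decode one code: push bytes while walking the prefix chain toward
 * the root, then pop them so they come out in forward order.
 * Returns the number of bytes written to `out`. */
static size_t emit_code(code_t code, uint8_t *out)
{
    uint8_t *sp = de_stack;
    while (code >= FIRST) {         /* multi-byte string */
        *sp++ = terminators[code];  /* push the string's last byte */
        code = prefixes[code];      /* hop to the remaining prefix */
    }
    *sp++ = (uint8_t)code;          /* single-byte root code */

    size_t n = (size_t)(sp - de_stack);
    while (sp > de_stack)           /* popping reverses into forward order */
        *out++ = *--sp;
    return n;
}
```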

I implemented this in a GIF reader a while ago. If that's of interest, I could try working on a patch.

Cheers,

Robert.

@vapier (Owner) commented Oct 11, 2016

feel free to send a PR

@rflauer (Author) commented Oct 27, 2016

I did some preliminary investigation; this is a bit trickier in the general case than for GIF, because a code may refer to a position that is no longer in the current buffer, whereas for a GIF image you can keep the whole image in memory. But I think it can be done.

@andrew-aladev (Contributor) commented Dec 12, 2018

I've investigated this question. I couldn't trace the idea through the ncompress code (it is not very readable), but LZW AB shows the same structure clearly:

```
prev_code = prefixes[code]
last_symbol = terminators[code]
```

We can find the previous code for the current code, and the last symbol for the current code. This solution targets the smallest possible memory footprint; it could be described as a "reversed trie on a singly linked list". And it requires writing the data in the opposite direction, of course.

It requires only ((2 ** n_bits) - 257) * (sizeof(code_t) + 2) bytes, which is about 261 KB for 16-bit codes.
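A minimal sketch of decoding with that structure (names follow the snippet above; this is illustrative, not LZW AB's actual code). Since the chain yields the last symbol first, the string is written back-to-front, which is only possible when you already know, or can bound, where it ends. For 16-bit codes the tables cost ((1 << 16) - 257) * 4 = 261,116 bytes, matching the 261 KB figure:

```c
#include <stdint.h>

#define FIRST 257                /* first dynamically assigned code */
typedef uint16_t code_t;
extern code_t  prefixes[];       /* the tables quoted above */
extern uint8_t terminators[];

/* Write the string for `code` into the output buffer back-to-front.
 * `end` points one past where the string must finish; returns a
 * pointer to the string's first byte. No stack, no reversal pass. */
static uint8_t *write_backward(code_t code, uint8_t *end)
{
    while (code >= FIRST) {
        *--end = terminators[code];  /* last symbol lands last */
        code = prefixes[code];       /* hop to the previous code */
    }
    *--end = (uint8_t)code;          /* single-byte root */
    return end;
}
```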

If we want to write the output in the same direction, we have to implement a "trie on a doubly linked list", a "trie on a bidirectional sparse array", or a "bidirectional hash map". It is possible, but I don't think it will speed up decompression.

I think the penalty of maintaining any bidirectional structure will be much larger than the cost of some buffer remapping.

@N-R-K commented May 14, 2024

I don't think it's doable because the whole decoded buffer isn't kept in memory.

#27 has a decoder that basically stores each string in a flat array in its full form. This is doable. But my problem with this is that the worst-case memory usage is unknown. It has to be bounded, since codes are limited to 16 bits, but I'm not sure what that bound is, and it may turn out to be quite high.
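The general idea behind such a flat-form decoder is roughly this (a hypothetical sketch of the approach, not the actual #27 code; allocation and growth of `flat` are elided):

```c
#include <stdint.h>
#include <string.h>

struct entry {
    uint32_t off;  /* offset of the expanded string in `flat` */
    uint32_t len;  /* its expanded length                     */
};

static struct entry table[1 << 16];  /* assumes 256 single-byte codes pre-inserted */
static uint8_t     *flat;            /* one big byte array of expanded strings */
static uint32_t     flat_used;

/* New code = string(prefix) + one byte: copy the prefix's expanded
 * string and append the byte. A code is then emitted with a single
 * memcpy instead of a pointer walk; this copying is also what makes
 * the worst-case memory usage grow quadratically. */
static void add_code(uint16_t new_code, uint16_t prefix, uint8_t last)
{
    struct entry p = table[prefix];
    memcpy(flat + flat_used, flat + p.off, p.len);
    flat[flat_used + p.len] = last;
    table[new_code].off = flat_used;
    table[new_code].len = p.len + 1;
    flat_used += p.len + 1;
}
```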


Here's a mixed approach that uses bounded memory but is faster than the current dict, where strings are represented as <prefix_index, last_char> pairs: use <prefix_index, last_chars[N]> instead (sketched below). Storing multiple chars per linked-list "node" means fewer pointer hops and a larger "payload" per hop.

With N set to 32, the decompression dict's memory usage will be roughly >2 MiB while being about 10x faster on long chains. Making this change in ncompress is a bit difficult, though; the code is quite... ahem, let's say "ancient" and difficult to modify.
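A hypothetical node layout for this (N and all names here are assumptions, not ncompress code):

```c
#include <stdint.h>

#define N 32

struct node {
    uint16_t prefix;   /* index of the previous node, or a sentinel */
    uint8_t  nchars;   /* how many of chars[] are in use (1..N)     */
    uint8_t  chars[N]; /* up to N trailing bytes of the string      */
};

/* (1 << 16) nodes * sizeof(struct node) = 65536 * 36 bytes ≈ 2.25 MiB
 * (with one byte of padding), in line with the >2 MiB estimate above.
 * A chain walk now touches a node once per N bytes instead of once
 * per byte, hence the fewer pointer hops. */
static struct node dict[1 << 16];
```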

@N-R-K commented May 15, 2024

> But my problem with this is that the worst-case memory usage is unknown.

I did a bit of math today, and the TL;DR is that the worst case is roughly 2 GiB.


Longer explanation: the worst case would always make each new code's string longer than the previous one's. And the only way to do that is to reuse the last code as the prefix plus a new byte. In other words, a long run of the same byte will do the trick.

So what we have here is basically an arithmetic progression where each term increases by 1. Using the sum formula we get (2^16 * (1 + 2^16)) / 2 ≈ 2 GiB. (Note that this is slightly larger than the actual worst case because it doesn't account for the pre-filled single-byte codes.)
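A standalone snippet to sanity-check the arithmetic (not part of ncompress):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* String lengths form the progression 1, 2, ..., 2^16, so the
     * total is the arithmetic series sum (2^16 * (1 + 2^16)) / 2. */
    uint64_t n = UINT64_C(1) << 16;
    uint64_t total = n * (1 + n) / 2;      /* 2,147,516,416 bytes */
    printf("%llu bytes = %.3f GiB\n",
           (unsigned long long)total,
           (double)total / (1u << 30));    /* prints ~2.000 GiB */
    return 0;
}
```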

I also tested out the theory and the result came out as expected, just short of 2 GiB:

```
$ </dev/zero head -c 4000000000 | ./compress -C | ./compress -d >/dev/null
Dict str usage: 1.984 GiB
```

So yeah, the strategy presented in #27 is not really usable unless you're OK with the decompression worst-case memory usage going up to 2 GiB.
