
Scrapped ideas #9

Open
lifthrasiir opened this issue Aug 25, 2021 · 11 comments

@lifthrasiir

This issue lists some wild ideas I had in mind but scrapped for various reasons. Some of them might still be useful in other projects.

@lifthrasiir

Combine predictions and counts if possible

For many inputs the optimal precision is around 12, which means that 3--4 bits are wasted per context. It also seems that the optimal modelMaxCount is very low, typically between 4 and 10. So it might be possible to combine the prediction and the count into a single entry for lower memory usage. But at the current default of 150 MB of memory, additional contextBits would only gain some tens of bytes, and that didn't justify the added decoder complexity (though the complexity might even be a wash, because we would no longer need two new UintXXArray() calls).
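
A minimal sketch of the packing (not the actual Roadroller layout; bit widths and names are illustrative), with a 12-bit prediction and a 4-bit count sharing one Uint16Array entry instead of two separate arrays:

```js
// Hypothetical layout: upper 12 bits = prediction, lower 4 bits = count.
// `contextBits` is assumed to be defined elsewhere.
const table = new Uint16Array(1 << contextBits);
const getPrediction = i => table[i] >>> 4;    // 0..4095, i.e. 12-bit precision
const getCount = i => table[i] & 15;          // capped at 15, enough for modelMaxCount 4-10
const setEntry = (i, pred, count) =>
  table[i] = (pred << 4) | Math.min(count, 15);
```

The point is simply that a single array allocation would replace the two new UintXXArray() calls mentioned above.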

lifthrasiir commented Aug 25, 2021

Use XOR instead of addition

The current decoder contains the following fragment:

(y=y*997+(o[l-C]|0)|0)

This can be replaced with (y=y*997^o[l-C]) if we use XOR instead of addition in the context hash. But in testing, addition seems to hash better than XOR (100B+ differences for the samples), so I gave up on it.
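
For comparison, a minimal sketch of the two update rules written as plain functions (byte stands for o[l-C]):

```js
// Current variant: addition, as in the fragment above.
const nextHashAdd = (y, byte) => (y * 997 + (byte | 0)) | 0;
// Scrapped variant: XOR. The trailing |0 becomes unnecessary because ^ already
// truncates its result to a 32-bit integer, which is where the size saving came from.
const nextHashXor = (y, byte) => y * 997 ^ byte;
```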

@lifthrasiir

Pre-XOR transform

There are only two places where the output is directly read:

  • The context hash (y=y*997+(o[l-C]|0)|0)
  • The output update o[l++]=a-=K where K = 2^inBits

If we XORed the entire input with a predetermined value X, both would remain as compact as before:

  • (y=y*997+(o[l-C]^X)|0)
  • o[l++]=a^=Y where Y = 2^inBits XOR X
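
A minimal sketch of the compressor-side transform, assuming input is a byte array and X is a fixed value chosen ahead of time (0x55 below is only a placeholder):

```js
const X = 0x55;                                // hypothetical constant
const transformed = input.map(b => b ^ X);     // compress this instead of `input`
// The decoder side then only needs the two adjusted expressions listed above.
```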

Unfortunately X had a minimal impact (<10B) on the compression, at least for JS inputs.

@lifthrasiir

Use WOFF2 as a Brotli container

The current Roadroller output is competitive with Brotli even with the decoder included, but Brotli still works better on some inputs (although only marginally), so making use of Brotli might be an alternative.

We have already used the PNG bootstrap for its built-in DEFLATE compression, so a natural (and possibly the only) next step is to use WOFF2 for its built-in Brotli compression. The raw data would be embedded as bitmap data and the decoder would draw the glyphs to a canvas to extract them.
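
A very rough sketch of the extraction side, assuming the payload has somehow been encoded into the glyphs of a WOFF2 font (the family name, the drawn characters, and the pixel decoding are all hypothetical):

```js
async function extractFromWoff2(woff2Bytes) {
  // Load the Brotli-compressed font; the browser decompresses it for us.
  const face = new FontFace('payload', woff2Bytes.buffer);
  await face.load();
  document.fonts.add(face);

  // Draw the data-carrying glyphs and read the rasterized pixels back.
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  ctx.font = '16px payload';
  ctx.fillText('ABC', 0, 16);
  const pixels = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
  // ...reconstruct the original bytes from `pixels`...
  return pixels;
}
```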

I currently have no inputs where Brotli is significantly (>10%) better than Roadroller, so I didn't pursue it further, but it still remains a possibility.

@lifthrasiir

Other models

So far I have implemented (probably incorrectly) the following models as experiments:

Maybe I should also investigate Prediction by Partial Matching and Dynamic Markov Compression, which are known to perform very well under some parameters.

@lifthrasiir

Alternative hash table representation

The current Roadroller model uses a size-limited hash table like most other context model implementations, but this is not strictly necessary. Anything that can map the context (or its hash) to a probability can be used, including a literal Map. It is just that the mapping is allowed to be slightly inaccurate as long as it remains deterministic.

It might seem that a more accurate mapping would improve the compression, but that's not strictly true. If you have 100,000 bits of input, the mapping is updated only 100,000 times per model, which is far fewer than the available entries (33,000,000 in the default setting). Many entries are used just once and don't contribute to the compression in any way. For that reason the right amount of hash collision is better than none; a prediction for a collided hash may happen to be better than the baseline prediction.

Technically this accuracy problem can be sidestepped by truncating the context hash, so you might still be able to use another data structure to save memory. I actually tried a plain Map, but it was too slow to be useful (about 5--10 times slower), so it was deemed not worthwhile regardless of the accuracy issue.
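
For illustration, a rough sketch of what a Map-backed model could look like (this is not Roadroller's actual model; the update rule and field names are made up):

```js
class MapModel {
  constructor() { this.stats = new Map(); }      // context hash -> statistics
  predict(hash) {
    const s = this.stats.get(hash);
    return s ? s.prob : 0.5;                     // baseline prediction for unseen contexts
  }
  update(hash, bit, maxCount = 10) {
    let s = this.stats.get(hash);
    if (!s) this.stats.set(hash, s = { prob: 0.5, count: 0 });
    if (s.count < maxCount) s.count++;
    s.prob += (bit - s.prob) / (s.count + 1);    // move the prediction toward the observed bit
  }
}
```

Every distinct context hash gets its own exact entry here, which is precisely the accuracy that, as argued above, doesn't necessarily help.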

lifthrasiir commented Aug 25, 2021

Shared hash table

In the opposite direction, we could use a single hash table shared by all models and see what happens. This reduces the decoder complexity a bit and had the potential to perform equally well with the same amount of memory, but again the compression was hurt enough (>200B) that I had to scrap this.
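
A hypothetical sketch of the shared layout: a single table for every model, with the model index mixed into the hash so that different models mostly land in different slots (the salt constant is arbitrary):

```js
const shared = new Uint32Array(1 << contextBits);   // contextBits assumed defined
const slotFor = (modelIndex, hash) =>
  ((hash ^ (modelIndex * 0x9e3779b9)) >>> 0) % shared.length;
```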

@lifthrasiir

Stealing Brotli static dictionary

Brotli ships with a notably large preset dictionary of about 120 KB and it certainly contributes to its performance. If we have access to Brotli in web browsers, it is definitely possible to extract that dictionary with a synthesized input and make use of it. Too bad that we can't do so with a small input (the distance-length code can only access individual words in the dictionary).

As an experiment I directly inserted the dictionary into the preset and measured the performance. Unfortunately this didn't perform well for the JS inputs I've tested. It did work for purely textual samples (e.g. the README compressed size went from 5965B to 5738B in -O0), but still less than I had hoped. This idea still remains an interesting possibility for size-limited demos though (you get a large word list for free!).

lifthrasiir commented Sep 6, 2021

DEFLATE recompression

The Roadroller algorithm is known to be relatively weak at deduplication, which is the crux of LZ77. It can handle duplicated strings, since each occurrence of a string makes the model prefer that string, but it adapts more slowly. It also doesn't really work for images, since the input can be much larger and that directly impacts the decompression speed.

Just in case, I've considered whether DEFLATE output can be recompressed with a Roadroller-like algorithm:

  • We can bypass DEFLATE's Huffman coding by using the static Huffman tree, which is relatively easy to reconstruct in a compact code.
  • The number of extra bits after literal/length codes or distance codes can be inferred from the code itself. For simplicity, literal/length codes and distance codes should be merged into a single symbol set, so that the decoder doesn't have to maintain state to calculate the number of extra bits.
  • The extra bits can be either encoded separately or interleaved with the symbols.
    • Separately encoded extra bits wouldn't be compressed at all, and they seem to be significant (~50% of the data) in sample images.
    • Interleaved extra bits can also be problematic, because up to ~13 extra bits are allowed, so the symbol bit length would have to be at least 13, which is a waste. This can be mitigated by treating a portion of the extra bits as part of the code: if, for example, distance code 29 is coded as 1101 and followed by 13 extra bits 1011011010111, it can be reinterpreted as the code 110111101 with extra bits 10110110 instead. (Since the Huffman code and the extra bits are read in opposite directions, the extra bits are reversed when treated as part of the Huffman code.) This makes it possible to stuff everything into a 9-bit symbol set; a small sketch follows.
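
A small sketch of that reinterpretation, operating on bit strings in stream order for clarity; the numbers reproduce the example from the last item:

```js
// Fold trailing extra bits into the Huffman code (reversed) until the code
// reaches `maxLen` bits, so that every symbol fits into a 9-bit set.
function foldSymbol(codeBits, extraBits, maxLen = 9) {
  let code = codeBits, extra = extraBits;
  while (code.length < maxLen && extra.length > 0) {
    code += extra[extra.length - 1];   // take the last extra bit, hence the reversal
    extra = extra.slice(0, -1);
  }
  return { code, extra };
}

// foldSymbol('1101', '1011011010111')
//   -> { code: '110111101', extra: '10110110' }, matching the example above.
```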

I didn't have high hopes for this, and it didn't work well anyway: the resulting recompressed data was slightly larger than the original DEFLATE stream. In other words the Roadroller algorithm is quite bad at stationary input (which is expected, because it was tuned for non-stationary input). There was a single case where it worked well: the DEFLATE length code is capped at 258, so a long run of that length code was better modelled by Roadroller, but in that case the WebP lossless mode would work much better (WebP is better at most small images after all).

lifthrasiir commented Sep 6, 2021

Input reversal

In the current decoder it is easy to reconstruct the decompressed data in the reverse direction without any additional code, so we could possibly reverse the input. Which direction compresses better depends on the input; there definitely exist inputs where the backward context is richer than the forward context. Unfortunately the JS inputs were not among them, but I might revisit this if there is a need for such inputs.

lifthrasiir commented Sep 14, 2021

Use different hash multipliers

The current hash multiplier of 997 is the largest 3-digit prime and was chosen so that other contexts can be combined with smaller multipliers without causing much of a problem. But the optimal multiplier can of course vary with the input. It is actually very easy to add a hash multiplier parameter to the current context model (in addition to the aforementioned XOR instead of addition), but no significant improvement was found. Moreover, unlike the other parameters, the size as a function of the multiplier is not even remotely convex, so it can't be easily optimized without brute force.
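
A minimal sketch of what such a parameter search could look like; candidateMultipliers and estimateCompressedSize are hypothetical stand-ins for the actual optimizer:

```js
// The context hash update becomes (y = y * M + (o[l - C] | 0)) | 0 for a per-model M.
let best = { multiplier: 997, size: Infinity };
for (const M of candidateMultipliers) {             // e.g. a list of odd constants to try
  const size = estimateCompressedSize(input, M);    // run the compressor with multiplier M
  if (size < best.size) best = { multiplier: M, size };
}
```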
