# Scrapped ideas (#9)

This issue lists some wild ideas I had in mind but scrapped for various reasons. Maybe some of them might be useful in other projects.
## Combine predictions and counts if possible

For many inputs the optimal precision is around 12, which means that 3–4 bits are wasted per context. And it seems that the optimal `modelMaxCount` is also very low, typically between 4 and 10. So it might be possible to combine the two fields for lower memory usage. But at the current default of 150 MB of memory, additional `contextBits` can only improve the compression by some tens of bytes, and that didn't justify the decoder complexity (I think it is even, because we definitely won't have two …
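For illustration, a minimal sketch of the packing (not the actual layout that was considered): a 12-bit prediction and a 4-bit count share one `Uint16Array` slot, halving the per-context footprint of two separate arrays. `contextBits` and the cap of 15 are example values.

```js
const contextBits = 22;                       // example value, not the default
const slots = new Uint16Array(1 << contextBits);
function read(i) {
  // high 12 bits: prediction, low 4 bits: count
  return { p: slots[i] >>> 4, count: slots[i] & 15 };
}
function write(i, p, count) {
  // count saturates, standing in for a modelMaxCount <= 15
  slots[i] = (p << 4) | Math.min(count, 15);
}
```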
## Use XOR instead of addition

The current decoder contains the following fragment: `(y=y*997+(o[l-C]|0)|0)`. This can be replaced with the XOR form `(y=y*997^o[l-C])`, since `^` already coerces both operands to 32-bit integers, making both `|0`s unnecessary.
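The two forms side by side (a sketch; `o` is the decoder's output array and `C` the context distance, as in the fragment above):

```js
y = y * 997 + (o[l - C] | 0) | 0;  // current: addition needs |0 to truncate to int32
y = y * 997 ^ o[l - C];            // XOR variant: ^ performs the int32 coercion itself
```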
## Pre-XOR transform

There are only two places where the output is directly read: …

If we XORed the entire input with a predetermined value X, both would remain as compact as before: …

Unfortunately X had a minimal impact (<10 B) on the compression, at least for JS inputs.
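A minimal sketch of the transform, assuming the input bytes live in a `Uint8Array` and X is a single predetermined byte; the real read sites are inside the decoder, so this is only illustrative:

```js
const data = new TextEncoder().encode('example input');  // stand-in for the real input
const X = 0x5a;                                          // hypothetical example value for X
const transformed = data.map(b => b ^ X);                // encoder: pre-XOR the entire input
// decoder: each direct read of the output picks up a ^ X, e.g.
// y = y * 997 ^ (o[l - C] ^ X);
```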
## Use WOFF2 as a Brotli container

The current Roadroller output is competitive with Brotli even with the decoder included, but Brotli still works better on some inputs (although only marginally), so making use of Brotli might be an alternative. We have already used a PNG bootstrap for its built-in DEFLATE compression, so a natural (and possibly the only) next step is to use WOFF2 for its built-in Brotli compression. The raw data would be embedded as bitmap data and the decoder would render glyphs to a canvas to extract them. I currently have no inputs where Brotli is significantly (>10%) better than Roadroller, so I didn't pursue this further, but it remains a possibility.
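A rough sketch of what the extraction side could look like under this idea's assumptions: the payload is encoded into the glyphs of a WOFF2 font so the browser's built-in Brotli decodes it, and `fontBytes` is a hypothetical `ArrayBuffer` holding the embedded WOFF2 blob. (Assumes an async/module context.)

```js
// `fontBytes` is assumed to hold the embedded WOFF2 payload
const face = new FontFace('payload', fontBytes);
await face.load();
document.fonts.add(face);
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
ctx.font = '16px payload';
ctx.fillText('\u0100\u0101\u0102', 0, 16);  // draw some payload glyphs
const bits = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
// ...recover bytes from `bits`; the exact decoding depends on the glyph encoding
```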
## Other models

So far I have implemented (probably incorrectly) the following models as experiments: …

Maybe I should also investigate Prediction by Partial Matching (PPM) and Dynamic Markov Compression (DMC), which are known to perform very well under some parameters.
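As a reference point, a toy blended PPM-style byte predictor (a sketch only; real PPM uses exclusions and tuned escape estimates, and none of this is Roadroller code). With n symbols seen in a context, a seen symbol gets count/(n+1) and escaping to the next lower order gets 1/(n+1), roughly PPM method A, with the orders blended instead of short-circuited for simplicity:

```js
class ToyPPM {
  constructor(maxOrder = 2) {
    this.maxOrder = maxOrder;
    this.tables = Array.from({ length: maxOrder + 1 }, () => new Map());
  }
  key(hist, k) { return hist.slice(hist.length - k).join(','); }
  // probability of byte `sym` given `hist`, the array of previous bytes
  prob(hist, sym) {
    let escape = 1, p = 0;
    for (let k = Math.min(this.maxOrder, hist.length); k >= 0; k--) {
      const t = this.tables[k].get(this.key(hist, k));
      if (!t) continue;                                    // unseen context: free escape
      p += escape * ((t.counts.get(sym) || 0) / (t.total + 1));
      escape *= 1 / (t.total + 1);                         // escape to the next order
    }
    return p + escape / 256;                               // order -1: uniform over bytes
  }
  update(hist, sym) {
    for (let k = 0; k <= Math.min(this.maxOrder, hist.length); k++) {
      const key = this.key(hist, k);
      let t = this.tables[k].get(key);
      if (!t) this.tables[k].set(key, t = { total: 0, counts: new Map() });
      t.counts.set(sym, (t.counts.get(sym) || 0) + 1);
      t.total++;
    }
  }
}
```

Usage would be `m.prob(hist, b)` before each byte and `m.update(hist, b)` after it, with the probabilities feeding an arithmetic coder.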
## Alternative hash table representation

The current Roadroller model uses a size-limited hash table like most other context model implementations, but this is not strictly necessary. Anything that can map the context (or its hash) to a probability can be used, including a literal …

It might seem that a more accurate mapping will improve the compression more, but that's not strictly true. If you have 100,000 bits of input, the mapping is updated only 100,000 times per model, which is far less than the available memory (33,000,000 in the default setting). Many entries are used only once, not contributing to the compression in any way. For that reason, the right amount of hash collision is better than no hash collision at all: a prediction for the collided hash may happen to be better than the baseline prediction. Technically this accuracy problem can be sidestepped by truncating the context hash, so you might still be able to use another data structure to save memory. I've actually tried to use a plain …
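The three representations in sketch form, with made-up sizes and `DEFAULT_P` standing in for the baseline prediction:

```js
const DEFAULT_P = 1 << 11;                          // hypothetical baseline prediction
// (1) size-limited table: collisions possible, memory fixed up front
const table = new Uint16Array(1 << 22).fill(DEFAULT_P);
const predict1 = hash => table[hash & ((1 << 22) - 1)];
// (2) exact mapping: no collisions, memory grows with distinct contexts
const map = new Map();
const predict2 = hash => map.get(hash) ?? DEFAULT_P;
// (3) truncated hash + exact mapping: reintroduces "the right amount" of
// collisions while keeping the map smaller
const predict3 = hash => map.get(hash & 0xfffff) ?? DEFAULT_P;
```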
## Shared hash table

In the opposite direction, we may use the same shared hash table for all models and see what happens. This reduces the decoder complexity a bit and had the potential to perform equally well with the same amount of memory, but again, the compression was hurt badly enough (>200 B) that I had to scrap this.
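In sketch form, the difference is only where each model's hash lands (sizes are example values):

```js
const NUM_MODELS = 8, contextBits = 22;             // example values only
// per-model tables (current): each model hashes into its own array
const tables = Array.from({ length: NUM_MODELS }, () => new Uint16Array(1 << contextBits));
// shared table (the scrapped variant): every model indexes one array, so
// hash collisions can now happen across models as well as within one
const shared = new Uint16Array(1 << contextBits);
const predict = (model, hash) => shared[hash & ((1 << contextBits) - 1)];  // model index unused
```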
## Stealing Brotli static dictionary

Brotli ships with a notably large preset dictionary of about 120 KB, and it certainly contributes to Brotli's performance. If we have access to Brotli in web browsers, it is definitely possible to extract that dictionary with a synthesized input and make use of it. Too bad that we can't do that with a small input (the distance-length code can only access individual words in the dictionary).

As an experiment I've directly inserted the dictionary into the preset and measured the performance. Unfortunately this didn't perform well for any JS input I've tested. It worked for purely textual samples (e.g. the README compressed size went from 5965 B to 5738 B in …
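The preset experiment itself is simple to sketch, assuming a Roadroller-like `compress`/`decompress` pair that models its whole input, and a hypothetical `brotliDict` Uint8Array holding the ~120 KB dictionary:

```js
// warm the model up on the dictionary, then compress dictionary + input together
const primed = new Uint8Array(brotliDict.length + input.length);
primed.set(brotliDict, 0);
primed.set(input, brotliDict.length);
const packed = compress(primed);                             // hypothetical API
const output = decompress(packed).slice(brotliDict.length);  // drop the preset again
```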
## DEFLATE recompression

The Roadroller algorithm is known to be relatively weak at deduplication, which is the crux of LZ77. It can handle duplicated strings, since each occurrence of such a string makes the model prefer that string, but it adapts more slowly. It also doesn't really work for images, since the input can be much larger and that would directly impact the decompression speed. Just in case, I've considered whether DEFLATE could be recompressed via a Roadroller-like algorithm: …

I had no high hopes for this, and it didn't work well anyway: the resulting recompressed data was slightly larger than the original DEFLATE stream. In other words, the Roadroller algorithm is quite bad at stationary input (which is expected, because it was tuned for non-stationary input). There was a single case where it worked well: the DEFLATE length code is capped at 258, so a long run of such length codes was better modelled by Roadroller, but in that case the WebP lossless mode would work much better (WebP is better at most small images after all).
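A sketch of the recompression pipeline under stated assumptions: a modified inflater (not shown) has already turned the DEFLATE stream into a symbol list of literal bytes and `{len, dist}` back-references, and each resulting stream would be fed to the hypothetical `compress` from the earlier sketches with its own context model:

```js
function splitSymbols(symbols) {
  const lits = [], lens = [], dists = [];
  for (const s of symbols) {
    if (typeof s === 'number') {
      lits.push(s);                   // literal byte 0..255
    } else {
      lits.push(256);                 // sentinel: "a match starts here"
      lens.push(s.len);               // 3..258 (note the 258 cap)
      dists.push(s.dist);             // 1..32768
    }
  }
  return { lits, lens, dists };       // model each stream separately
}
```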
## Input reversal

In the current decoder it is easy to reconstruct the decompressed data in the reverse direction without any additional code, so we could possibly reverse the input. Which direction compresses better depends on the input; there definitely exists a set of inputs where the backward context is richer than the forward context. Unfortunately the JS inputs were not among them, but I might revisit this if there is a need for such inputs.
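A sketch, assuming the same hypothetical `compress`/`decompress` pair as above:

```js
const packed = compress(Uint8Array.from(input).reverse());  // compress back to front
// the decoder can emit bytes in reverse order for free; equivalently:
const output = decompress(packed).reverse();
```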
## Use different hash multipliers

The current hash multiplier of 997 is the largest 3-digit prime and was chosen so that other contexts can be combined with smaller multipliers without causing much trouble. But the optimal multiplier can of course vary with inputs. It is actually very easy to add a hash multiplier parameter to the current context model (in addition to the aforementioned XOR instead of addition), but no significant improvement was found. In addition, the compressed size as a function of the multiplier is not even remotely convex, unlike the other parameters, so it can't be optimized without brute forcing.
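Brute forcing could look like this (a sketch; `compressedSizeWithMultiplier` is a hypothetical helper that runs the whole pipeline with the multiplier swapped in):

```js
const CANDIDATES = [997, 991, 983, 977, 971];          // arbitrary example primes
let best = { M: 997, size: Infinity };
for (const M of CANDIDATES) {
  const size = compressedSizeWithMultiplier(input, M); // hypothetical helper
  if (size < best.size) best = { M, size };
}
```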