Horizon Lite: Come up with better index and meta compression scheme #4497
Per the discussion thread re: dictionary churn, maybe we don't need to train it more than once (or at most occasionally). One training session on a block of history (or less? idk) would be representative of "account activity", which is what indices represent. As a separate idea, maybe we can fork + modify roaring bitmaps (or sroar) to add the […]
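If the one-off training idea pans out, it could be as simple as pointing the stock zstd CLI at raw index files from one block of history. A minimal sketch in Go, assuming a local `zstd` binary and a hypothetical `samples/` directory of uncompressed indices (neither comes from this issue):

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

func main() {
	// Hypothetical layout: one raw (uncompressed) index file per checkpoint,
	// taken from a single representative block of history.
	samples, err := filepath.Glob("samples/*.index")
	if err != nil || len(samples) == 0 {
		panic("no training samples found")
	}

	// Train once and reuse the dictionary for all later builds instead of
	// retraining per run; --maxdict=112640 is just zstd's default size.
	args := append([]string{"--train", "--maxdict=112640", "-o", "dict.bin"}, samples...)
	if out, err := exec.Command("zstd", args...).CombinedOutput(); err != nil {
		panic(fmt.Sprintf("zstd --train failed: %v\n%s", err, out))
	}
	fmt.Printf("trained dict.bin from %d sample indices\n", len(samples))
}
```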
Stored ledger metadata and, even more so, indexes are occupying a lot of space:
- The full metadata files occupy ~8TB, and we don't apply any compression scheme to them.
- A preliminary test by @Shaptic of indices built across 100 checkpoints (6400 ledgers) tells us the following:
  - Compressed together into a single .tar.gz file, the size is reduced by ~44%. Note that this is different from compressing individual indices, which we already do (see the toy sketch below).
  - Extrapolating this to a year of history (which comes with some big assumptions, like linear growth of indices with history) gives us ~1TB of raw indexes.
(Details and caveats are captured in this Slack thread. We can update this once a larger build is complete.)
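The ~44% cross-file gain is intuitive: one compression stream sees the redundancy shared between indices, while per-file compression pays full price for each one. A toy Go illustration of that effect only; the synthetic payloads below are invented and say nothing about real ratios:

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
)

func main() {
	// Invented, highly redundant "index" payloads; real indices will differ.
	var files [][]byte
	for i := 0; i < 100; i++ {
		files = append(files, []byte(fmt.Sprintf("account GA...%03d ledgers 1,2,3,5,8,13", i)))
	}

	// Per-file gzip (roughly what we do today): every file pays header and
	// cold-start cost, and no file can reference another file's bytes.
	individual := 0
	for _, f := range files {
		var buf bytes.Buffer
		zw := gzip.NewWriter(&buf)
		zw.Write(f)
		zw.Close()
		individual += buf.Len()
	}

	// One .tar.gz: a single DEFLATE window sees the redundancy across files.
	var combined bytes.Buffer
	gz := gzip.NewWriter(&combined)
	tw := tar.NewWriter(gz)
	for i, f := range files {
		tw.WriteHeader(&tar.Header{
			Name: fmt.Sprintf("index-%03d", i),
			Mode: 0o644,
			Size: int64(len(f)),
		})
		tw.Write(f)
	}
	tw.Close()
	gz.Close()

	fmt.Printf("individually gzipped: %dB, single tar.gz: %dB\n",
		individual, combined.Len())
}
```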
@2opremio predicts we may be able to do much better with zstd by using a common dictionary for all files: https://github.com/facebook/zstd#the-case-for-small-data-compression. This would let us keep compressing small index files individually while still exploiting the redundancy they share. On the other hand, we would need to keep track of a separate compression dictionary, which requires re-generating the files whenever we update it.
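To make that trade-off concrete, here is a rough sketch of dictionary-based compression in Go using github.com/klauspost/compress/zstd (a library choice of this sketch, not the issue), with a one-byte version prefix as a strawman for the dictionary bookkeeping; the trained `dict.bin` is assumed to exist (e.g. from the training sketch above):

```go
package main

import (
	"bytes"
	"fmt"
	"os"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// Assumed to have been trained once offline (see sketch above).
	dict, err := os.ReadFile("dict.bin")
	if err != nil {
		panic(err)
	}

	// nil writer/reader: we only use the stateless EncodeAll/DecodeAll APIs.
	enc, err := zstd.NewWriter(nil, zstd.WithEncoderDict(dict))
	if err != nil {
		panic(err)
	}
	defer enc.Close()

	dec, err := zstd.NewReader(nil, zstd.WithDecoderDicts(dict))
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	index := []byte("...serialized bitmap index bytes...")
	compressed := enc.EncodeAll(index, nil)

	// Strawman bookkeeping: tag each blob with the dictionary version so a
	// retrain can find and re-generate stale files. Not an existing format.
	const dictVersion = 1
	blob := append([]byte{dictVersion}, compressed...)

	restored, err := dec.DecodeAll(blob[1:], nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("dict v%d: %d -> %d bytes, roundtrip ok: %v\n",
		blob[0], len(index), len(compressed), bytes.Equal(restored, index))
}
```

Under this scheme, retraining means bumping the version and rewriting any blob whose prefix is stale, which is exactly the re-generation cost noted above.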