Replace Index File Backfill with Bitmap Indexer Backfill #744
Leaving context here regarding #803 before I go on a two-week leave. I've migrated the code to use the new production indexer, reran the tests, and they all still pass. There are a couple of things I haven't done explicitly yet that I'd like to do as manual testing before releasing to dev.
Cargo flags various items as unused, and these are definitely false positives: for example, the `new()` constructor for `CompressedBitmap`, or `StreamExt`. I'm not sure what to do about those.

For next steps, the main performance problem is that we only query GraphQL for the next batch once the current stream of block heights is exhausted. I would like to prefetch the next stream while yielding the current one. In many cases the blocks are yielded extremely fast, so most of the time is spent waiting between block dates. Beyond that, the remaining work is mainly optimizing the indexer for space or query speed.

For a quick explanation of the algorithm: we use Elias gamma (EG) coding to compress runs of consecutive identical bits. Essentially, instead of storing a Y-valued bit X times, we encode "X copies of Y". A run of a single bit is encoded as EG(1), which is just one bit, so no compression is gained there. Otherwise, the bitmap takes the following form:

- The first bit is special: it only records which bit value the first EG encodes.
- Runs alternate between 0 and 1, so the value each remaining EG encodes can be inferred by counting them.
- We do explicitly record which bit value the last EG encodes, so that appending a bit to the end of the bitmap only requires decompressing the last EG instead of all of them.
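To make the layout above concrete, here is a minimal Rust sketch of an Elias gamma run-length bitmap. It is not the production `CompressedBitmap`: `RunLengthBitmap`, its fields, and the use of `Vec<bool>` instead of packed bytes are all hypothetical simplifications. In particular, it assumes the last run's start offset and value are kept alongside the encoded bits, which is one way to let `push` decode and rewrite only the final EG.

```rust
/// Append the Elias gamma code of `n` (n >= 1): (bit length - 1) zeros,
/// then the binary digits of `n`, most significant bit first.
fn elias_gamma_encode(n: u64, out: &mut Vec<bool>) {
    assert!(n >= 1, "Elias gamma is only defined for positive integers");
    let width = 64 - n.leading_zeros();
    out.extend(std::iter::repeat(false).take(width as usize - 1));
    for i in (0..width).rev() {
        out.push((n >> i) & 1 == 1);
    }
}

/// Decode one Elias gamma code starting at `*pos` and advance `*pos` past it.
fn elias_gamma_decode(bits: &[bool], pos: &mut usize) -> u64 {
    let mut zeros = 0;
    while !bits[*pos] {
        zeros += 1;
        *pos += 1;
    }
    let mut n = 0u64;
    for _ in 0..=zeros {
        n = (n << 1) | bits[*pos] as u64;
        *pos += 1;
    }
    n
}

/// Hypothetical run-length bitmap: a header bit gives the value of the first run,
/// followed by one Elias gamma code per run length. Runs alternate, so later run
/// values are implied by position. `last_run_start` and `last_value` are kept on
/// the side (an assumed layout) so `push` only touches the final run.
#[derive(Default)]
struct RunLengthBitmap {
    encoded: Vec<bool>,
    last_run_start: usize, // index in `encoded` where the last EG code begins
    last_value: bool,      // bit value the last EG code encodes
}

impl RunLengthBitmap {
    fn push(&mut self, bit: bool) {
        if self.encoded.is_empty() {
            // Header bit, then a first run of length 1 (EG(1) is the single bit 1).
            self.encoded.push(bit);
            self.last_run_start = 1;
            self.last_value = bit;
            elias_gamma_encode(1, &mut self.encoded);
        } else if bit == self.last_value {
            // Extend the last run: decode only its code, then rewrite it one longer.
            let mut pos = self.last_run_start;
            let len = elias_gamma_decode(&self.encoded, &mut pos);
            self.encoded.truncate(self.last_run_start);
            elias_gamma_encode(len + 1, &mut self.encoded);
        } else {
            // Start a new run of the opposite value.
            self.last_run_start = self.encoded.len();
            self.last_value = bit;
            elias_gamma_encode(1, &mut self.encoded);
        }
    }

    fn from_bits(bits: &[bool]) -> Self {
        let mut bitmap = Self::default();
        for &b in bits {
            bitmap.push(b);
        }
        bitmap
    }

    /// Expand back to one bool per position by decoding every run in order.
    fn to_bits(&self) -> Vec<bool> {
        let mut out = Vec::new();
        let Some(&first) = self.encoded.first() else { return out };
        let (mut value, mut pos) = (first, 1);
        while pos < self.encoded.len() {
            let len = elias_gamma_decode(&self.encoded, &mut pos);
            out.extend(std::iter::repeat(value).take(len as usize));
            value = !value;
        }
        out
    }
}
```

For example, the bits `1 1 1 0 1` encode as the header `1` followed by EG(3) = `011`, EG(1) = `1`, and EG(1) = `1`, while a run of 1000 zeros shrinks to 20 bits (header plus a 19-bit EG code). Appending stays cheap because only the final code is decoded and rewritten; reading an arbitrary position, however, still requires decoding runs from the start.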
Fully replace the index file backfill with the bitmap query backfill. This involves replacing the Delta Lake `get_matching_block_heights` function call with the `BlockHeightStream` function of the same name. `BlockHeightStream` yields a single block height at a time instead of a full list, so some backfill behavior could change, potentially for the better. In addition, the first iteration of the new backfill may have performance problems, so its performance needs to be evaluated and any low-complexity improvements should be made (such as preloading the next day's bitmaps while yielding the current day's).
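The "preload the next day while yielding the current day" idea can be sketched with a background producer and a bounded channel. This uses std threads and `sync_channel` rather than the async streams the real code appears to use (`StreamExt`), and `PrefetchingHeights` and `fetch_day` are hypothetical: `fetch_day` stands in for the per-day GraphQL bitmap query. Like `BlockHeightStream`, the iterator yields one block height at a time.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical iterator that yields block heights one at a time while a
/// background thread fetches upcoming days ahead of the consumer.
struct PrefetchingHeights {
    rx: Receiver<Vec<u64>>,
    current: std::vec::IntoIter<u64>,
}

impl PrefetchingHeights {
    fn new<F>(days: Vec<u32>, fetch_day: F) -> Self
    where
        F: Fn(u32) -> Vec<u64> + Send + 'static,
    {
        // Capacity 1: the producer can buffer one finished day while fetching
        // the next, so it stays at most a couple of days ahead of the consumer.
        let (tx, rx) = sync_channel(1);
        thread::spawn(move || {
            for day in days {
                if tx.send(fetch_day(day)).is_err() {
                    break; // the consumer dropped the iterator; stop fetching
                }
            }
        });
        PrefetchingHeights { rx, current: Vec::new().into_iter() }
    }
}

impl Iterator for PrefetchingHeights {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        loop {
            if let Some(height) = self.current.next() {
                return Some(height);
            }
            // Current day exhausted: take the next day, which is usually already
            // fetched. `recv` errors once the producer is done, ending iteration.
            self.current = self.rx.recv().ok()?.into_iter();
        }
    }
}
```

Days with no matching heights are skipped by the loop in `next`, so the consumer only ever sees a flat sequence of heights. The bounded channel is the design choice that matters here: an unbounded one would let the producer race through every day and hold all of their results in memory.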