Implementation of a one-level PGM based on the Eytzinger array #22
Hi @RomaA2000, using the Eytzinger layout is a cool idea, thanks for opening this PR! I have a suggestion that should improve the cache efficiency and the memory usage of your implementation of It seems that you are using an Why not replacing For what concerns the PGM-index/include/pgm/pgm_index_variants.hpp Line 520 in 456619f
BucketingPGMIndex . This is because the argument PGMIndexEytzinger<K, Epsilon, Floating> passed to the fifth template parameter of BucketingPGMIndex (i.e. PGMType ) is used only in this line of BucketingPGMIndex : PGM-index/include/pgm/pgm_index_variants.hpp Line 382 in 456619f
But
So BucketingPGMIndex::Segment == typename PGMType::Segment == PGMIndex<...>::Segment .
Nonetheless, combining Thanks again @RomaA2000. I'll leave this PR open in case you'd like to continue working on it.
Would completing this PR do anything (indirectly) for DynamicPGMIndex::lower_bound_bl? From what I can see, DynamicPGMIndex::find spends 50-75% of its time in this function, either in line 459 or 459, waiting on the prefetches. As a consequence, the search time per operation is pushed toward the next order of magnitude compared to PGMIndex, e.g. 3-4x worse. I only rarely need a few mutations, so I'll need to consider whether I should figure out how to improve DynamicPGMIndex or somehow extend PGMIndex to meet my needs.
Since this structure performs a binary search between levels, we thought we could replace the regular sorted array with one laid out for cache-efficient queries (the Eytzinger layout), which in turn speeds up the binary search. We also added types for a single-level PGM and a BPGM that use this optimization. In this version we store both arrays, because once the data has been rearranged into the Eytzinger layout it can no longer be iterated in the desired order.
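For readers unfamiliar with the layout, here is a minimal sketch of the general technique, not the code from this PR. The names `make_eytzinger`, `fill_eytzinger` and `eytzinger_lower_bound` are hypothetical, and the 1-indexed array with an unused slot 0 is an assumption of this sketch. It shows how a sorted array can be permuted into Eytzinger (BFS) order and searched, and why the sorted copy is kept alongside it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: in-order walk over the implicit tree rooted at k,
// copying sorted[i], sorted[i+1], ... into BFS positions of `out`.
// Returns the next unread position of `sorted`.
template <typename K>
std::size_t fill_eytzinger(const std::vector<K>& sorted, std::vector<K>& out,
                           std::size_t i, std::size_t k) {
    if (k <= sorted.size()) {
        i = fill_eytzinger(sorted, out, i, 2 * k);      // left subtree
        out[k] = sorted[i++];                           // this node
        i = fill_eytzinger(sorted, out, i, 2 * k + 1);  // right subtree
    }
    return i;
}

// Builds a 1-indexed Eytzinger array; slot 0 is unused (assumption of this sketch).
template <typename K>
std::vector<K> make_eytzinger(const std::vector<K>& sorted) {
    std::vector<K> out(sorted.size() + 1);
    fill_eytzinger(sorted, out, 0, 1);
    return out;
}

// Returns the Eytzinger index of the first element >= x, or 0 if there is none.
template <typename K>
std::size_t eytzinger_lower_bound(const std::vector<K>& eytz, const K& x) {
    const std::size_t n = eytz.size() - 1;
    std::size_t k = 1;
    // Descend: go to child 2k if eytz[k] >= x, to child 2k+1 otherwise.
    while (k <= n) k = 2 * k + (eytz[k] < x ? 1 : 0);
    // Undo the trailing right turns, then one more step, to land on the
    // last node where the search went left (the lower bound).
    while (k & 1) k >>= 1;
    k >>= 1;
    return k;
}
```

The search touches indices 1, 2..3, 4..7, and so on, so the first few levels stay together in cache, which is where the speed-up over a plain binary search is expected to come from. The price is that neighbouring indices are no longer neighbouring keys, which is why a structure that also needs ordered iteration would keep the original sorted array as well.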