Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of a one-level PGM based on the Eytzinger array #22

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

RomaA2000
Copy link
Contributor

@RomaA2000 RomaA2000 commented Apr 13, 2021

Since this structure implies the presence of binary search between levels, we thought that we could replace a regular array with one that is optimized for cache queries, accordingly, this speeds up binary search. We also added types for a single-level PGM and a BPGM with this optimization. In this version, we store both arrays, since after transferring the data over them, it is impossible to iterate in the desired order.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@gvinciguerra
Copy link
Owner

gvinciguerra commented Apr 15, 2021

Hi @RomaA2000, using the Eytzinger layout is a cool idea, thanks for opening this PR! I have a suggestion that should improve the cache efficiency and the memory usage of your implementation of PGMIndexEytzinger.

It seems that you are using an IdxHolder struct that stores a pair <value, position of the segment with key = value inside this->segments> so that when you search in EytzingerArray<IdxHolder> you can find the position of the segment in this->segments, correct?

Why not replacing this->segments and eytzinger_first_layer with an EytzingerArray<Segment>? This way you don't need to store additional sizeof(T) bytes for IdxHolder.value(because the segment's value is stored only once in the corresponding Segment struct), plus additional sizeof(size_t) bytes for IdxHolder.idx (because the segments are stored in the Eytzinger layout) 😉


For what concerns the BucketingPGMIndexEytzinger alias defined here:

using BucketingPGMIndexEytzinger = BucketingPGMIndex<K, Epsilon, TopLevelBitSize, Floating, PGMIndexEytzinger<K, Epsilon, Floating>>;
it is equivalent to a standard BucketingPGMIndex. This is because the argument PGMIndexEytzinger<K, Epsilon, Floating> passed to the fifth template parameter of BucketingPGMIndex (i.e. PGMType) is used only in this line of BucketingPGMIndex:
using Segment = typename PGMType::Segment;

But PGMIndexEytzinger<K, Epsilon, Floating>::Segment is defined as

using Segment = typename PGMIndex<K, Epsilon, 0,Floating>::Segment;

So BucketingPGMIndex::Segment == typename PGMType::Segment == PGMIndex<...>::Segment.

Nonetheless, combining BucketingPGMIndex with the Eytzinger layout is a cool idea too that is worth developing.


Thanks again @RomaA2000. I leave this PR open if you'd like to continue to work on this.

@gvinciguerra gvinciguerra added the help wanted Extra attention is needed label Apr 15, 2021
@gshanemiller
Copy link

gshanemiller commented Oct 1, 2022

Would completing this PR do anything (indirectly) for DynamicPGMIndex::lower_bound_bl because from what I can see DynamicPGMIndex::find spends 50-75% of its time in this function either in line 459 or 459 waiting on the prefetches. In consequence it pushes search performance into the next order of 10 per operation compared to PGMIndex e.g. 3-4x worse. I rarely need a few mutations, so I'll need to consider if should figure out how to improve DynamicPGMIndex or somehow extend PGMIndex to meet needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants