Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SparseList Follow Protocol Speedup #613

Merged
merged 3 commits into from
Oct 20, 2024
Merged

Conversation

kylebd99
Copy link
Collaborator

The existing implementation always binary searches over the entire index vector. However, we should always be doing lookups in increasing order, so we can actually "consume" the index vector instead by moving up the lower search bound each time. This also lets us use the expsearch function instead because we now have reason to believe that the lower bound is close to the true value.

Before Change:

┌ Info: experiment
└   sp = 0.01
[ Info: merge
  78.459 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  290.835 μs (2 allocations: 96 bytes)
[ Info: gallop
  89.306 μs (2 allocations: 96 bytes)
[ Info: hash construction
  787.695 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  170.373 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.005
[ Info: merge
  51.509 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  161.717 μs (2 allocations: 96 bytes)
[ Info: gallop
  62.533 μs (2 allocations: 96 bytes)
[ Info: hash construction
  782.748 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  82.390 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.001
[ Info: merge
  15.879 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  37.581 μs (2 allocations: 96 bytes)
[ Info: gallop
  13.650 μs (2 allocations: 96 bytes)
[ Info: hash construction
  785.824 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  9.868 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.0005
[ Info: merge
  13.307 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  19.855 μs (2 allocations: 96 bytes)
[ Info: gallop
  6.050 μs (2 allocations: 96 bytes)
[ Info: hash construction
  785.667 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  4.053 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.0001
[ Info: merge
  9.198 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  3.912 μs (2 allocations: 96 bytes)
[ Info: gallop
  1.063 μs (2 allocations: 96 bytes)
[ Info: hash construction
  783.611 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  816.976 ns (2 allocations: 128 bytes)

After change:

┌ Info: experiment
└   sp = 0.01
[ Info: merge
  93.398 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  88.104 μs (2 allocations: 96 bytes)
[ Info: gallop
  92.542 μs (2 allocations: 96 bytes)
[ Info: hash construction
  777.537 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  174.599 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.005
[ Info: merge
  58.945 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  54.867 μs (2 allocations: 96 bytes)
[ Info: gallop
  60.835 μs (2 allocations: 96 bytes)
[ Info: hash construction
  787.040 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  86.802 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.001
[ Info: merge
  21.538 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  12.863 μs (2 allocations: 96 bytes)
[ Info: gallop
  14.054 μs (2 allocations: 96 bytes)
[ Info: hash construction
  757.483 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  8.999 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.0005
[ Info: merge
  18.049 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  6.152 μs (2 allocations: 96 bytes)
[ Info: gallop
  6.423 μs (2 allocations: 96 bytes)
[ Info: hash construction
  790.806 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  3.876 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.0001
[ Info: merge
  14.527 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  929.694 ns (2 allocations: 96 bytes)
[ Info: gallop
  1.065 μs (2 allocations: 96 bytes)
[ Info: hash construction
  792.627 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  810.836 ns (2 allocations: 128 bytes)

Could use a close look though, I'm not 100% sure how the lowering works with 'thunk'

@kylebd99
Copy link
Collaborator Author

kylebd99 commented Oct 1, 2024

@willow-ahrens Just wanted to bump this because I'd forgotten about it. It doesn't seem like the python test failure is related to the PR, but I'm not sure.

Copy link
Collaborator

@willow-ahrens willow-ahrens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our discussion last week, this looks good to me! It is overlooked that lookups are accessed in monotonically increasing order, which makes this possible. We may want to roll out a protocol for purely random access as well, but that's another issue.

@willow-ahrens
Copy link
Collaborator

This took a while to merge as I needed to fix python ci

@willow-ahrens willow-ahrens merged commit 57dbc1f into main Oct 20, 2024
6 checks passed
@willow-ahrens willow-ahrens deleted the kbd-sparse-follow-speedup branch October 20, 2024 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants