Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Improving intersection and union calculations #1475

Merged
merged 2 commits into from
May 7, 2021
Merged

Conversation

luizirber
Copy link
Member

Playing a bit with the intersection() and intersection_size() methods in Rust. I started looking into this because of #1474 #1392 and the nagging suspicion that the calls to .merge were making the performance worse. I isolated the .merge() call for num MH sketches, and started working on scaled MH sketches first (because they are easier to reason about).

In order to measure it, I created two new benchmarks (run with cargo bench -- minhash).

This is the baseline (latest):

minhash/intersection    
                        time:   [4.4254 us 4.4562 us 4.4819 us]
minhash/intersection_size
                        time:   [4.4531 us 4.4733 us 4.5121 us]

For the first try, I created an Union iterator, similar to the current Intersection iterator:

minhash/intersection
                        time:   [4.0357 us 4.0462 us 4.0572 us]
                        change: [-10.946% -9.8118% -8.6927%] (p = 0.00 < 0.05)
minhash/intersection_size
                        time:   [3.5351 us 3.5544 us 3.5758 us]
                        change: [-21.742% -21.152% -20.534%] (p = 0.00 < 0.05)

This is already pretty good, but it ends up iterating twice over the data.

The second try is an unrolled version (combining the Intersection and Union iterators, and keeping only data we need):

minhash/intersection
                        time:   [1.7962 us 1.8025 us 1.8056 us]
                        change: [-60.819% -60.157% -59.532%] (p = 0.00 < 0.05)
minhash/intersection_size
                        time:   [2.3091 us 2.3147 us 2.3190 us]
                        change: [-48.940% -48.615% -48.267%] (p = 0.00 < 0.05)

Twice as fast, nice.

So, I tried to run gather from https://github.com/luizirber/sourmash_resources, and...
image
It's pretty similar to the latest results 😂

So, the problem is in other places, but might as well merge this eventually because it is faster in microbenchmarks? =P

@codecov
Copy link

codecov bot commented Apr 21, 2021

Codecov Report

Merging #1475 (d485860) into latest (c923abe) will decrease coverage by 0.13%.
The diff coverage is 73.07%.

Impacted file tree graph

@@            Coverage Diff             @@
##           latest    #1475      +/-   ##
==========================================
- Coverage   89.81%   89.68%   -0.14%     
==========================================
  Files         124      124              
  Lines       19888    19966      +78     
  Branches     1515     1515              
==========================================
+ Hits        17863    17906      +43     
- Misses       1796     1831      +35     
  Partials      229      229              
Flag Coverage Δ
python 94.93% <ø> (ø)
rust 66.59% <73.07%> (-0.25%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/core/src/sketch/minhash.rs 87.21% <73.07%> (-3.30%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c923abe...d485860. Read the comment docs.

mergeless intersection/union, avoid dual iterating
@luizirber
Copy link
Member Author

luizirber commented May 7, 2021

Going for a review and merge on this one @ctb @bluegenes. Turns out this makes a HUGE difference when using really large queries (think GB-sized) during search/gather/prefetch. Intersection/union can still be improved for num MH sketches, but I'm punting it to another issue.

(anyone else wants to review some Rust code? @erikyoung85 @olgabot @keyabarve)

@luizirber luizirber changed the title [WIP] Improving intersection and union calculations [MRG] Improving intersection and union calculations May 7, 2021
@ctb
Copy link
Contributor

ctb commented May 7, 2021

Based on the (lack of) changes to the Python code and the passing of the Python tests, I'm happy to approve it :). But I can't offer learned (or, well, any) comments on the Rust code, I'm afraid!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants