Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use a Nodegraph for searching in internal nodes #1138

Closed
wants to merge 12 commits into from

Conversation

luizirber
Copy link
Member

Updated version of #395, fixes #203

This builds up on #1137 and uses a Nodegraph for querying internal nodes. It actually passes the test currently failing in #1137, so I'm starting to think there is something fishy going on with node.data.matches(mh)...

TODO:

  • It is slower than the previous way, which is annoying. Might need to change how containment and similarity are calculated in Rust (can do popcnt over the bitsets, instead of building intersection and union)

Checklist

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@ctb
Copy link
Contributor

ctb commented Aug 2, 2020

I'm starting to think there is something fishy going on with node.data.matches(mh)...
oh no!

@codecov
Copy link

codecov bot commented Aug 2, 2020

Codecov Report

Merging #1138 (4ab065d) into remove_min_n_below (fc2ee33) will decrease coverage by 0.13%.
The diff coverage is 37.03%.

Impacted file tree graph

@@                  Coverage Diff                   @@
##           remove_min_n_below    #1138      +/-   ##
======================================================
- Coverage               89.39%   89.26%   -0.14%     
======================================================
  Files                     122      122              
  Lines                   19001    19044      +43     
  Branches                 1448     1451       +3     
======================================================
+ Hits                    16986    16999      +13     
- Misses                   1796     1827      +31     
+ Partials                  219      218       -1     
Flag Coverage Δ
python 94.64% <68.96%> (-0.05%) ⬇️
rust 66.60% <0.00%> (-0.41%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/core/src/ffi/nodegraph.rs 0.00% <0.00%> (ø)
src/core/src/sketch/nodegraph.rs 78.07% <0.00%> (-1.44%) ⬇️
src/sourmash/nodegraph.py 65.87% <25.00%> (-5.18%) ⬇️
src/sourmash/sbtmh.py 95.83% <100.00%> (+2.13%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc2ee33...4ab065d. Read the comment docs.

@luizirber luizirber force-pushed the nodegraph_query branch 3 times, most recently from 299bb38 to 78ab059 Compare October 27, 2020 22:02
@luizirber luizirber mentioned this pull request Oct 30, 2020
10 tasks
@luizirber luizirber force-pushed the remove_min_n_below branch from b5db252 to 85f707f Compare March 26, 2021 02:57
@ctb
Copy link
Contributor

ctb commented Mar 26, 2021

curious if the changes in #1392 help or hurt this effort.

(I don't like all of the function/class names in that PR, but it does some nice code cleanup and consolidation so I plan to propose it for merge in a few weeks.)

@luizirber luizirber force-pushed the remove_min_n_below branch from 85f707f to a689d17 Compare April 17, 2021 02:47
use get_bf

use containment directly

bench and update containment

create new function for intersection and union size

keep doing it inplace...

wip simdeez

remove simd
@luizirber
Copy link
Member Author

I'm starting to think there is something fishy going on with node.data.matches(mh)...
oh no!

turns out I was doing ng.containment(ng2) wrong, oops!
(I was using sum(len(hash_tables)) as denominator, which underestimated containment; the fix was to use sum(set_bits), only considering the bits that are set in the nodegraph)

@luizirber
Copy link
Member Author

Not focusing on SBTs anymore

@luizirber luizirber closed this Feb 13, 2023
@luizirber luizirber deleted the nodegraph_query branch February 13, 2023 04:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants