[MRG] Adjust Index.find search protocol to support selective collection of matches #1477

Merged
merged 121 commits into from
Apr 23, 2021

@ctb (Contributor) commented Apr 21, 2021

This PR builds on the new Index.find method and JaccardSearch protocol from #1392 to provide selective collection of matches.

In #1392, we introduced the following logic in Index.find(...):

            score = search_fn.score_fn(...)

            if search_fn.passes(score):                                     
                search_fn.collect(score)
                yield subj, score

This PR changes the collect(...) method so that it also receives the match (subj), and its return value determines whether the match is yielded:

            score = search_fn.score_fn(...)

            if search_fn.passes(score):                                     
                if search_fn.collect(score, subj):
                    yield subj, score

This lets the collect(...) method on JaccardSearch objects inspect a candidate match and decide whether to reject it. In turn, this enables signature "masking" functionality that may be useful for #849 and #985.
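To make the protocol change concrete, here is a minimal, self-contained sketch of how a masking collect(...) could work. It is not sourmash code: `ThresholdSearch`, `MaskedSearch`, `jaccard`, and `find` are hypothetical stand-ins written for illustration, and subjects are plain names mapped to hash sets rather than real signature objects.

```python
# Hypothetical sketch of the find/collect protocol described above.
# None of these classes or functions are part of the sourmash API.


def jaccard(a, b):
    """Jaccard similarity of two collections of hashes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ThresholdSearch:
    """Stand-in for a JaccardSearch-style object (assumed shape)."""

    def __init__(self, score_fn, threshold=0.0):
        self.score_fn = score_fn
        self.threshold = threshold

    def passes(self, score):
        # A match must have a nonzero score that meets the threshold.
        return score > 0 and score >= self.threshold

    def collect(self, score, subj):
        # Default behavior: accept every match that passes.
        return True


class MaskedSearch(ThresholdSearch):
    """Rejects matches whose subject is in a caller-supplied mask."""

    def __init__(self, score_fn, threshold=0.0, masked=()):
        super().__init__(score_fn, threshold)
        self.masked = set(masked)

    def collect(self, score, subj):
        # Returning False tells find() not to yield this match.
        return subj not in self.masked


def find(search_fn, query, subjects):
    """Mirror of the loop shown in the PR description, on toy data."""
    for subj, hashes in subjects.items():
        score = search_fn.score_fn(query, hashes)
        if search_fn.passes(score):
            if search_fn.collect(score, subj):
                yield subj, score


query = {1, 2, 3, 4}
subjects = {"a": {1, 2, 3, 4}, "b": {1, 2, 5, 6}, "c": {7, 8}}

unmasked = list(find(ThresholdSearch(jaccard, 0.1), query, subjects))
masked = list(find(MaskedSearch(jaccard, 0.1, masked={"a"}), query, subjects))
```

With this toy data, the unmasked search yields "a" and "b", while the masked search yields only "b". Subject "c" never reaches collect(...) because it fails passes(...). Keeping the accept/reject decision in collect(...) means the index traversal code does not need to know about masking at all.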

Ref #1392 (comment)

cc @bluegenes

@ctb ctb changed the base branch from latest to refactor/index_find April 21, 2021 23:20
codecov bot commented Apr 21, 2021

Codecov Report

Merging #1477 (5309bcb) into latest (f02e250) will increase coverage by 5.17%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           latest    #1477      +/-   ##
==========================================
+ Coverage   89.71%   94.89%   +5.17%     
==========================================
  Files         123       96      -27     
  Lines       19464    15926    -3538     
  Branches     1483     1497      +14     
==========================================
- Hits        17463    15113    -2350     
+ Misses       1775      587    -1188     
  Partials      226      226              
Flag     Coverage Δ
python   94.89% <100.00%> (+0.02%) ⬆️
rust     ?

Impacted Files                      Coverage Δ
src/sourmash/index.py               94.62% <100.00%> (ø)
src/sourmash/lca/lca_db.py          92.30% <100.00%> (ø)
src/sourmash/sbt.py                 80.85% <100.00%> (+0.02%) ⬆️
src/sourmash/search.py              93.26% <100.00%> (+0.03%) ⬆️
tests/test_index.py                 100.00% <100.00%> (ø)
tests/test_search.py                98.48% <100.00%> (ø)
src/core/src/ffi/utils.rs
src/core/src/index/sbt/mhbt.rs
src/core/src/index/linear.rs
src/core/src/sketch/nodegraph.rs
... and 23 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Base automatically changed from refactor/index_find to latest April 22, 2021 02:17
@ctb ctb changed the title [WIP] Adjust Index.find search protocol to support selective collection of matches [MRG] Adjust Index.find search protocol to support selective collection of matches Apr 22, 2021
@ctb (Contributor, Author) commented Apr 22, 2021

@bluegenes I think this is ready for review.

@bluegenes (Contributor) left a comment

As preliminary functionality, this looks good to me!

3 participants