Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: neither fastmultigather implementation properly downsamples with explicit --scaled #505

Closed
ctb opened this issue Nov 10, 2024 · 2 comments · Fixed by #514
Closed

bug: neither fastmultigather implementation properly downsamples with explicit --scaled #505

ctb opened this issue Nov 10, 2024 · 2 comments · Fixed by #514

Comments

@ctb
Copy link
Collaborator

ctb commented Nov 10, 2024

The changes introduced in #498 don't properly downsample query sketches in the regular fastmultigather, or in the RocksDB-based fastmultigather; it seems like there are several reasons.

First, Collection::sig_from_record(...) doesn't downsample the loaded Signature, even thought the Record contains the right scaled value. This, in turn, means that when turning the Signature into a KmerMinHash, the original scaled value is used, not the desired one chosen by the selection.
This affects both fastmultigather implementations.

Second, in the RocksDB-based fastmultigather, scaled was not being reset properly on the query collection. I need to look into exactly why. But, running select on the loaded query collection seems to fix it.

It looks like fastgather escaped this bug because it does not use Collection::sig_from_record, but instead uses Collection::sig_for_dataset(), which seems to do the appropriate downsampling.

We should also verify that proper downsampling is going on in the other places that use sig_from_record, too. That is both in MultiCollection generically and in manysearch.

@ctb
Copy link
Collaborator Author

ctb commented Nov 10, 2024

This bug has existed for several releases, I think.

@ctb
Copy link
Collaborator Author

ctb commented Nov 10, 2024

tests and fixes in #504

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant