The changes introduced in #498 don't properly downsample query sketches in the regular `fastmultigather`, or in the RocksDB-based `fastmultigather`. There seem to be several reasons.
First, `Collection::sig_from_record(...)` doesn't downsample the loaded `Signature`, even though the `Record` contains the right `scaled` value. As a result, when the `Signature` is turned into a `KmerMinHash`, the original `scaled` value is used, not the one chosen by the selection.
This affects both `fastmultigather` implementations.
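The first problem can be illustrated with a minimal, self-contained Rust sketch. It does not use the real sourmash types: `ToyMinHash`, `Record`, `Signature`, `sig_from_record_buggy`, and `sig_from_record_fixed` are hypothetical stand-ins, and the cutoff `u64::MAX / scaled` is an assumed approximation of the FracMinHash threshold. The buggy path ignores the record's `scaled`; the fixed path downsamples to it before the sketch is used.

```rust
/// Hypothetical stand-in for a FracMinHash sketch (not the sourmash KmerMinHash).
#[derive(Debug, Clone)]
struct ToyMinHash {
    scaled: u64,
    hashes: Vec<u64>,
}

impl ToyMinHash {
    /// Assumed cutoff: a sketch at `scaled` keeps only hashes <= u64::MAX / scaled.
    fn max_hash(scaled: u64) -> u64 {
        u64::MAX / scaled
    }

    /// Downsample to a coarser (larger or equal) scaled value.
    /// Returns None when asked to upsample, since dropped hashes cannot be recovered.
    fn downsample_scaled(&self, new_scaled: u64) -> Option<ToyMinHash> {
        if new_scaled == 0 || new_scaled < self.scaled {
            return None;
        }
        let max = Self::max_hash(new_scaled);
        let hashes = self.hashes.iter().copied().filter(|&h| h <= max).collect();
        Some(ToyMinHash { scaled: new_scaled, hashes })
    }
}

/// Hypothetical manifest record: carries the scaled chosen by the selection.
struct Record {
    scaled: u64,
}

/// Hypothetical loaded signature: carries the scaled it was sketched at.
struct Signature {
    minhash: ToyMinHash,
}

/// Models the bug: the record's scaled is ignored, so the original scaled leaks through.
fn sig_from_record_buggy(_rec: &Record, sig: &Signature) -> ToyMinHash {
    sig.minhash.clone()
}

/// Models the fix: downsample to the record's scaled before building the query.
fn sig_from_record_fixed(rec: &Record, sig: &Signature) -> Option<ToyMinHash> {
    sig.minhash.downsample_scaled(rec.scaled)
}

fn main() {
    let sig = Signature {
        minhash: ToyMinHash { scaled: 1, hashes: vec![10, u64::MAX / 2, 5] },
    };
    let rec = Record { scaled: 1000 };

    let buggy = sig_from_record_buggy(&rec, &sig);
    let fixed = sig_from_record_fixed(&rec, &sig).expect("downsampling should succeed");

    println!("buggy: scaled={} n_hashes={}", buggy.scaled, buggy.hashes.len());
    println!("fixed: scaled={} n_hashes={}", fixed.scaled, fixed.hashes.len());
}
```

Returning `None` on upsampling mirrors the usual FracMinHash rule that a sketch can only be made coarser, never finer.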
Second, in the RocksDB-based `fastmultigather`, `scaled` was not being reset properly on the query collection. I still need to look into exactly why, but running `select` on the loaded query collection seems to fix it.
It looks like `fastgather` escaped this bug because it does not use `Collection::sig_from_record`, but instead uses `Collection::sig_for_dataset()`, which seems to do the appropriate downsampling.
We should also verify that proper downsampling happens in the other places that use `sig_from_record`: both in `MultiCollection` generically and in `manysearch`.
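One way to do that verification is a consistency check on whatever each call site hands to the search code. The Rust sketch below is hypothetical: `LoadedSketch`, `max_hash`, and `find_downsampling_errors` are not part of sourmash or the plugin, and the cutoff `u64::MAX / scaled` is an assumed approximation of the FracMinHash threshold. It flags a sketch if its `scaled` differs from the record's, or if it still holds hashes above the cutoff for that `scaled`.

```rust
/// Hypothetical view of a sketch as returned by some loading path.
struct LoadedSketch {
    name: String,
    record_scaled: u64, // scaled requested by the selection (from the Record)
    sketch_scaled: u64, // scaled actually carried by the loaded sketch
    hashes: Vec<u64>,
}

/// Assumed FracMinHash cutoff for a given scaled value.
fn max_hash(scaled: u64) -> u64 {
    u64::MAX / scaled
}

/// Returns one message per sketch that was not downsampled as expected.
fn find_downsampling_errors(sketches: &[LoadedSketch]) -> Vec<String> {
    let mut errors = Vec::new();
    for s in sketches {
        if s.record_scaled == 0 {
            errors.push(format!("{}: record scaled is 0", s.name));
            continue;
        }
        if s.sketch_scaled != s.record_scaled {
            errors.push(format!(
                "{}: sketch scaled {} != record scaled {}",
                s.name, s.sketch_scaled, s.record_scaled
            ));
            continue;
        }
        // Even with matching metadata, leftover hashes above the cutoff mean
        // the scaled value was relabeled without actually downsampling.
        let max = max_hash(s.record_scaled);
        if let Some(h) = s.hashes.iter().find(|&&h| h > max) {
            errors.push(format!("{}: hash {} exceeds max_hash {}", s.name, h, max));
        }
    }
    errors
}

fn main() {
    let sketches = vec![
        LoadedSketch { name: "ok".into(), record_scaled: 1000, sketch_scaled: 1000, hashes: vec![1, 2, 3] },
        LoadedSketch { name: "not_downsampled".into(), record_scaled: 1000, sketch_scaled: 1, hashes: vec![u64::MAX / 2] },
    ];
    for e in find_downsampling_errors(&sketches) {
        println!("{e}");
    }
}
```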