Function to compute Dice coefficients of bitarray pairs #567
Anonlink's similarity functions currently compare every possible candidate pair via a Cartesian product, and work really well with a fairly large number of encodings. With very fine-grained blocking such as p-sig you may have many small blocks. Anonlink doesn't do too well with this, as we optimized the comparison function for high throughput with large batches - not for low latency with tiny batches.
To get higher throughput it is tempting to merge a bunch of these small blocks together before calling anonlink. However, this approach adds candidate pairs that were not explicitly in the blocking rules, which skews results and performs unnecessary work.
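As a rough illustration of the extra candidate pairs that merging introduces, here is a small sketch. The block names and record ids are made up for the example, and the helper functions are hypothetical, not part of anonlink.

```python
from itertools import product

# Hypothetical blocks: each maps to (record ids from dataset A, record ids from dataset B).
blocks = {
    "block-1": (["a1", "a2"], ["b1"]),
    "block-2": (["a3"], ["b2", "b3"]),
}


def pairs_within_blocks(blocks):
    """Candidate pairs allowed by the blocking rules: one Cartesian product per block."""
    pairs = set()
    for ids_a, ids_b in blocks.values():
        pairs.update(product(ids_a, ids_b))
    return pairs


def pairs_after_merging(blocks):
    """Candidate pairs produced when all blocks are merged into one big block."""
    all_a = [i for ids_a, _ in blocks.values() for i in ids_a]
    all_b = [i for _, ids_b in blocks.values() for i in ids_b]
    return set(product(all_a, all_b))


blocked = pairs_within_blocks(blocks)
merged = pairs_after_merging(blocks)
extra = merged - blocked  # pairs the blocking rules never asked for

print(len(blocked), len(merged), len(extra))
```

In this toy case the blocking rules ask for 4 comparisons, while the merged block produces 9, so more than half of the work is spent on pairs that were never candidates.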
This PR adds a function in `anonlink.similarities` to compute the Dice coefficient on pairs of bitarrays. If useful we could also implement an accelerated version.
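For reference, the Dice coefficient of two bit vectors `a` and `b` is `2 * popcount(a & b) / (popcount(a) + popcount(b))`. The following is a minimal pure-Python sketch of the pairwise computation, not the code added in this PR: the function names `popcount`, `dice_coefficient` and `dice_of_pairs` are hypothetical, encodings are represented as `bytes` rather than bitarrays to keep the example free of dependencies, and returning `0.0` for two all-zero encodings is an assumed convention.

```python
from typing import Iterable, List, Tuple


def popcount(data: bytes) -> int:
    """Number of set bits in a bytes object."""
    return bin(int.from_bytes(data, "big")).count("1")


def dice_coefficient(a: bytes, b: bytes) -> float:
    """Dice coefficient of two equal-length bit vectors stored as bytes."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    total = popcount(a) + popcount(b)
    if total == 0:
        # Assumed convention: two empty encodings have similarity 0.
        return 0.0
    common = popcount(bytes(x & y for x, y in zip(a, b)))
    return 2.0 * common / total


def dice_of_pairs(pairs: Iterable[Tuple[bytes, bytes]]) -> List[float]:
    """Compute the Dice coefficient for each explicitly listed pair.

    Only the given pairs are compared; no Cartesian product is formed.
    """
    return [dice_coefficient(a, b) for a, b in pairs]


a = bytes([0b11110000])
b = bytes([0b11000000])
scores = dice_of_pairs([(a, b), (a, a)])
```

Because only the listed pairs are compared, the caller can pass exactly the candidate pairs produced by the blocking rules, however small the individual blocks are.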