Function to compute Dice coefficients of bitarray pairs #567

Merged
hardbyte merged 4 commits into main from feature/similarities-of-pairs
May 1, 2023


hardbyte
Collaborator

Anonlink's similarity functions currently compare every possible candidate pair via a Cartesian product, which works really well with a fairly large number of encodings. With very fine-grained blocking such as p-sig, you may have many small blocks. Anonlink doesn't do too well with these, as we optimized the comparison function for high throughput with large batches, not for low latency with tiny batches.

To get higher throughput it is tempting to merge a bunch of these small blocks together before calling anonlink. However, this approach adds candidate pairs that were not explicitly in the blocking rules, skewing results and performing unnecessary work.
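
To make the cost of merging concrete, here is a small sketch that counts candidate pairs for two tiny blocks compared separately versus merged into one. The record ids and the `candidate_pairs` helper are hypothetical and exist only for this illustration; they are not part of anonlink.

```python
from itertools import product


def candidate_pairs(blocks):
    """Return the set of pairs produced by a cartesian product within each block.

    Each block is a (left_records, right_records) tuple of record ids.
    """
    pairs = set()
    for left, right in blocks:
        pairs.update(product(left, right))
    return pairs


# Two hypothetical blocks: (ids from dataset A, ids from dataset B)
blocks = [
    (["a1", "a2"], ["b1"]),
    (["a3"], ["b2", "b3"]),
]

# Comparing each block on its own only yields pairs the blocking rules allow.
separate = candidate_pairs(blocks)

# Merging the blocks first compares everything against everything.
merged_left = [record for left, _ in blocks for record in left]
merged_right = [record for _, right in blocks for record in right]
merged = candidate_pairs([(merged_left, merged_right)])

# Pairs that appear only because of the merge.
extra = merged - separate
print(len(separate), len(merged), len(extra))
```

In this toy case the blocking rules yield 4 candidate pairs, while the merged block yields 9, so more than half of the comparisons are pairs the blocking never asked for.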

This PR adds a function in anonlink.similarities to compute the Dice coefficient on pairs of bitarrays. If useful we could also implement an accelerated version.
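
As a rough illustration of the idea (not the code added in this PR), the sketch below computes the Dice coefficient, 2|A ∧ B| / (|A| + |B|), for an explicit list of index pairs rather than a full cartesian product. Encodings are modelled as plain Python integers used as bit sets so the example has no dependencies; the names `dice_coefficient` and `dice_of_pairs`, and the choice to return 0.0 when both encodings are empty, are assumptions made for this example.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def dice_coefficient(a: int, b: int) -> float:
    """Dice coefficient of two bit sets: 2 * |a & b| / (|a| + |b|)."""
    total = popcount(a) + popcount(b)
    if total == 0:
        # Assumption for this sketch: two empty encodings score 0.0.
        return 0.0
    return 2 * popcount(a & b) / total


def dice_of_pairs(left, right, pairs):
    """Score only the listed (i, j) pairs: left[i] against right[j]."""
    return [dice_coefficient(left[i], right[j]) for i, j in pairs]


# Hypothetical 4-bit encodings for two datasets.
left = [0b1011, 0b0110]
right = [0b0011, 0b1111]

# Only the pairs named by the blocking rules are compared.
scores = dice_of_pairs(left, right, [(0, 0), (1, 1)])
print(scores)
```

Because the caller supplies the pairs, no comparisons are made outside the blocking rules, at the price of per-pair overhead that a batched, accelerated version could reduce.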


codecov bot commented Apr 30, 2023

Codecov Report

Merging #567 (219633a) into main (cdca890) will increase coverage by 0.09%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #567      +/-   ##
==========================================
+ Coverage   94.19%   94.28%   +0.09%     
==========================================
  Files          16       16              
  Lines         792      805      +13     
==========================================
+ Hits          746      759      +13     
  Misses         46       46              

@hardbyte hardbyte requested a review from wilko77 April 30, 2023 22:45
@hardbyte hardbyte merged commit 5786439 into main May 1, 2023
@hardbyte hardbyte deleted the feature/similarities-of-pairs branch May 1, 2023 00:32