
Matchlib returns unnecessarily many result pairs compared to the input strings #3

Open
matiaslindgren opened this issue Jan 5, 2019 · 0 comments


`matcher.match_all_combinations` accepts n strings and compares all n(n-1)/2 pairs, i.e. O(n^2) comparisons, then returns results for all of those pairs (optionally filtered by a `minimum_similarity` parameter).
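To make the scale of the output concrete, here is a minimal sketch of an all-pairs matcher. It is not matchlib's implementation: `match_all_combinations_sketch` and `similarity` are hypothetical names, and `difflib.SequenceMatcher.ratio()` is only a stand-in for matchlib's real similarity measure. It shows that every unordered pair is compared once and, unless `minimum_similarity` filters it out, returned.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    # Stand-in metric (assumption): matchlib uses its own similarity measure.
    return SequenceMatcher(None, a, b).ratio()


def match_all_combinations_sketch(strings, minimum_similarity=0.0):
    """Hypothetical sketch: compare every unordered pair exactly once.

    For n strings this performs n * (n - 1) // 2 comparisons and returns
    (i, j, score) tuples for every pair with score >= minimum_similarity.
    """
    results = []
    for (i, a), (j, b) in combinations(enumerate(strings), 2):
        score = similarity(a, b)
        if score >= minimum_similarity:
            results.append((i, j, score))
    return results


docs = ["abcd", "abce", "xyz", "abcf"]
pairs = match_all_combinations_sketch(docs)
# With no effective filter, all 4 * 3 // 2 == 6 pairs are returned.
print(len(pairs))
```

Raising `minimum_similarity` trims the list, but the number of returned pairs still grows quadratically with the number of input strings.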
This keeps the function generic, but it also greatly increases the number of results. When Radar processes these pairs, it searches, for each input string, for the pair that maximizes its similarity:
https://github.com/Aalto-LeTech/radar/blob/00a917ccb0f9a299aec265a2003776d9bee8ed8f/matcher/tasks.py#L120-L131
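The downstream reduction can be sketched as follows. This is an illustrative approximation, not the code at the linked lines of `matcher/tasks.py`; `best_match_per_string` is a hypothetical name, and the input is assumed to be `(i, j, score)` tuples like those in the previous sketch. It shows that most of the returned pairs are discarded once only the best partner per string is kept.

```python
def best_match_per_string(pairs):
    """Hypothetical sketch of the per-string reduction done after matching.

    pairs: iterable of (i, j, score) tuples.
    Returns {index: (other_index, score)}, the best partner for each index.
    Ties keep the first pair seen.
    """
    best = {}
    for i, j, score in pairs:
        # A pair is a candidate best match for both of its strings.
        for this, other in ((i, j), (j, i)):
            if this not in best or score > best[this][1]:
                best[this] = (other, score)
    return best


pairs = [(0, 1, 0.75), (0, 2, 0.0), (0, 3, 0.75),
         (1, 2, 0.0), (1, 3, 0.75), (2, 3, 0.0)]
best = best_match_per_string(pairs)
print(len(pairs), "pairs reduced to", len(best), "best matches")
```

For n strings, at most n entries survive out of n(n-1)/2 returned pairs, which is the waste this issue describes.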

To significantly reduce the number of returned results, `matcher._match_all` (or a new function) should also accept the current maximum similarity of each string, so that the filtering can take place in matchlib instead of in Radar.
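One possible shape for this proposal is sketched below, under explicit assumptions: `match_all_improving` and its `current_max` parameter are hypothetical, not an existing matchlib API, and `SequenceMatcher.ratio()` again stands in for the real similarity measure. The caller passes the best similarity already known for each string (for example from earlier runs), and a pair is returned only if it strictly improves that maximum for at least one of its two strings.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    # Stand-in metric (assumption), not matchlib's actual measure.
    return SequenceMatcher(None, a, b).ratio()


def match_all_improving(strings, current_max=None, minimum_similarity=0.0):
    """Hypothetical variant of _match_all that filters inside the library.

    current_max: hypothetical list with the best similarity known so far
    for each string; it is updated in place as better pairs are found.
    Only pairs that strictly raise the maximum of at least one of their
    strings are returned, so ties with an existing maximum are dropped.
    """
    if current_max is None:
        current_max = [float("-inf")] * len(strings)
    results = []
    for (i, a), (j, b) in combinations(enumerate(strings), 2):
        score = similarity(a, b)
        if score < minimum_similarity:
            continue
        improves_i = score > current_max[i]
        improves_j = score > current_max[j]
        if improves_i or improves_j:
            results.append((i, j, score))
            if improves_i:
                current_max[i] = score
            if improves_j:
                current_max[j] = score
    return results


docs = ["abcd", "abce", "xyz", "abcf"]
maxima = [float("-inf")] * len(docs)
returned = match_all_improving(docs, maxima)
print(len(returned), "pairs returned instead of", len(docs) * (len(docs) - 1) // 2)
```

A pair returned early in the loop can still be superseded by a later, better pair, so Radar would likely keep a final per-string reduction; the gain is that far fewer pairs cross the library boundary.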
