You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
rapidfuzz 3.9.6 and python 3.10.
I have carefully read the rapidfuzz docs and made tests. I find: partial_ratio use Indel.normalized_similarity.
Could partial_ratio also support Levenshtein.normalized_similarity? Maybe default to use Indel.normalized_similarity, but we can choose to use Levenshtein.normalized_similarity.
In my use case, I indeed need partial_ratio to use Levenshtein.normalized_similarity.
thanks!!
fromicecreamimporticfromrapidfuzzimportfuzzfromrapidfuzz.distanceimportLevenshtein, Indela='34cdef16z'c='09cdef78'ic(fuzz.partial_ratio(a, c))
ic(fuzz.partial_ratio_alignment(a, c))
ic(fuzz.ratio(a[0:6], c))
ic(Indel.normalized_similarity(a[0:6], c) *100)
ic(Levenshtein.normalized_similarity(a[0:6], c) *100)
ic|test.py:25in<module>-fuzz.partial_ratio(a, c): 57.14285714285714ic|test.py:26in<module>-fuzz.partial_ratio_alignment(a, c): ScoreAlignment(score=57.14285714285714, src_start=0, src_end=6, dest_start=0, dest_end=8)
ic|test.py:27in<module>-fuzz.ratio(a[0:6], c): 57.14285714285714ic|test.py:28in<module>-Indel.normalized_similarity(a[0:6], c) *100: 57.14285714285714ic|test.py:29in<module>-Levenshtein.normalized_similarity(a[0:6], c) *100: 50.0
The text was updated successfully, but these errors were encountered:
In theory we could support this, but this comes with a couple of issues:
partial_ratio is used in a lot of the "combined" algorithms like WRatio. So it's not completely clear what to do about them
would this only allow Indel and Levenshtein which seems a bit arbitrary or more metrics? For more metrics it gets a lot more complex to handle things like the different kinds of normalizations
This has a couple of optimizations to keep the sliding window algorithm remotely fast by skipping some unneeded calculations. This doesn't work for the generic case though. Probably this could have a generic fallback, but it would likely need optimized versions for algorithms where this is possible.
One more possible setting could be whether the sliding window allows for shorter substrings at the start / end of the longer string.
Not completely sure whether this should be added under a different name either.
I assume a generic implementation shouldn't be too much work and optimized versions could be added over time. Still not sure when I will get around to it though.
thanks a lot for maxbachmann.
You have a global view on this issue!
As Indel and Levenshtein are very similar, I think it would be easier only support Indel and Levenshtein in partial_ratio, to save your time. For me, I hope to only use Levenshtein in most NLP process.
rapidfuzz 3.9.6 and python 3.10.
I have carefully read the rapidfuzz docs and made tests. I find: partial_ratio use Indel.normalized_similarity.
Could partial_ratio also support Levenshtein.normalized_similarity? Maybe default to use Indel.normalized_similarity, but we can choose to use Levenshtein.normalized_similarity.
In my use case, I indeed need partial_ratio to use Levenshtein.normalized_similarity.
thanks!!
The text was updated successfully, but these errors were encountered: