how to give more weight to first two words #367
abubelinha asked this question in Q&A (unanswered, 0 replies)
The data I need to match are biological names, and I want to take special care in matching the first two Latin words.
Is there any way to do that without actually removing the right-hand part of the string?
An example of what I want is probably clearer (it is based on some problematic matches I am getting).
My fuzzy strings are those in the left column of this table.
My list of optional strings to match is much longer, but it includes the strings in the middle and right columns:
In most of these cases, the words in the first column match the opening words of the corresponding string in the right column, especially the first two Latin words
(but not always, since the fuzzy string may contain mistakes, as in the last example row).
But RapidFuzz finds more similarity with the strings in the middle column.
I guess this happens because string length is an important factor.
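To make the length effect concrete, here is a minimal sketch of a normalized Indel similarity written in plain Python. As far as I know this is the measure behind RapidFuzz's `fuzz.ratio`, but the scorer actually used in the question is not stated, so treat that as an assumption. The three strings are made up for illustration and are not taken from the table.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_similarity(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale: 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * 2 * lcs_length(a, b) / total


# Illustrative strings only (hypothetical, not from the original table).
query = "Carex flacca"
short_choice = "Carex flava L."[:11]  # "Carex flava": a different species, similar length
long_choice = "Carex flacca subsp. serrulata (Biv.) Greuter"  # starts with the query

# The long choice contains the whole query, yet its extra characters
# inflate the denominator and pull its score below the short choice.
short_score = indel_similarity(query, short_choice)
long_score = indel_similarity(query, long_choice)
print(f"{short_choice!r}: {short_score:.2f}")
print(f"{long_choice!r}: {long_score:.2f}")
```

Under this measure every unmatched character of the longer option counts against it, which is consistent with the guess that string length drives the unwanted matches.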
Any suggestions on how I can modify the behaviour of `extractOne`/`extract` to accomplish what I want?
Note that I cannot simply truncate the fuzzy strings and the options to their first two words: in most cases many options begin with exactly those two words, and the remaining words are needed to decide which option is the most similar.
For example, the penultimate row shows a correct match, but if I used only the first two words I would get the wrong match currently shown in the row above it. There could also be another option, "Carex flacca Lam.", which could be wrongly matched if I used only "Carex flacca" as both my fuzzy string and my option string.
So the point is to give more weight to fuzzy matching of the first two words, without throwing away the remaining information.
I guess I am answering my own question, but I wonder whether there are more factors to consider here, judging from the examples.
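One possible way to express "weight the first two words, but keep the rest" is a custom scorer that scores the head (first two words) and the tail (remaining words) separately and blends them. The sketch below is only an illustration under stated assumptions: `head_weighted_score`, `best_match`, the 0.7 weight, and the option list are all hypothetical, and `difflib.SequenceMatcher` stands in for the similarity function so that the snippet runs with the standard library alone. `rapidfuzz.fuzz.ratio` could presumably be swapped in for `similarity`, and since `process.extractOne` accepts a `scorer` callable, a function of this shape might be passed there, but that is not tested here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """0-100 similarity; a stand-in for a scorer such as rapidfuzz.fuzz.ratio."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def head_weighted_score(query, choice, head_words=2, head_weight=0.7, **kwargs):
    """Blend head (first words) and tail (remaining words) similarity.

    **kwargs is accepted and ignored so that extra keyword arguments
    (for example a score cutoff) do not break the call.
    """
    q_words = query.lower().split()
    c_words = choice.lower().split()
    q_head, q_tail = " ".join(q_words[:head_words]), " ".join(q_words[head_words:])
    c_head, c_tail = " ".join(c_words[:head_words]), " ".join(c_words[head_words:])

    head = similarity(q_head, c_head)
    if not q_tail:
        # The query has nothing beyond the head, so a longer option is
        # not penalized for carrying extra words.
        return head
    tail = similarity(q_tail, c_tail)
    return head_weight * head + (1.0 - head_weight) * tail


def best_match(query, choices):
    """Return the option with the highest head-weighted score."""
    return max(choices, key=lambda c: head_weighted_score(query, c))


# Hypothetical options, made up for illustration.
choices = [
    "Carex flava L.",
    "Carex flacca Lam.",
    "Carex flacca Schreb.",
    "Carex flacca subsp. serrulata (Biv.) Greuter",
]

# The tail still decides between options that share the same first two words.
print(best_match("Carex flacca Schreb", choices))
```

The weight is a tuning knob rather than a recommendation: a higher `head_weight` makes a mistake in the first two words more costly, while a lower one lets author names and infraspecific ranks matter more. When the query consists of the first two words only, every option sharing them scores the same, so ties would need a separate rule.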
Thanks a lot
@abubelinha