Unicode/UTF-8 support #11
Comments
None, I'm afraid. I'm unsure if the underlying Levenshtein implementation supports it.
Hmm... You said that you use the same Levenshtein library as the original fuzzywuzzy does. How does fuzzywuzzy handle Unicode?
I recall seatgeek/fuzzywuzzy supporting two implementations; it will use python-Levenshtein if installed. But from their past issues it seems that the library supports Unicode. I interpret that the Levenshtein implementation could work with any integral type, from So I figure the Python interop maps Unicode characters to integral values? I cannot say for sure. I would personally start there: see how Unicode is treated when Python calls C code, and figure out if the Levenshtein implementation has to be changed.
I guess there's no easy way to get Unicode support (without switching std::string to a Unicode-aware one for the entire library). Thanks.
On 20-10-2019 00:09, Martin Chang wrote:
I guess there's no easy way to get Unicode support (without switching std::string to a Unicode-aware one for the entire library). Thanks.
`std::string` can still be used (at least on Linux-based distros), but
there will be a lot of intermediate `std::wstring` instances along with
`std::mbstowcs` calls. Likely some hacks to interface with
`levenshtein.c` unless reimplemented, too.
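To make the `std::mbstowcs` route concrete, here is a minimal sketch of one such intermediate conversion. The helper name `widen` is hypothetical and not part of the library. It assumes the process locale matches the encoding of the input (for UTF-8 input, a UTF-8 locale selected with `std::setlocale`), and that `wchar_t` is wide enough to hold a full code point, which is the case on typical Linux systems but not on Windows.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the library): decode a multibyte
// std::string into a std::wstring according to the current C locale.
std::wstring widen(const std::string& narrow) {
    // A multibyte string never decodes to more wide characters than it has
    // bytes, so narrow.size() is a safe upper bound for the buffer.
    std::wstring wide(narrow.size(), L'\0');
    const std::size_t written =
        std::mbstowcs(&wide[0], narrow.c_str(), narrow.size());
    if (written == static_cast<std::size_t>(-1)) {
        // The input contains a byte sequence that is invalid in this locale.
        throw std::runtime_error("invalid multibyte sequence for current locale");
    }
    wide.resize(written);  // drop the unused tail of the buffer
    return wide;
}
```

Every string entering the library would need a call like this before comparison, which is where the intermediate `std::wstring` instances come from.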
I think the proper solution would be to use something like Qt's `QString` and re-implement levenshtein.c to support it.
On 20-10-2019 21:38, Martin Chang wrote:
I think the proper solution would be to use something like Qt's `QString` and re-implement levenshtein.c to support it.
I don't want to pull in Qt just for string comparison; Levenshtein could instead be reimplemented
via templates where we let the user decide what string type they
want to use.
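A rough sketch of what the template idea could look like. This is illustrative only, not the library's code or a port of `levenshtein.c`. The function name `levenshtein_distance` is an assumption, and it computes the plain unit-cost edit distance over any string type that offers `size()`, `operator[]`, and elements comparable with `==`.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Illustrative sketch (hypothetical name, not the library's API): unit-cost
// Levenshtein distance, generic over the string type chosen by the caller.
template <typename String>
std::size_t levenshtein_distance(const String& a, const String& b) {
    // row[j] holds the distance between the current prefix of a and the
    // first j elements of b; it starts as the distances from an empty prefix.
    std::vector<std::size_t> row(b.size() + 1);
    std::iota(row.begin(), row.end(), std::size_t{0});

    for (std::size_t i = 1; i <= a.size(); ++i) {
        std::size_t diagonal = row[0];  // distance(i-1, j-1)
        row[0] = i;                     // deleting i elements of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t above = row[j];  // distance(i-1, j)
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = std::min({above + 1,         // deletion
                               row[j - 1] + 1,    // insertion
                               diagonal + cost}); // substitution or match
            diagonal = above;
        }
    }
    return row[b.size()];
}
```

With this shape the caller picks the unit of comparison. For example, `std::u32string(U"h\u00E9llo")` against `std::u32string(U"hello")` gives a distance of 1, while the same text as UTF-8 in `std::string("h\xC3\xA9llo")` against `std::string("hello")` gives 2, because `é` occupies two bytes.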
I'll look into that.
Hi, I want to contribute Unicode support for the library. Do you know how much/what has to be done for the feature?