Impossible to exclude "İnşaat Malları" as generic #8261
Comments
Looks like another edge case, like the one we had in #5017.
Long-term, we shouldn't do any manual diacritic folding to compare strings, even with the help of the libraries in #5017 (comment), especially since NSI tends to compare whole strings, …
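The comment doesn't spell out an alternative, and the #5017 libraries aren't named in this excerpt. Assuming locale-aware collation is the kind of approach meant, here is a minimal JavaScript sketch using the built-in Intl.Collator: with sensitivity: 'base' it compares whole strings while ignoring case and accents, so neither string has to be folded by hand first. The helper name sameBase and the choice of the 'en' locale are illustrative assumptions.

```javascript
// Compare two names while ignoring case and accents, using the built-in
// Intl.Collator instead of manual diacritic folding.
// sensitivity: 'base' treats strings that differ only in case or accents
// (e.g. "a", "á", "A") as equal. The 'en' locale is an assumption.
const collator = new Intl.Collator('en', { sensitivity: 'base' });

// Hypothetical helper name, for illustration only.
function sameBase(a, b) {
  return collator.compare(a, b) === 0;
}

const original = 'İnşaat Malları';          // 'İ' is U+0130
const lowercased = original.toLowerCase();  // "i̇nşaat malları": i + U+0307

console.log(lowercased.length);                    // 15
console.log(sameBase(original, lowercased));       // true
console.log(sameBase(original, 'Insaat Malları')); // true
```

One trade-off: a collator only answers "are these equal?" (or "which sorts first?"); it doesn't produce a normalized key, so it fits pairwise checks better than building lookup tables keyed by name.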
OK, I added some fixes that will keep the generic "İnşaat Malları" from sneaking back into the index. This was tricky because you'd think that a case-insensitive regex (/i) would catch … Then I tried to match both variants with an exclude regex like '^(İ|i̇)nşaat malları$', … So for now, our build scripts can just avoid toLowerCasing a string with an 'İ' in it.
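The behavior behind this comment can be reproduced in plain JavaScript. In this sketch (illustrative only, not the project's build code), String.prototype.toLowerCase() maps 'İ' (U+0130) to two code points, 'i' followed by U+0307 COMBINING DOT ABOVE, and the /i flag does not treat 'İ' and 'i' as the same letter. The last lines show one possible way an exclude regex like '^(İ|i̇)nşaat malları$' can stop working: if some step lowercases the pattern string itself (a hypothetical step here), both alternatives collapse to 'i̇'.

```javascript
const name = 'İnşaat Malları'; // 'İ' is U+0130

// toLowerCase() turns U+0130 into "i" + U+0307 (combining dot above),
// so the lowercased string is one code unit longer than the original.
const lower = name.toLowerCase();
console.log(name.length, lower.length);         // 14 15
console.log(lower.codePointAt(1).toString(16)); // 307

// The /i flag does not fold "İ" to "i", so neither spelling matches the other.
console.log(/^inşaat malları$/i.test(name));                // false
console.log(new RegExp('^' + lower + '$', 'i').test(name)); // false

// An alternation listing both variants ("İ" and "i" + U+0307) matches both spellings...
const pattern = '^(İ|i\u0307)nşaat malları$';
console.log(new RegExp(pattern, 'i').test(name));  // true
console.log(new RegExp(pattern, 'i').test(lower)); // true

// ...but if the pattern string is itself lowercased (hypothetical build step),
// "İ" becomes "i" + U+0307 and the original name no longer matches.
const loweredPattern = pattern.toLowerCase();
console.log(new RegExp(loweredPattern, 'i').test(name)); // false
```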
@1ec5 Can you say more about what you mean by this? I kind of think we do need to keep diacritic-folding the strings. I guess our basic use case is: if someone creates something in OSM with …
It turns out that this is a known problem. There is a Wikipedia article about it; in case anyone is interested, I'll leave the link here:
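The link itself is missing here, but the gist of the known problem is that Turkish and Azerbaijani have a dotted pair (İ/i) and a dotless pair (I/ı), while the default, locale-independent Unicode case mapping assumes I/i. A minimal JavaScript sketch of the difference, assuming a runtime with ICU-backed locale support such as a standard Node.js build:

```javascript
const name = 'İnşaat Malları';

// Default mapping (no locale): İ → "i" + U+0307, and plain I → i.
console.log(name.toLowerCase()); // "i̇nşaat malları" (15 code units)
console.log('I'.toLowerCase());  // "i"

// Turkish mapping: İ → i, and plain I → ı (dotless).
console.log(name.toLocaleLowerCase('tr')); // "inşaat malları"
console.log('I'.toLocaleLowerCase('tr'));  // "ı"

// Uppercasing has the mirror-image issue: i → I by default, i → İ in Turkish.
console.log('i'.toUpperCase());           // "I"
console.log('i'.toLocaleUpperCase('tr')); // "İ"
```

Locale-aware mapping isn't a drop-in fix for a worldwide index, though: lowercasing everything with 'tr' would turn the plain 'I' in non-Turkish names into 'ı', so it only helps when a name's language is actually known.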
This issue has some more info: osmlab/name-suggestion-index#8261. I don't know whether this letter is used by osm-community-index communities, but we might as well all use the same simplify.js code.
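The thread doesn't show what simplify.js contains, so the following is a hypothetical sketch of a shared simplify-style helper, not the real file. The idea it illustrates: decompose to NFD and strip combining marks before lowercasing, so 'İnşaat Malları' and its already-lowercased form 'i̇nşaat malları' reduce to the same comparison key.

```javascript
// Hypothetical simplify(): produce a comparison key for a name.
// 1. NFD splits precomposed letters into base letter + combining marks
//    (İ → I + U+0307, ş → s + U+0327).
// 2. Removing \p{M} (all combining marks) drops those diacritics.
// 3. Lowercasing afterwards can no longer produce a stray U+0307,
//    because the dot is already gone.
// 4. Whitespace is collapsed and trimmed so spacing differences don't matter.
function simplify(str) {
  if (typeof str !== 'string') return '';
  return str
    .normalize('NFD')
    .replace(/\p{M}/gu, '')
    .toLowerCase()
    .replace(/\s+/g, ' ')
    .trim();
}

console.log(simplify('İnşaat Malları'));               // "insaat malları"
console.log(simplify('İnşaat Malları'.toLowerCase())); // "insaat malları"
console.log(simplify('  Insaat   Malları '));          // "insaat malları"
```

Note that the dotless 'ı' in 'Malları' survives: it is a separate letter with no canonical decomposition, not an i with a diacritic, so this sketch does not fold it to 'i'. Whether it should be folded is a policy decision shared code would need to make explicitly.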
I tried to do this:
f4e5e1d
Steps:
1. npm run build (1st time): OK. The script replaces the letters in the new generic string with lowercase ones.
2. npm run build (2nd time): the script re-adds the brand entry.
It seems that the problem is related to the letter "İ".
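The build code isn't shown in the thread, so this is a hypothetical model of the two runs above (the functions naiveBuild, guardedBuild and isExcluded are made up for illustration). It assumes, as the steps suggest, that the first build lowercases the new generic exclude string and saves it, and that the second build matches names against the saved string with a case-insensitive regex. The guarded variant follows the workaround mentioned earlier in the thread: leave strings containing 'İ' un-lowercased.

```javascript
// Hypothetical model of the build steps above; not NSI's actual build script.

// A build step that lowercases a saved exclude pattern before writing it back.
function naiveBuild(savedPattern) {
  return savedPattern.toLowerCase();
}

// Workaround: leave any string containing "İ" (U+0130) alone.
function guardedBuild(savedPattern) {
  return savedPattern.includes('İ') ? savedPattern : savedPattern.toLowerCase();
}

// A name counts as excluded if the saved pattern matches it case-insensitively.
function isExcluded(name, pattern) {
  return new RegExp(pattern, 'i').test(name);
}

const brand = 'İnşaat Malları';
const newGeneric = '^İnşaat Malları$';

// 1st build: the freshly added pattern still matches the brand.
console.log(isExcluded(brand, newGeneric)); // true

// 2nd build: the lowercased pattern "^i̇nşaat malları$" no longer matches,
// so in this model the brand would be treated as non-generic and re-added.
const afterNaive = naiveBuild(newGeneric);
console.log(isExcluded(brand, afterNaive)); // false

// With the guard, the pattern survives repeated builds unchanged.
const afterGuarded = guardedBuild(guardedBuild(newGeneric));
console.log(afterGuarded === newGeneric);     // true
console.log(isExcluded(brand, afterGuarded)); // true
```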