Impossible to exclude "İnşaat Malları" as generic #8261

serhii-muchychka · 2023-06-11T16:51:11Z

I tried to do this:
f4e5e1d

Steps:

I added "^İnşaat Malları$" to the generics in the appropriate file.
Deleted the "İnşaat Malları" brand entry.
Ran npm run build the 1st time: OK (the script lowercases the letters in the new generic string).
Ran npm run build a 2nd time: the script re-adds the brand entry.

It seems that the problem is related to the letter "İ".
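A minimal sketch of why this letter is special, using only standard JavaScript string methods (the variable names are illustrative and not taken from the NSI build scripts): lowercasing "İ" (U+0130) with the locale-independent `toLowerCase()` produces two code points, a plain "i" followed by a combining dot above (U+0307), so the lowercased string is one code unit longer than the original.

```javascript
// "İnşaat Malları", with the dotted capital I written as an escape for clarity.
const name = '\u0130nşaat Malları';

// Locale-independent lowercasing maps U+0130 to "i" + U+0307 (combining dot above).
const lowered = name.toLowerCase();

// Inspect the first two code points of the lowercased string.
const firstTwo = [...lowered]
  .slice(0, 2)
  .map(ch => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

console.log(name.length, lowered.length); // lengths differ by one
console.log(firstTwo.join(' '));          // U+0069 U+0307
```

Any code that lowercases a string on one run and compares it against the original on the next can be tripped up by this extra combining character.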

Related: f4e5e1d

bhousel · 2023-06-12T16:25:40Z

Looks like another edge case like what we had in #5017
It's 2 years later so I'll play around and see whether there are newer better ways to do this.

1ec5 · 2023-06-12T19:23:19Z

Long-term, we shouldn’t do any manual diacritic-folding to compare strings, even with the help of the libraries in #5017 (comment). Especially since NSI tends to compare whole strings, String.prototype.localeCompare and Intl.Collator are a lot more robust. However, the behavior depends on the language you pass in. I guess individual entries would need to be able to specify the language the name is in, since OSM doesn’t do that?
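As a rough illustration of locale-dependent comparison, here is a sketch using the standard `Intl.Collator` API with `sensitivity: 'base'`. The helper name `sameBaseLetters` is hypothetical, and the results assume the runtime ships full ICU locale data (the default in current Node.js releases).

```javascript
// Returns true when the collator for `locale` treats a and b as equal,
// ignoring case and accent differences (sensitivity: 'base').
function sameBaseLetters(a, b, locale) {
  const collator = new Intl.Collator(locale, { sensitivity: 'base' });
  return collator.compare(a, b) === 0;
}

// In English, "ä" is "a" with an accent, so these compare as equal.
console.log(sameBaseLetters('Haagen Dazs', 'Häagen Dazs', 'en'));

// In English, "I" and "i" differ only by case.
console.log(sameBaseLetters('I', 'i', 'en'));

// In Turkish, dotless "I" and dotted "i" are different letters,
// so the same pair is expected to compare as different.
console.log(sameBaseLetters('I', 'i', 'tr'));
```

The same two strings can be equal under one locale and different under another, which is why each entry would need to declare the language of its name.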

bhousel · 2023-06-12T20:11:40Z

Ok I added some fixes that will keep the generic "İnşaat Malları" from sneaking back into the index.

This was tricky because: you'd think that case insensitive regex /i would catch
both upper and lower case variants of this, but it doesn't.

Then, I tried to match both variants with an exclude regex like '^(İ|i̇)nşaat malları$',
but toLowerCasing that regex in our file_tree writing code was changing the 'İ'.

So for now, our build scripts can just avoid toLowerCasing a string with a 'İ' in it.
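The three observations above can be reproduced in a few lines. This is only a sketch: `lowerCaseUnlessDottedI` is a hypothetical helper that mirrors the workaround as described, not the actual code in the NSI build scripts.

```javascript
const original = '\u0130nşaat Malları';    // "İnşaat Malları"
const loweredName = original.toLowerCase(); // starts with "i" + U+0307

// 1. A case-insensitive regex built from the original does not match the
//    lowercased variant, because "İ" is one character and "i" + U+0307 is two.
const naiveRe = /^\u0130nşaat Malları$/i;
console.log(naiveRe.test(original));    // true
console.log(naiveRe.test(loweredName)); // false

// 2. An exclude pattern listing both variants matches either spelling...
const excludePattern = '^(\u0130|i\u0307)nşaat malları$';
const excludeRe = new RegExp(excludePattern, 'i');
console.log(excludeRe.test(original), excludeRe.test(loweredName)); // true true

// ...but lowercasing the pattern itself collapses both alternatives into
// the same "i" + U+0307 form, losing the "İ" branch.
console.log(excludePattern.toLowerCase() === excludePattern); // false

// 3. Hypothetical guard: leave strings containing "İ" untouched.
function lowerCaseUnlessDottedI(str) {
  return str.includes('\u0130') ? str : str.toLowerCase();
}
console.log(lowerCaseUnlessDottedI(excludePattern) === excludePattern); // true
```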

bhousel · 2023-06-12T20:19:30Z

Long-term, we shouldn’t do any manual diacritic-folding to compare strings, even with the help of the libraries in #5017 (comment)

@1ec5 Can you say more about what you mean by this? I kind of think we do need to continue to diacritic-fold the strings?
We mostly do this to catch typos in the OSM tags that we're matching.

I guess our basic use case is: if someone creates something in OSM with name=Haagen Dazs, Rapid can suggest the tag name=Häagen-Dazs instead. I can't think of a situation where the two locally used names would differ only by a diacritic mark.
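For illustration, diacritic folding for typo-tolerant matching can be sketched with Unicode normalization. This is an assumed stand-in technique, not the implementation in NSI's simplify.js, and `foldForMatching` is a hypothetical name.

```javascript
// Fold a name into a loose matching key:
// decompose accents, drop combining marks, lowercase, keep only letters and digits.
function foldForMatching(str) {
  return str
    .normalize('NFD')                 // "ä" becomes "a" + U+0308
    .replace(/\p{M}/gu, '')           // remove combining marks
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]/gu, '');  // drop spaces, hyphens, punctuation
}

console.log(foldForMatching('Haagen Dazs')); // haagendazs
console.log(foldForMatching('Häagen-Dazs')); // haagendazs

// Both spellings of the Turkish name fold to the same key as well,
// because the combining dot is stripped before lowercasing.
console.log(foldForMatching('\u0130nşaat Malları') === foldForMatching('i\u0307nşaat malları')); // true
```

Stripping marks before lowercasing is a deliberate choice in this sketch: it removes the combining dot that "İ" would otherwise leave behind.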

serhii-muchychka · 2023-06-13T09:37:01Z

It turns out that this is a known problem. There is a Wikipedia article about it; I'm leaving the link here in case anyone is interested:
https://en.wikipedia.org/wiki/Dotted_and_dotless_I_in_computing

This issue has some more info: osmlab/name-suggestion-index#8261. I don't know whether this letter is used by osm-community-index communities, but we might as well all use the same simplify.js code.

bhousel added the bug label Jun 11, 2023

bhousel referenced this issue Jun 12, 2023

npm run build continue add empty entry named with "İnşaat Malları"

a412255

Related: f4e5e1d

LaoshuBaby referenced this issue Jun 12, 2023

Exclude generic

f4e5e1d

bhousel closed this as completed in ac47ca4 Jun 12, 2023
