🌍 Low zoom locality names are missing translations + country, region zooms #977

Closed
meetar opened this issue Aug 18, 2016 · 19 comments

Comments

@meetar
Contributor

meetar commented Aug 18, 2016

tl;dr

The new Natural Earth v4.1 populated places file now includes localized names in name_* properties, specifically:

  1. Expands the name localization added in v4.0 to 21 languages (up from 7) and several
    dozen themes expanding from populated places to include all admin-0, admin-1,
    rivers, lakes, playas, geographic lines, physical labels, parks, airports, ports, and
    more. As part of this work a new unique and stable "ne_id" has been added for any
    feature with a name translation &/or a Wikidata ID concordance. The full list of
    languages is:

name_ar, name_bn, name_de, name_en, name_es, name_fr, name_el, name_hi, name_hu, name_id, name_it, name_ja, name_ko, name_nl, name_pl, name_pt, name_ru, name_sv, name_tr, name_vi, and name_zh

(Names with * indicate
new language in v4.1 series.) A 2-character language code decoder ring is here:
https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes. Props to Wikidata for their
CC0 license: https://www.wikidata.org/wiki/Wikidata:Introduction.

This is for many themes, not just populated places. So work here should be generic for many theme imports.
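A minimal sketch of what "generic for many theme imports" could look like: one helper that turns any Natural Earth style name_xx column into a name:xx property, regardless of theme. The function name, the regular expression, and the sample feature are assumptions for illustration, not Tilezen code.

```python
import re

# Matches two-letter language columns such as "name_de" or "NAME_ZH".
# Longer columns (for example a hypothetical "name_alt") are ignored on purpose.
NE_NAME_COLUMN = re.compile(r"^name_([a-z]{2})$")

def carry_localized_names(properties):
    """Return {"name:xx": value} for every non-empty name_xx column."""
    out = {}
    for key, value in properties.items():
        match = NE_NAME_COLUMN.match(key.lower())
        if match and value:
            out["name:" + match.group(1)] = value
    return out

# Illustrative feature; scalerank value is made up.
feature = {"name": "Warsaw", "name_pl": "Warszawa", "name_de": "Warschau",
           "name_sv": "", "scalerank": 2}
print(carry_localized_names(feature))
```

Because the helper only looks at property keys, the same call would work for lakes, rivers, or admin areas as long as they follow the name_xx convention.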

===

Original title: Include OpenStreetMap language data for Natural Earth features

For example, ocean names are available in many languages at zooms 9 and higher, but only in English at zoom 8 and lower. Example map: https://mapzen.com/tangram/play/?scene=https%3A%2F%2Fapi.github.com%2Fgists%2F9f177672cdaac4640ea1b519cde3420d#9.4233/13.3585/-38.4174

If you zoom out below zoom 9, the label disappears.

@nvkelso nvkelso added this to the v1.1.0 milestone Aug 18, 2016
@nvkelso
Member

nvkelso commented Aug 18, 2016

We'd need to match the selection of features shown in earlier zooms when changing the source, but the extra names would sure be nice.

@nvkelso nvkelso modified the milestones: v1.2.0, v1.1.0 Oct 5, 2016
@nvkelso
Member

nvkelso commented Feb 7, 2017

Related: #1140

@nvkelso nvkelso modified the milestones: v1.1.0, v1.2.0 Feb 7, 2017
@nvkelso
Member

nvkelso commented Feb 7, 2017

OpenMapTiles uses some nifty logic to merge city ranks between OpenStreetMap and Natural Earth; could we do something similar for the names? (Keeping the Natural Earth features, but decorating them with additional OpenStreetMap names at runtime.)

@zerebubuth
Member

Might it make sense to do the inverse; keep the OpenStreetMap places, but decorate them with Natural Earth scalerank? We could do that regardless of zoom level, and have a consistent dataset for the whole zoom range.
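A rough sketch of that inverse join, assuming a match means "an OSM name equals the Natural Earth name and the two points are close together". The dictionary layout, the 50 km threshold, the approximate coordinates, and the scalerank value are all assumptions; this is not the OpenMapTiles or Tilezen implementation.

```python
import math

def km_between(a, b):
    """Great-circle distance in km between two (lon, lat) points (haversine)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def decorate_with_scalerank(osm_places, ne_places, max_km=50.0):
    """Copy scalerank from the nearest same-named NE point onto each OSM place."""
    for place in osm_places:
        nearby = [
            (km_between(place["lonlat"], ne["lonlat"]), ne)
            for ne in ne_places
            if ne["name"] in place["names"].values()
        ]
        nearby = [(dist, ne) for dist, ne in nearby if dist <= max_km]
        if nearby:
            closest = min(nearby, key=lambda pair: pair[0])[1]
            place["scalerank"] = closest["scalerank"]
    return osm_places

ne_places = [{"name": "Warsaw", "lonlat": (21.0, 52.25), "scalerank": 2}]
osm_places = [
    {"names": {"name": "Warszawa", "name:en": "Warsaw"}, "lonlat": (21.01, 52.23)},
    {"names": {"name": "Warsaw"}, "lonlat": (-85.85, 41.24)},  # Warsaw, Indiana
]
decorate_with_scalerank(osm_places, ne_places)
```

The distance check is what keeps the second "Warsaw" from inheriting the capital's rank; a name-only join would get that wrong.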

@nvkelso nvkelso modified the milestones: v1.1.0, v1.2.0 Apr 4, 2017
@nvkelso nvkelso changed the title Include OpenStreetMap language data for Natural Earth features Low zoom locality names are missing translations Aug 17, 2017
@nvkelso
Member

nvkelso commented Aug 17, 2017

Copying @matkoniecz issue comment from #1371 (comment) here:

  • What did you see?

https://en.wikipedia.org/wiki/%C5%81%C3%B3d%C5%BA has label "Lódz"

I would expect either Łódź (the local name, and the name on English Wikipedia) or Lodz in ASCII form, not an unholy contraption that is neither the local nor the English name, probably because NE has a problem with UTF-8.

id | 5776
-- | --
kind | locality
min_zoom | 6
name | Lódz
population | 758000
region_capital | true
source | naturalearthdata.com

English names come without local names; as a result, a map that tries to show labels in a specific language is broken at low zoom.

country_capital | true
-- | --
id | 7143
kind | locality
min_zoom | 4
name | Warsaw
population | 1707000
source | naturalearthdata.com

OSM has several languages, and specifies proper local name.

  • What did you expect to see?
    Full set of name tags from OSM data.

In other words - I think that for cities it is desirable to discard NE and use pure OSM (or merge OSM with NE).

https://github.com/gravitystorm/openstreetmap-carto has quite good (see https://www.openstreetmap.org/#map=5/51.272/21.583 or other areas on low zoom) display of labels that IIRC is based solely on OSM data (if useful I may track down and summarize how this is done)

  • What map location are you having problems with? City and country are helpful, as well as tile coordinates or latitude / longitude.

Problem is global, systematic and affects all low zoom where OSM data is not used for city labels.

  • Screenshot? Props for animated gifs.

Encountered for example on https://mapzen.com/tangram/play/?#8.7500/48.1808/16.2415 (Vienna) during development of https://gist.github.com/matkoniecz/456041cea84e0cc58a82a070acd5b4b1 style (if problem is not clear I may produce minimal working example).

[Screenshots: selection_005, selection_006]

@nvkelso
Member

nvkelso commented Aug 17, 2017

@matkoniecz I agree low-zoom name localization is an important area that needs further work, and we're hoping to tackle it in the next several months.

Generally speaking we used to only use Natural Earth at low-zooms and OpenStreetMap at mid- and high-zooms, but in v1.0 we started cross-fading between the two at around zoom 8, and some features like landuse and POIs are only from OpenStreetMap even at low-zooms. Some features like country labels and continent labels are just from OpenStreetMap at all zooms (and include the expected translations).

Some of OSM and NaturalEarth mixing is for historical reasons, and some of it is for my own vanity as the primary author of Natural Earth – when there are problems with Natural Earth my tendency is to want to fix them instead of just switching over to only OpenStreetMap.

One thing that is very nice from the Natural Earth side is knowing which few dozen to few hundred locality features to include at low zooms instead of stuffing the tiles with 1000s of OpenStreetMap features that can't all be labeled anyhow. This improves file size in two ways: it limits the number of features included and, because each low-zoom feature is highly likely to carry lots of translations, it limits the unique (poorly compressed) name property values at low zooms. Limiting the number of features also improves rendering performance. (We also currently ship too many OSM country labels, with all their translations, in tiles where they can't all be labeled.)
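As a small illustration of that selection, here is a sketch that keeps only features whose min_zoom has been reached. The two features reuse the min_zoom and population values quoted earlier in this thread; the function name is made up.

```python
def features_for_zoom(features, zoom):
    """Keep only the features that should already be visible at this zoom."""
    return [f for f in features if f["min_zoom"] <= zoom]

localities = [
    {"name": "Warsaw", "min_zoom": 4, "population": 1707000},
    {"name": "Lódz", "min_zoom": 6, "population": 758000},
]

for zoom in (3, 5, 6):
    names = [f["name"] for f in features_for_zoom(localities, zoom)]
    print(zoom, names)
```

Every feature dropped this way also drops its whole set of name translations from the tile, which is where most of the size saving described above would come from.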

There are a few options on the table:

  1. Keep using Natural Earth features but join them with OSM at run-time to add in missing properties. See comments above for how OpenMapTiles does it.
  2. Switch to OSM for all zooms and join with Natural Earth to get better min_zoom values.
  3. Add the translations for the 20 common Tilezen languages to Natural Earth, keeping NE at low-zooms and OSM at mid- and high-zooms. Needs a tiny bit of new Tilezen logic to carry thru the name:* properties.
  4. Switch to using Who's On First localities for all zooms, and enjoy the min_zoom properties that were copied over from Natural Earth (we currently use Who's On First just for neighbourhood names now). Until WOF has more global coverage this isn't feasible, though.
  5. Some combination of the above.

We've spent the summer investigating locality name translations in the low zooms (property values, not engineering options listed above) and will have a blog post up about that probably next week.

And for your original issue about the misspelled placename: Looks like we goofed up Łódź when fixing some of the egregious Windows-1252 <> UTF8 conversion errors via a database patch over the released 3.x version of Natural Earth. I've fixed that upstream in Natural Earth repo in this commit: nvkelso/natural-earth-vector@a2fac8b (on master, needs packaging for download links on site).
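One plausible reading of why the label ended up as "Lódz" in particular, offered as an assumption rather than something confirmed in this thread: Windows-1252 has a code point for ó but none for Ł or ź, so a lossy conversion would flatten only those two letters. The sketch below just checks which characters the codec can represent; the function name is made up.

```python
def unrepresentable(text, encoding="cp1252"):
    """List the characters of `text` that have no code point in `encoding`."""
    lost = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost

print(unrepresentable("Łódź"))  # ['Ł', 'ź']
print(unrepresentable("Pécs"))  # []
```

A check like this could serve as a cheap QA filter for names that are at risk in any pipeline step that still passes through a single-byte Western encoding.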

@ImreSamu

There are a few options on the table:

  1. Keep using Natural Earth features but join them with OSM at run-time to add in missing properties.
    See comments above for how OpenMapTiles does it.

Based on my experience fixing the similar algorithm in OpenMapTiles, I would prefer a Wikidata-based merge/join solution.

So my suggestion:
6. Add Wikidata IDs to Natural Earth, and join with OSM via the Wikidata value. (As far as I know, Natural Earth does not contain Wikidata information yet.)
Maybe we can reuse the OpenMapTiles algorithm to generate Wikidata IDs for the Natural Earth database.
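A minimal sketch of such a join, assuming Natural Earth features carry a "wikidataid" property (an assumed property name) and OSM features carry the standard "wikidata" tag. Only name:* keys are copied, and existing Natural Earth values are never overwritten.

```python
def join_names_by_wikidata(ne_features, osm_features):
    """Decorate NE features with OSM name:* tags that share a Wikidata ID."""
    osm_by_qid = {}
    for osm in osm_features:
        qid = osm.get("wikidata")
        if qid:
            # Keep the first OSM feature seen for each ID.
            osm_by_qid.setdefault(qid, osm)

    for ne in ne_features:
        osm = osm_by_qid.get(ne.get("wikidataid"))
        if osm is None:
            continue  # no concordance, leave the NE feature untouched
        for key, value in osm.items():
            if key.startswith("name:"):
                ne.setdefault(key, value)
    return ne_features

ne = [{"name": "Warsaw", "wikidataid": "Q270"}, {"name": "Lódz"}]
osm = [{"wikidata": "Q270", "name": "Warszawa",
        "name:pl": "Warszawa", "name:de": "Warschau"}]
print(join_names_by_wikidata(ne, osm))
```

Joining on an ID sidesteps spelling differences entirely: the second feature stays unmatched here only because it has no ID yet, not because its name is misspelled.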

@nvkelso
Member

nvkelso commented Aug 18, 2017 via email

@ImreSamu

@nvkelso

Do you have a mapping to contribute, compare with?

I try to help,

I need a little time to analyze the best solution

  • but probably the first simple test report/mapping can be done in the next few weeks,

compare with?

IMHO: The OSM licensing issue is the biggest problem ( ODBL vs. Public domain )

  • so I am thinking about a Natural Earth QA test.

Detecting/Checking:

  • Missing/different Wikidata codes ( and suggestions for manual editing )
  • Misspelled place names ( Similar problems like: "Lódz" )

Comment:
"Misspelled place names" / "Lódz","Pécs","Győr" : The current OpenMapTiles quick&dirty fix for this type of problem is to use the unaccent() SQL function for comparisons.
But this is not perfect.
At least in Hungary, unaccent(placename) is not a unique ID.
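To make the limitation concrete, here is a sketch that imitates accent folding in Python and then reports which folded keys are shared by more than one distinct name. It is not the PostgreSQL unaccent() implementation: the small EXTRA_FOLDS table is an assumption covering letters such as Ł that have no Unicode decomposition, and the "Tar"/"Tár" pair is a made-up illustration, not real gazetteer data.

```python
import unicodedata
from collections import defaultdict

# Letters that NFKD cannot decompose but that accent folding usually flattens.
EXTRA_FOLDS = str.maketrans({"Ł": "L", "ł": "l", "Ø": "O", "ø": "o",
                             "Đ": "D", "đ": "d"})

def fold(name):
    """Strip diacritics and case so that 'Łódź' and 'Lódz' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name.translate(EXTRA_FOLDS))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def ambiguous_keys(names):
    """Return folded keys that more than one distinct name maps to."""
    groups = defaultdict(set)
    for name in names:
        groups[fold(name)].add(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}

print(fold("Łódź") == fold("Lódz"))             # True
print(ambiguous_keys(["Tar", "Tár", "Pécs"]))   # {'tar': ['Tar', 'Tár']}
```

Folding is handy for spotting near-misses like "Lódz", but any key reported by ambiguous_keys cannot be used safely as a join key on its own.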

@matkoniecz
Contributor

IMHO: The OSM licensing issue is the biggest problem ( ODBL vs. Public domain )

Why? http://www.naturalearthdata.com/about/terms-of-use/ and wikidata are PD, so after mixing OSM with Wikidata and NE data, the license is not going to be a problem. One may simply use ODBL.

@nvkelso
Member

nvkelso commented Aug 19, 2017

Many of the wikidata concordances are already available via Who's On First under CC0 – I'll investigate adding them to Natural Earth from there, with verification QA steps against OSM to make sure the joins there result in expected behavior, and do one-off corrections:

@nvkelso
Member

nvkelso commented Aug 19, 2017

Tracking the Wikidata concordance work upstream in nvkelso/natural-earth-vector#214.

@klokan

klokan commented Aug 21, 2017

+1 on bringing WikiData codes into the next Natural Earth release.

It would be truly amazing @nvkelso

@matkoniecz
Contributor

It also affects water areas - for example, the Aral Sea ( http://www.openstreetmap.org/relation/2195612 ) has no Polish label at lower zoom levels.

@nvkelso
Member

nvkelso commented Oct 30, 2017

Usually sea features would come from OSM and be translated, but that one is technically an inland lake and is being sourced from Natural Earth which only includes the English label. I'm tracking adding more localized names to non-populated places features in nvkelso/natural-earth-vector#224 as a stop gap – but really whatever solution we develop for general Wikidata ID concordance x-walk and name harvesting should also work generically for all placetypes.

@drewwilliams

I’ve been working on a Mapzen-based project — http://majes.tc/toponyms (background here) — that shows native-language place names, and then displays a choice of localized name (if available). Was excited to see the WOF work expanding the number of name translations as I want to provide a long list of potential localizations.

This whole project is predicated on displaying local names for everything and I thought I’d chime in on this issue: The Bubble Wrap style on which I’ve based my map appears to use Natural Earth data for major city names (and some other features) at low zoom levels and these do not appear in their native language until they switch over to OSM at zoom level 8.

[Screenshots: screen shot 2017-11-11 at 16 22 08, pastedgraphic-1]

Is there a way to work around this, by forcing OSM/WOF as a source so that native place names are shown at all zoom levels?

Unlike much of what you’re undoubtedly focused on, this is a side project with little or no commercial potential, but it’s one of my first map projects and I’ve been amazed at how easy Mapzen has made it to bang out this prototype, so thanks for that.

Cartographic rookie, but let me know if there’s anything I can do to help?

@nvkelso
Member

nvkelso commented Nov 16, 2017

Hi @drewwilliams. Thanks for your feedback. We have some immediate improvements planned that will pick up ~8 languages for low zoom populated places from the new Natural Earth v4 release (Dec/Jan release). But that's only a stop gap until one of the other proposed solutions in #977 (comment) can be acted on.

@nvkelso nvkelso self-assigned this Nov 28, 2017
@nvkelso nvkelso changed the title Low zoom locality names are missing translations 🌍 Low zoom locality names are missing translations Nov 28, 2017
@nvkelso nvkelso assigned zerebubuth and unassigned nvkelso Jun 5, 2018
@ghost ghost added in review and removed nextnext labels Jun 13, 2018
@nvkelso
Member

nvkelso commented Jun 14, 2018

Since OSM regions map poorly to NE regions, even using Wikidata ID joins, let's investigate a default min_zoom and max_zoom per country (with a table @nvkelso will provide) using the roads intercut method, and then use the curated value instead whenever there is a match.

For countries, the match rate seems good enough that we can fix up any Wikidata ID funk (like was done for Taiwan).
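The "default per country, curated value on a match" rule for regions could be sketched like this. The table contents, the fallback, and the function name are all hypothetical (the real table was still to be provided), and the roads intercut step that determines the country is not modeled here.

```python
# Hypothetical per-country defaults: ISO code -> (min_zoom, max_zoom) for regions.
COUNTRY_REGION_ZOOMS = {"US": (5, 11), "PL": (7, 11)}
FALLBACK_ZOOMS = (8, 11)  # used when a country is missing from the table

def region_zooms(country_code, curated=None):
    """Prefer the curated (min_zoom, max_zoom) from a matched NE region;
    otherwise fall back to the default for the region's country."""
    if curated is not None:
        return curated
    return COUNTRY_REGION_ZOOMS.get(country_code, FALLBACK_ZOOMS)

print(region_zooms("PL"))                    # unmatched region, country default
print(region_zooms("PL", curated=(6, 11)))   # matched region, curated value wins
```

Keeping the curated value as an explicit override means a poor match rate only degrades regions to their country default rather than dropping them.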

@nvkelso nvkelso changed the title 🌍 Low zoom locality names are missing translations 🌍 Low zoom locality names are missing translations + country, region zooms Jun 14, 2018
@nvkelso
Member

nvkelso commented Jul 6, 2018

Verified on dev.

#858 Tracks adding WikipediaID later in v1.6 milestone.
