Add Wikidata concordances and names for more feature classes #224

Closed
8 tasks
nvkelso opened this issue Sep 30, 2017 · 14 comments

@nvkelso
Owner

nvkelso commented Sep 30, 2017

Following up from #214 which added the same for the 7k populated places... Let's do the same for other placetypes! Also adds names in more languages.

To start off the conversation:

  • admin0 "countries" human-verified
  • admin1 "states" - machine-guess
  • rivers
  • lakes
  • playas
  • physical labels (like continents, oceans, mountains, marine areas)
  • parks
  • airports
  • ports
  • redux on populated places with new matches and new names (21 now, not ~7) - human-machine-guess
@planemad

I wonder if we can use OpenRefine to match with Wikidata: https://github.com/wetneb/openrefine-wikidata

@ImreSamu
Collaborator

@planemad:
IMHO, OpenRefine can be used for admin0/admin1/airports matching, because:

  • Wikidata information for countries/states is probably complete and of good quality
  • the numbers are low (ideal for human matching)
  • no special scoring is needed (we have ISO codes and IATA codes)

For the 7K populated places, I used handcrafted scoring and matching, with a Google Sheet as the GUI.

@planemad

planemad commented Oct 31, 2017

Countries

It worked! Matched the 177 country names to Wikidata using openrefine in under 20 mins.

Sheet: Natural EarthV4 Countries: Name Wikidata Match

Process

  • Export 110m_admin_0 countries shape to a CSV
  • Open CSV in openrefine
  • Select NAME_LONG column and Reconcile>Start Reconciling
  • Choose to Reconcile with the Wikidata Country property with the ISO 3166 A3 code as fallback
  • Wait 5 mins for matches. Only 5 failed. Took a few minutes to find the matches manually through the interface
  • Export the matched ids to a new column using the expression cell.recon.match.id

Reference: Reconcile Wikidata Tutorial
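
The ISO 3166 A3 fallback used in the process above can also be approximated outside OpenRefine. This is only a minimal sketch, not the workflow that was actually run: it assumes the Wikidata property P298 (ISO 3166-1 alpha-3 code) and the standard SPARQL JSON results layout. The function names, the hand-written sample response, and the placeholder code "ZZZ" are illustrative assumptions.

```python
import json


def build_iso3_query(iso_codes):
    """Build a SPARQL query that looks up Wikidata items by ISO 3166-1 alpha-3 code (P298)."""
    values = " ".join('"%s"' % code for code in sorted(set(iso_codes)))
    return (
        "SELECT ?item ?iso3 WHERE { "
        "VALUES ?iso3 { %s } "
        "?item wdt:P298 ?iso3 . }" % values
    )


def parse_matches(response_json):
    """Turn a SPARQL JSON result into {iso3: [wikidata_id, ...]}."""
    matches = {}
    for row in json.loads(response_json)["results"]["bindings"]:
        # Entity URIs end in the Q-id, e.g. .../entity/Q183
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        matches.setdefault(row["iso3"]["value"], []).append(qid)
    return matches


def split_matched(iso_codes, matches):
    """Separate codes with exactly one candidate from those needing manual review."""
    matched = {c: matches[c][0] for c in iso_codes if len(matches.get(c, [])) == 1}
    review = [c for c in iso_codes if c not in matched]
    return matched, review


# Hand-written sample response (no network call is made in this sketch).
SAMPLE_RESPONSE = json.dumps({
    "results": {"bindings": [
        {"item": {"value": "http://www.wikidata.org/entity/Q183"}, "iso3": {"value": "DEU"}},
        {"item": {"value": "http://www.wikidata.org/entity/Q142"}, "iso3": {"value": "FRA"}},
    ]}
})

codes = ["DEU", "FRA", "ZZZ"]  # "ZZZ" is a placeholder that matches nothing
matched, review = split_matched(codes, parse_matches(SAMPLE_RESPONSE))
print(matched, review)
```

Codes that end up in `review` correspond to the handful of failures that had to be matched manually through the interface.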

@planemad

planemad commented Nov 1, 2017

Airports

This one took a couple of hours:

  • 93%+ matched based on IATA code and name
  • ~40 were matched manually, some due to duplicate Wikidata and incorrect/outdated NE data
  • 4 airports have no matches
  • This might need some human 👀 to update NE data. Some airports have been closed or renamed, have typos, or are simply nonexistent, like "Munich Freight"
  • Have added some additional columns for data validation in the sheet, based on the name match between NE and Wikidata.

Sheet: Airport IATA Name Wikidata match
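
The IATA-plus-name matching and the name-based validation column described above could be sketched roughly as follows. This is an assumption-laden illustration, not the OpenRefine process that was actually used: the row layout, the 0.6 threshold, the IATA codes "ZZ1" to "ZZ3", and the Q-PLACEHOLDER ids are all made up for the example.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_airports(ne_rows, wd_rows, threshold=0.6):
    """Match Natural Earth rows to Wikidata rows by IATA code, scoring by name."""
    # Index Wikidata candidates by IATA code; duplicates are kept on purpose
    # so that ambiguous codes can be flagged for review.
    by_iata = {}
    for wd in wd_rows:
        by_iata.setdefault(wd["iata"], []).append(wd)

    results = []
    for ne in ne_rows:
        candidates = by_iata.get(ne["iata"], [])
        if not candidates:
            results.append({**ne, "wikidata_id": None, "name_score": 0.0,
                            "status": "no_match"})
            continue
        scored = [(name_similarity(ne["name"], wd["label"]), wd) for wd in candidates]
        score, best = max(scored, key=lambda pair: pair[0])
        # Accept automatically only when the code is unique and the names agree.
        unambiguous = len(candidates) == 1 and score >= threshold
        results.append({**ne, "wikidata_id": best["id"], "name_score": round(score, 2),
                        "status": "auto" if unambiguous else "review"})
    return results


# Illustrative rows only; codes and ids are placeholders, not real data.
NE_ROWS = [
    {"iata": "ZZ1", "name": "Alpha Airport"},
    {"iata": "ZZ2", "name": "Beta Airport"},
    {"iata": "ZZ3", "name": "Munich Freight"},
]
WD_ROWS = [
    {"id": "Q-PLACEHOLDER-1", "iata": "ZZ1", "label": "Alpha Airport"},
    {"id": "Q-PLACEHOLDER-2", "iata": "ZZ2", "label": "Beta Airport"},
    {"id": "Q-PLACEHOLDER-3", "iata": "ZZ2", "label": "Beta Airport"},
]

rows = match_airports(NE_ROWS, WD_ROWS)
for row in rows:
    print(row["iata"], row["status"], row["name_score"])
```

Rows marked `review` cover the duplicate-Wikidata case, and `no_match` rows are the ones that may point to outdated NE data.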

nvkelso changed the title from "Add Wikidata and Who's On First concordances for more feature classes" to "Add Wikidata concordances and names for more feature classes" on Jan 30, 2018
@nvkelso
Owner Author

nvkelso commented Jan 30, 2018

Splitting Who's On First concordances off from this issue to keep it focused...

@ImreSamu
Collaborator

ImreSamu commented Jan 30, 2018

Wikidata matches for lakes, rivers, and mountains: worksheet (anybody can comment)
https://docs.google.com/spreadsheets/d/1oNV4ydEXXgbeowQT2xdUMi_4crN6gdthrx6hzeigkMQ/edit?usp=sharing

@ImreSamu
Collaborator

ImreSamu commented Feb 6, 2018

  • admin_0_disputed_areas
  • admin_0_map_subunits
  • admin_1_states_provinces

https://docs.google.com/spreadsheets/d/1tgJnLdZOeYf_uWk8L6TXjOORJOcHBNHsVxfcBUB30Rs/edit?usp=sharing

@ImreSamu
Collaborator

ImreSamu commented Feb 9, 2018

  • ne_10m_admin_0_countries - 255 countries

https://docs.google.com/spreadsheets/d/1V7p2RrqIcPqjTvdT9cbzze2QcRa5H4AS-fJElY4gAnM/edit?usp=sharing

Cross-checked with @planemad's 110m_admin_0 countries (177 matching).
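
A cross-check like this amounts to comparing two independently built concordances key by key. The sketch below is a guess at the general shape of such a comparison, not the procedure used here: it assumes each result set is a mapping from a shared country code to a Wikidata id, and "Q-PLACEHOLDER" marks a deliberately invented disagreement.

```python
def cross_check(a, b):
    """Compare two {code: wikidata_id} mappings and report where they diverge."""
    shared = a.keys() & b.keys()
    agree = sorted(k for k in shared if a[k] == b[k])
    differ = {k: (a[k], b[k]) for k in sorted(shared) if a[k] != b[k]}
    only_a = sorted(a.keys() - b.keys())  # e.g. features present only in the larger set
    only_b = sorted(b.keys() - a.keys())
    return agree, differ, only_a, only_b


# Illustrative mappings; the key scheme is an assumption.
set_10m = {"DEU": "Q183", "FRA": "Q142", "USA": "Q30"}
set_110m = {"DEU": "Q183", "FRA": "Q-PLACEHOLDER"}

agree, differ, only_a, only_b = cross_check(set_10m, set_110m)
print(len(agree), "matching;", differ, "to check by hand;", only_a, "not in the smaller set")
```

Because the 10m layer has more countries than the 110m layer, entries in `only_a` are expected and only the `differ` entries need manual checking.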

@ImreSamu
Collaborator

  • ne_10m_airports

https://docs.google.com/spreadsheets/d/1hegMTz6i0poQPi2zKb4rey0uf2j_CpK-EMOgj0Xg300/edit?usp=sharing

This might need some human eyes to update NE data. ...

Cross-checked with @planemad's results.

  • I have found only ~38 differences, and after manually checking every difference,

Summary: it was not an easy task, and the Wikidata airport data is not perfect yet. :(

@ImreSamu
Collaborator

ImreSamu commented Mar 1, 2018

@nvkelso
Owner Author

nvkelso commented Mar 1, 2018

Started a PR for this here #249.

@nvkelso
Owner Author

nvkelso commented May 24, 2018

This has been released on the public download site: https://www.naturalearthdata.com/blog/miscellaneous/natural-earth-v4-1-0-release-notes/
