GeoJSON datasets #192

Merged
merged 4 commits into from
Oct 20, 2020

Conversation

jsanz
Member

@jsanz jsanz commented Oct 7, 2020

Fixes #191

This PR adds the following new GeoJSON data sources, with the same precision as their existing TopoJSON versions.

  • admin_regions_lvl2_v2.geo.json 25MB
  • usa_counties_v2.geo.json 2MB
  • usa_zip_codes_v7.geo.json 11MB
  • world_countries_v7.geo.json 11MB

The changes to the existing admin regions and world countries datasets come from a small fix in the data pipeline to include proper IDs, which I made while improving the Makefile.

For World Countries, the new datasets are only available from 7.10 onward; older versions keep receiving the currently published low-precision dataset.

@jsanz jsanz added the data Data related issues and requests label Oct 7, 2020
@jsanz jsanz requested a review from nickpeihl October 7, 2020 14:41
@kibanamachine

💚 Build Succeeded

@jsanz jsanz linked an issue Oct 8, 2020 that may be closed by this pull request
Member

@nickpeihl nickpeihl left a comment

Can you double-check that you have the latest shapefiles from the Google drive? I'm seeing some missing regions such as AZ-LAC, MU-PU, and others in the admin_regions dataset.

There remain some inconsistencies when dissolving the regions into countries. For example, the overseas territories of France (such as French Guiana) are completely missing from world countries. Also in world countries, Taiwan has been merged into China.

Compare your PR to what's currently in staging.

vimdiff \
 <(curl --compressed https://storage.googleapis.com/elastic-bekitzur-emsfiles-vector-dev/files/world_countries_v1.geo.json | jq '.features[].properties.iso2' | sort | uniq) \
 <(jq '.features[].properties.iso2' < data/world_countries_v1.geo.json | sort | uniq)
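
For a non-interactive version of the same check, the two code lists can be compared with `comm`, which prints what was dropped and what was added. This is only a sketch: `compare_codes` is a hypothetical helper that is not part of this PR, and it assumes each input file holds one code per line (for example the output of `jq -r '.features[].properties.iso2'`).

```shell
# Hypothetical helper (not part of this PR): report codes that are missing
# from, or newly added to, a dataset. Both arguments are files with one
# code per line.
compare_codes() {
  local tmp_dir
  tmp_dir=$(mktemp -d)
  # comm requires sorted input; sort -u also drops duplicate codes.
  sort -u "$1" > "$tmp_dir/old"
  sort -u "$2" > "$tmp_dir/new"
  # -23 keeps lines found only in the first file (dropped codes).
  comm -23 "$tmp_dir/old" "$tmp_dir/new" | sed 's/^/missing: /'
  # -13 keeps lines found only in the second file (new codes).
  comm -13 "$tmp_dir/old" "$tmp_dir/new" | sed 's/^/added: /'
  rm -rf "$tmp_dir"
}

# Possible usage against the staging file referenced above
# (staging.txt and local.txt are placeholder names):
#   curl -s --compressed https://storage.googleapis.com/elastic-bekitzur-emsfiles-vector-dev/files/world_countries_v1.geo.json \
#     | jq -r '.features[].properties.iso2' > staging.txt
#   jq -r '.features[].properties.iso2' data/world_countries_v1.geo.json > local.txt
#   compare_codes staging.txt local.txt
```

An empty result means both files expose the same set of codes; any `missing:` line points at a region that would disappear if the PR were published.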

@jsanz
Member Author

jsanz commented Oct 13, 2020

Can you double-check that you have the latest shapefiles from the Google drive? I'm seeing some missing regions such as AZ-LAC, MU-PU, and others in the admin_regions dataset.

Sorry about that. I will double-check the data, thanks!

@kibanamachine

💚 Build Succeeded

@jsanz jsanz requested a review from nickpeihl October 15, 2020 17:10
@jsanz
Member Author

jsanz commented Oct 15, 2020

@nickpeihl I've marked this for review again; hopefully the data issues are addressed. I also changed the source for the world countries to only offer TopoJSON for >= 7.10.

Member

@nickpeihl nickpeihl left a comment

World countries look good to me.

Administrative regions are missing a few regions in Azerbaijan and Slovenia, but that's my fault because I did not rsync my local changes to the Google Drive. Sorry about that.

vimdiff \
 <(curl --compressed https://vector.maps.elastic.co/files/admin_regions_lvl2_v1.geo.json | jq '.features[].properties.region_iso_code' | sort | uniq) \
 <(jq '.features[].properties.region_iso_code' < data/admin_regions_lvl2_v1.geo.json | sort | uniq)

Getting rid of the ~99- regions is a good idea though. Thanks for doing that.

@kibanamachine

💚 Build Succeeded

@jsanz jsanz requested a review from nickpeihl October 19, 2020 16:14
@jsanz
Member Author

jsanz commented Oct 19, 2020

@nickpeihl thanks for updating the datasets. I've tested the regions again, and we no longer have any missing records compared with the current production datasets. The new records are all accounted for, either as data currently missing from production or as new data produced by the processing that generates all countries.

Also, there are no diffs with the current production world countries dataset:

diff \
 <(curl -s --compressed https://vector.maps.elastic.co/files/world_countries_v1.geo.json | jq '.features[].properties.iso2' | sort | uniq) \
 <(jq '.features[].properties.iso2' < data/world_countries_v1.geo.json | sort | uniq)

Member

@nickpeihl nickpeihl left a comment

lgtm

Reviewed the code and compared the datasets in this PR to production.

@jsanz jsanz merged commit 804bece into elastic:feature-layers Oct 20, 2020
@jsanz jsanz deleted the 191-geojson branch October 20, 2020 11:03
Development

Successfully merging this pull request may close these issues.

Add full precision GeoJSON version for TopoJSON only datasets
3 participants