
Administrative divisions layer #178

Merged
merged 11 commits into from
Jul 29, 2020

Member

@nickpeihl nickpeihl commented Jul 8, 2020

Fixes #177.

Administrative divisions

The Administrative divisions layer contains second level subdivisions (first level where no second level subdivision exists) of world countries. This layer was derived from the Admin 1 - States, Provinces layer from Natural Earth with supplemental boundaries from OpenStreetMap where Natural Earth data is known to be incomplete or erroneous.

When released, this layer will be available immediately in all releases of Kibana from v5.6+. The layer is published as a GeoJSON file for Kibana <v6.2. Starting with Kibana v6.2, a slightly higher resolution TopoJSON file is used.

This layer can be viewed at https://maps.elastic.co/?manifest=testing#file/administrative_divisions.

Test in Kibana

Instructions here.

Notes

The ISO 3166-2 codes in this layer have been verified against the MaxMind GeoLite2 database currently distributed with the latest releases of Elasticsearch. Some features in the Natural Earth dataset this layer is derived from use unofficial ISO codes or have no ISO codes at all.

This layer only offers one opinionated view of country boundaries. Generally, these boundaries are drawn according to de facto rather than de jure status.

License

Since this layer contains data from both Natural Earth (Public Domain) and OpenStreetMap (ODbL license), the more restrictive license (ODbL) applies to this dataset.

Any use of Elastic Maps Service data and APIs is subject to the Elastic Maps Service Terms of Service.

@kibanamachine

💚 Build Succeeded

@thomasneirynck
Contributor

thomasneirynck commented Jul 8, 2020

You can test in Kibana Maps by adding this to the kibana.dev.yml.

map.emsFileApiUrl: https://storage.googleapis.com/elastic-bekitzur-emsfiles-vector-dev/

Packetbeat with geoip plugin enabled is probably the easiest way to load data.
- download packetbeat: https://www.elastic.co/downloads/beats/packetbeat
- enable geo-ip plugin: https://www.elastic.co/guide/en/beats/packetbeat/master/packetbeat-geoip.html

Add an index-pattern: packetbeat-*

In Kibana, use the choropleth layer wizard to add the layer:

image

image

🎉

image

The nice thing about a region choropleth-layer like this is that the "unknown" continental-us lat/lon assignment does not get added.

e.g.: see that big cluster in KS, it's junk lat/lon assignment.
image

We could technically filter that out with a KQL-expression, maybe in ready-made APM-RUM and/or Beats layers

Member

@jsanz jsanz left a comment

I've tested this in a local environment, and the EMS Landing Page works as expected

image

but Kibana is refusing to load the geometries without any log on the browser console.

image

Maybe we can have a sync session to debug this, I added a couple of comments, but not really requests at this moment. So far the layer looks great!!

}
emsFormats: [{
type: geojson
file: 'admin_divisions_v2.geo.json'
Member

I understand this version won't be used by any clients by default, right?

Contributor

Yes, it's not used. The landing page shows "Download TopoJson" only and Kibana uses default format (TopoJson).

I actually tested the GeoJson-version, and it works fine in the GeoJson upload of Kibana Maps.

As a tangent: we should consider improving the landing-page so the "Download GeoJson" button is always "Download GeoJson" if there's a geojson-version available. Right now, it falls back to the default format (in this case, topojson). Topojson isn't the greatest interchange format anyway; GeoJson is more shareable/hackable.

Member

Agree that TopoJSON is not a very portable format.

Maybe an improvement could be to always add a GeoJSON equivalent file in the service and then favor it on the landing page as you suggest. In production, we only use TopoJSON at this moment for USA counties and zip codes.

Member Author

Yes, my intention was to include the geojson for future use such as ingest into Kibana by downloading from the landing page.

}
}
]
fieldMapping: [
Member

Should we also aim to add the wikidata identifier? I know quite a few divisions (almost 500) are missing that field, and I'm not sure how much that field is leveraged by our users.

Contributor

I think this is related to the overall id question. Right now, I am not sure e.g. how useful it is to have the country-code in here. Will it ever be used for joining? If not, is there a longer term analytical purpose? (maybe towards hierarchical layers or something...)

How common is the wikidata identifier, is this something our user-base will have in their own business-data?

My gut feeling right now, is I think it's probably more useful to add the wikidata identifier than the country-identifier. (providing the wikidata-id is somewhat useful, and not just some obscure internal id with little outside use).

Member Author

> I think this is related to the overall id question. Right now, I am not sure e.g. how useful it is to have the country-code in here. Will it ever be used for joining? If not, is there a longer term analytical purpose? (maybe towards hierarchical layers or something...)

I think it may be useful to have the country code when the data layer is ingested into Elasticsearch so that it can be used in filters. But maybe it doesn't make sense to expose the field in the join.

> How common is the wikidata identifier, is this something our user-base will have in their own business-data?

It may not be very useful for joining to business data. Additionally, some divisions do not have a wikidata ID because I did not enforce it in my changes to the Natural Earth layer. 😬

Contributor

@thomasneirynck thomasneirynck Jul 13, 2020

my 2 cents, (and don't feel strongly about either, so for your consideration mostly).

  • would not include the wikidata-id. Just more content to maintain with questionable usefulness.
  • would not include the country-identifier. While it's true that it can be used for filtering, this can only be done if the data is ingested in Elasticsearch. This is not the most common use-case. There are work-arounds too: the user can always write a manual filter (e.g. region_code: US-* or region_code: CA-*) to filter by country. Having it included would complicate the field-types of EMS. It's not a unique identifier (ie, "type=id"), and neither is it useful for human readable labels ("type=label").

In either case, we can always add a new column later, without affecting users of this layer.
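
The manual-filter workaround can be illustrated outside Kibana. This is a minimal sketch, assuming region codes follow the ISO 3166-2 pattern of country code, hyphen, subdivision code. The `filter_by_country` helper and the sample codes are hypothetical; they only mimic what a wildcard filter such as `region_code: US-*` would match.

```python
from fnmatch import fnmatchcase


def filter_by_country(region_codes, country_code):
    """Return the region codes that belong to one country.

    Emulates a wildcard filter such as `region_code: US-*` by matching
    the country prefix of each code; no separate country column is needed.
    """
    pattern = f"{country_code}-*"
    return [code for code in region_codes if fnmatchcase(code, pattern)]


# Hypothetical sample of region codes as they might appear in user data.
codes = ["US-KS", "US-WA", "CA-BC", "FR-75", "CA-ON"]
us_codes = filter_by_country(codes, "US")
ca_codes = filter_by_country(codes, "CA")
```

Since the country is already encoded in the region code prefix, a dedicated country column would be redundant for this kind of filtering.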

Member Author

Sounds good to me. I've only included the region iso code and region name in the latest update.

@jsanz
Member

jsanz commented Jul 8, 2020

Following @thomasneirynck's suggestion to use the staging URL, I got it working correctly, so my previous comment was related to something in my environment.

Tested it using the Kibana Sample flights dataset and joining by the DestRegion field. 👌

image

Contributor

@thomasneirynck thomasneirynck left a comment

Really great!

Some comments.

The main thing, imho, is whether we want to introduce these new semantics for id-fields, where they can be foreign keys.

I also think this layer opens the door for much tighter integration with ECS. I added elastic/kibana#71042

}
]
fieldMapping: [
{
Contributor

The field-names/descriptions are kind of hard to wrap your head around when you see them displayed:

e.g.: this is how the field-descriptions look in the Maps-UX

image

If we ingest this file into Kibana Maps (e.g. using GeoJson upload), and use the raw documents, we see the field-names.

image

I'd consider renaming and using the ECS names for these fields.

{
  type: id
  name: region_iso_code
  desc: iso2 region code
}
{
  type: id
  name: country_iso_code
  desc: iso2 country code
}

[EDIT] nevermind, I see that we already made some changes. Just wondering if we should be using the ECS-names.

Member Author

Field names and descriptions now match the ECS names.

{
type: id
name: iso2
desc: Country code
Contributor

Country code is the first id-field we're introducing in EMS that is not a unique identifier.

The joining-code in Kibana Maps works fine enough.

e.g. join on country-code. It captures all the individual regions.

image

image

The logic also works for region maps.

image

There are some edge-cases, e.g. the display shows internal borders.

I guess the overall question is: do we want to make this a new semantic of the id-field? ie. a field that identifies some entity, but it can be either a primary key (ie. the region-code) or a foreign key (ie. the country-code).
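
To make the primary-key versus foreign-key distinction concrete, here is a small sketch in plain Python. It is not the Kibana Maps join code; the feature list, field names, and the `join_metrics` helper are hypothetical. Joining on the region code matches at most one feature per key, while joining on the country code fans one metric out to every region of that country.

```python
def join_metrics(features, metrics, key_field):
    """Attach a metric to each feature whose key_field value appears in metrics.

    Features without a matching key are dropped, as in an inner join.
    """
    joined = []
    for feature in features:
        key = feature[key_field]
        if key in metrics:
            joined.append({**feature, "metric": metrics[key]})
    return joined


# Hypothetical features: the region code is unique, the country code is not.
features = [
    {"region_iso_code": "US-KS", "country_iso_code": "US"},
    {"region_iso_code": "US-WA", "country_iso_code": "US"},
    {"region_iso_code": "CA-BC", "country_iso_code": "CA"},
]

# Primary-key join: one metric colors exactly one region.
by_region = join_metrics(features, {"US-KS": 10}, "region_iso_code")

# Foreign-key join: one metric colors every region in the country.
by_country = join_metrics(features, {"US": 10}, "country_iso_code")
```

In the foreign-key case every region of the country receives the same value, so the regions render in one color but remain separate features, which is consistent with the internal borders noted above.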

Contributor

^ this behavior of the inner-join code is not unit-tested in Kibana (all tests expect a primary key in the left-hand side "table").

Member Author

I don't want to make broad changes here. I've removed the country code in the latest update.

desc: Division name
}
]
name: 'administrative_divisions'
Contributor

consider renaming to Administrative regions, or 2nd level regions, ... Something to tie it to ECS more closely.

Member

I also prefer something that is close to the wording we see in ECS and the results of the geoip processor. +1 for Administrative regions


@jsanz
Member

jsanz commented Jul 8, 2020

Update on the issue loading this layer. I could reproduce the issue just by enabling the proxy. Running Kibana from the 7.8, 7.7, and 7.6 branches with these settings gives the same error I shared

map.proxyElasticMapsServiceInMaps: true
map.emsFileApiUrl: https://storage.googleapis.com/elastic-bekitzur-emsfiles-vector-dev

Peek 2020-07-08 15-40

No traces in the server or browser logs. This indeed looks like some malfunction in the proxy.

@kibanamachine

💚 Build Succeeded

@nickpeihl
Member Author

> Update on the issue loading this layer. I could reproduce the issue just by enabling the proxy. Running Kibana from the 7.8, 7.7, and 7.6 branches with these settings gives the same error I shared
>
> map.proxyElasticMapsServiceInMaps: true
> map.emsFileApiUrl: https://storage.googleapis.com/elastic-bekitzur-emsfiles-vector-dev
>
> Peek 2020-07-08 15-40
>
> No traces in the server or browser logs. This indeed looks like some malfunction in the proxy.

It definitely looks like the proxy does not like having multiple formats. Even though the topojson format is the default in the manifest, the proxy is returning the geojson format. I'm still trying to determine if that's a bug in the ems-client fileLayer.getDefaultFormat method or somewhere else in the proxy server.
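
The expected behavior can be sketched as follows. This is not the ems-client implementation: the manifest shape, the `default` flag, the TopoJSON file name, and the `get_default_format` helper are all assumptions made for illustration. The point is only that a layer with several formats should resolve to the one marked as default rather than to whichever happens to be listed first.

```python
def get_default_format(formats):
    """Return the type of the format flagged as default.

    Falls back to the first listed format when none is flagged.
    """
    if not formats:
        raise ValueError("layer has no formats")
    for fmt in formats:
        if fmt.get("default"):
            return fmt["type"]
    return formats[0]["type"]


# Hypothetical layer entry: GeoJSON is listed first, TopoJSON is the default.
layer_formats = [
    {"type": "geojson", "file": "admin_divisions_v2.geo.json"},
    {"type": "topojson", "file": "admin_divisions_v2.topo.json", "default": True},
]

default_format = get_default_format(layer_formats)
```

A resolver that ignored the flag and always took the first entry would return `geojson` for this input, which is the kind of mismatch described above.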

@nickpeihl
Member Author

This layer contains second-level administrative regions with ISO 3166-2 codes. However, many countries only have first-level regions, and those regions are included here.

Should we also expect to release a layer for strictly first-level administrative regions? A first-level administrative regions layer would be largely identical to this one, except in countries such as France and Italy, among others, which have both first-level and second-level regions.

If so, perhaps we need to rename this layer to be more specific to the region level. Administrative Regions - Second Level?

@kibanamachine

💚 Build Succeeded

@nickpeihl
Member Author

nickpeihl commented Jul 15, 2020

With c6abbed I am no longer seeing the error when using the layer via proxy. @jsanz can you confirm?

There appears to be a bug in Kibana where the proxied manifest does not handle multiple formats correctly.

@jsanz
Member

jsanz commented Jul 16, 2020

> @jsanz can you confirm?

I can confirm, the layer manifest now reflects the correct format for the dataset and Kibana Maps loads it without issues.

Thanks for reporting the bug also! 💪

This leaves us with the ability to add a level 1 regions layer in the future if we want to.
@kibanamachine

💚 Build Succeeded

@nyurik
Contributor

nyurik commented Jul 16, 2020

Per skype discussion, I think we should include disputed regions here as overlapping shapes. Topojson makes this very efficient, essentially storing each shape just once.

For example, TW (Taiwan -- country level as having no subdivisions) and CN-TW Taiwan Province would both be included as the same (or at least overlapping) shape with different metadata.

This will ensure that user's data really drives the (choropleth) visualization -- using whichever codes it actually contains. Note that if we pick just one shape, we risk user's data not being shown at all. I think it is unlikely for the user's data to contain both IDs for the same region, and even if it does, it is better to show a strange coloration than no coloration at all (or an error of a missing region).
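
A rough sketch of the idea, using a heavily simplified TopoJSON-like structure. The coordinates, the `data` object name, the `id` property, and the `shapes_to_color` helper are hypothetical. Both geometries reference the same arc, so the boundary is stored once, and the choropleth colors whichever code the user's data contains.

```python
# Hypothetical, simplified topology: both geometries reference arc 0,
# so the boundary coordinates are stored only once.
topology = {
    "arcs": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
    "objects": {
        "data": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "arcs": [[0]], "properties": {"id": "TW"}},
                {"type": "Polygon", "arcs": [[0]], "properties": {"id": "CN-TW"}},
            ],
        }
    },
}


def shapes_to_color(topology, user_values):
    """Map each geometry id present in the user's data to its value."""
    geometries = topology["objects"]["data"]["geometries"]
    return {
        geom["properties"]["id"]: user_values[geom["properties"]["id"]]
        for geom in geometries
        if geom["properties"]["id"] in user_values
    }


# Whichever code the user's data contains, exactly one shape is colored.
colored_tw = shapes_to_color(topology, {"TW": 42})
colored_cn_tw = shapes_to_color(topology, {"CN-TW": 7})
```

If the data contained both codes, both overlapping shapes would be colored, which is the "strange coloration" case mentioned above.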

@nickpeihl
Member Author

nickpeihl commented Jul 16, 2020

> Per skype discussion, I think we should include disputed regions here as overlapping shapes. Topojson makes this very efficient, essentially storing each shape just once.
>
> For example, TW (Taiwan -- country level as having no subdivisions) and CN-TW Taiwan Province would both be included as the same (or at least overlapping) shape with different metadata.
>
> This will ensure that user's data really drives the (choropleth) visualization -- using whichever codes it actually contains. Note that if we pick just one shape, we risk user's data not being shown at all. I think it is unlikely for the user's data to contain both IDs for the same region, and even if it does, it is better to show a strange coloration than no coloration at all (or an error of a missing region).

Thanks Yuri. Looks like the provinces of Macau and Hong Kong are also missing. This was all fixed in 5e91330.

Additionally, this removes the divisions of Macau and Hong Kong which
have no ISO codes. CN-TW and the provinces of Taiwan overlap but one
should not expect data from both regions to be displayed simultaneously.
@nickpeihl nickpeihl requested a review from jsanz July 17, 2020 22:31
@kibanamachine

💚 Build Succeeded

Member

@jsanz jsanz left a comment

I've checked with some sample data that I ran through the geoip processor to get region ISO codes from all over the world, and it's looking great! I double checked some of the missing regions and the result was congruent with existing data.

image

I've also tested on 6.8 with the Region Map visualization and it loads the TopoJSON file without issues.

image

@nickpeihl
Member Author

Merging this now. But we will release this to production at a later date.

@nickpeihl nickpeihl merged commit 3348854 into master Jul 29, 2020

Successfully merging this pull request may close these issues.

Natural Earth Provinces and States layer
5 participants