
Names of Japan Prefectures are inconsistent with GeoLite2 subdivision names #20971

Closed
kosho opened this issue Jul 19, 2018 · 5 comments
Labels
[Deprecated-Use Team:Presentation]Team:Geo, Feature:Region Map

Comments

@kosho

kosho commented Jul 19, 2018

Kibana version: 6.3.0

Elasticsearch version: 6.3.0

Describe the bug:

People often use the GeoLite2 dataset to convert IP addresses to geographic locations through Elasticsearch's geoip ingest node processor and Logstash's geoip filter. However, the subdivision names GeoLite2 returns for prefectures in Japan differ from the names used by Elastic Maps Service. As a result, Kibana's Region Map visualization cannot paint the regions properly.

I've extracted the prefecture names below from here.

01,Hokkaido
02,Aomori
03,Iwate
04,Miyagi
05,Akita
06,Yamagata
07,Fukushima-ken
08,Ibaraki
09,Tochigi
10,Gunma
11,Saitama
12,Chiba
13,Tokyo
14,Kanagawa
15,Niigata
16,Toyama
17,Ishikawa
18,Fukui
19,Yamanashi
20,Nagano
21,Gifu
22,Shizuoka
23,Aichi
24,Mie
25,Shiga
26,Kyoto
27,"Ōsaka"
28,"Hyōgo"
29,Nara
30,Wakayama
31,Tottori
32,Shimane
33,Okayama
34,Hiroshima
35,Yamaguchi
36,Tokushima
37,Kagawa
38,Ehime
39,Kochi
40,Fukuoka
41,Saga
42,Nagasaki
43,Kumamoto
44,Oita
45,Miyazaki
46,Kagoshima
47,Okinawa

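The GeoLite2 database does not seem to follow a consistent rule. Its names do not comply with ISO 3166-2 subdivision names, and Fukushima alone carries the suffix -ken (meaning "prefecture"), which none of the other prefectures have. Macrons are also inconsistent: Ōsaka and Hyōgo have them, while Hokkaido and Kyoto do not.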
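To illustrate why names make a fragile join key, here is a minimal Python sketch of the folding one would need just to reconcile the variants seen above (macrons, the -ken suffix, and the "Prefecture" suffix from the Elastic Maps Service name). The helper `normalize_prefecture` is hypothetical and not part of Kibana, Logstash, or Elasticsearch; it only handles the cases listed here, and a different romanization such as "Hokkaidou" still fails to match.

```python
import unicodedata


def normalize_prefecture(name: str) -> str:
    """Best-effort folding of a romanized prefecture name (illustrative only)."""
    # NFKD splits characters such as "ō" into "o" + a combining macron; drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = folded.strip().lower()
    # Strip only the two suffixes observed in this issue.
    for suffix in ("-ken", " prefecture"):
        if folded.endswith(suffix):
            folded = folded[: -len(suffix)]
    return folded


if __name__ == "__main__":
    print(normalize_prefecture("Fukushima-ken"))        # fukushima
    print(normalize_prefecture("Hokkaidō Prefecture"))  # hokkaido
    # Another valid romanization still does not match, so names remain unreliable.
    print(normalize_prefecture("Hokkaidou") == normalize_prefecture("Hokkaido"))  # False
```

Every new spelling variant needs another rule, which is the argument for joining on codes instead.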

Since romanized Japanese is flexible, Kyoto could also be written Kyōto. Mapping prefectures by their ISO subdivision code is worth considering. In that case, the fix must be made in Elasticsearch and Logstash, and MaxMind describes such a method here.
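The two-digit numbers in the list above are the same numbers used in ISO 3166-2:JP codes (JP-01 for Hokkaido through JP-47 for Okinawa), so a code-based mapping only needs the country prefix. A minimal Python sketch, assuming the list is available as CSV text in the format shown; `build_iso_lookup` and `GEOLITE2_ROWS` are hypothetical names, and only a few rows are copied here:

```python
import csv
import io

# A few rows copied from the GeoLite2 list above (first column: prefecture code).
GEOLITE2_ROWS = '''01,Hokkaido
07,Fukushima-ken
13,Tokyo
27,"Ōsaka"
28,"Hyōgo"
'''


def build_iso_lookup(rows: str, country: str = "JP") -> dict:
    """Map ISO 3166-2 codes (e.g. "JP-13") to the GeoLite2 subdivision name."""
    lookup = {}
    for code, name in csv.reader(io.StringIO(rows)):
        # ISO 3166-2 subdivision codes have the form "<country>-<subdivision>".
        lookup[f"{country}-{code}"] = name
    return lookup


if __name__ == "__main__":
    lookup = build_iso_lookup(GEOLITE2_ROWS)
    print(lookup["JP-27"])  # Ōsaka (the csv module removes the quotes)
    print(lookup["JP-07"])  # Fukushima-ken
```

Whatever spelling GeoLite2 uses for the name, the code column stays stable, which is why it makes a better join key.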

Steps to reproduce:

  1. Index some documents that contain a Japanese prefecture name and a value, as shown below. In practice the names would come from Logstash's geoip filter or the geoip ingest processor when IP addresses are available.
PUT japan-region-map/doc/_bulk
{"index":{}}
{"Region": "Tokyo", "Value": 1}
{"index":{}}
{"Region": "Hokkaido", "Value": 5}
  2. Create an index pattern.

  3. Create a new Region Map from the Visualize tab. The message shown in the screenshot below appears.

Note: Elastic Maps Service expects Hokkaidō Prefecture instead of Hokkaido.
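The request body in step 1 is newline-delimited JSON: one action line followed by one document line per document, and the _bulk API requires the body to end with a newline. If you want test documents for more prefectures, here is a minimal Python sketch that builds such a body; `bulk_body` is a hypothetical helper, and the field names `Region` and `Value` are taken from the example above.

```python
import json


def bulk_body(rows):
    """Build an NDJSON body for the _bulk API from (region, value) pairs."""
    lines = []
    for region, value in rows:
        lines.append(json.dumps({"index": {}}))                      # action line
        lines.append(json.dumps({"Region": region, "Value": value}))  # document line
    # The _bulk API expects the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_body([("Tokyo", 1), ("Hokkaido", 5)]), end="")
```

The printed text can be pasted after the `PUT japan-region-map/doc/_bulk` line.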

Expected behavior:

The Region Map should paint the map properly when prefecture names are supplied from the GeoLite2 dataset.

Screenshots (if relevant):

[Screenshot: 2018-07-19 18 14 22]

Tag: @alexfrancoeur @nickpeihl

kosho added the Feature:Add Data, Project:Accessibility, and Feature:Region Map labels and removed the Feature:Add Data and Project:Accessibility labels Jul 19, 2018
alexfrancoeur added the [Deprecated-Use Team:Presentation]Team:Geo label Jul 19, 2018
@alexfrancoeur

cc: @thomasneirynck

@alexfrancoeur

thanks for filing @kosho, we'll look into this!

@nickpeihl
Member

Thanks for checking on this. I agree we need to look into this more.

Names are difficult to use as a join field because they can differ between data sources. I opened a PR on Elasticsearch to add ISO 3166-2 codes to the geoip ingest plugin. This isn't an immediate solution, but it would give us a standard code that we can join data to in future releases.
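A region map is essentially a join between aggregation buckets and the features of a boundary layer. The minimal Python sketch below shows the difference between joining on names and joining on ISO 3166-2 codes. The layer rows and the field name `iso_3166_2` are made up for illustration; only the "Hokkaidō Prefecture" spelling follows the note in the issue. The `join` helper is hypothetical and is not how Kibana implements region maps internally.

```python
# Hypothetical aggregation buckets, keyed by whatever was indexed.
buckets_by_name = {"Tokyo": 1, "Hokkaido": 5}
buckets_by_code = {"JP-13": 1, "JP-01": 5}

# Hypothetical boundary-layer rows (illustrative, not the real EMS layer).
layer = [
    {"iso_3166_2": "JP-01", "name": "Hokkaidō Prefecture"},
    {"iso_3166_2": "JP-13", "name": "Tokyo"},
]


def join(buckets, layer, key):
    """Return {layer key value: bucket value} for features that match a bucket."""
    return {row[key]: buckets[row[key]] for row in layer if row[key] in buckets}


if __name__ == "__main__":
    print(join(buckets_by_name, layer, "name"))        # {'Tokyo': 1}  (Hokkaido is lost)
    print(join(buckets_by_code, layer, "iso_3166_2"))  # {'JP-01': 5, 'JP-13': 1}
```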

@kosho
Author

kosho commented Jul 20, 2018

@alexfrancoeur @nickpeihl Thanks for your prompt response. A scripted field like doc['geoip.country_code2.raw'].value + doc['geoip.region_code.raw'].value solves the problem when the data is ingested through Logstash's geoip filter. I'm glad to hear you have already opened a PR to supply subdivision codes from the geoip ingest processor.

@nickpeihl
Member

As a final note on this, we strongly discourage using region names as a join field. The solution above, a scripted field built from the country and region codes, works for the Logstash GeoIP plugin. This blog post has a little more information.

Starting with v6.5, the Ingest GeoIP plugin will add a new geoip.region_iso_code field that can be joined to the ISO 3166-2 code in region maps without requiring a scripted field. elastic/elasticsearch#31669
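For documents that may come from either pipeline, the two approaches in this thread can be combined when preparing or checking data outside Kibana. A minimal Python sketch, assuming documents shaped like their `_source` (fields nested under `geoip`): it prefers `geoip.region_iso_code` from the ingest processor and otherwise composes a code from the Logstash fields `country_code2` and `region_code`. Note that the scripted field quoted above concatenates the two codes directly; this sketch inserts a "-" because ISO 3166-2 codes are written that way (e.g. JP-13), so adjust it to whatever the map layer's join field actually contains. `region_iso_code` here is a hypothetical helper.

```python
from typing import Optional


def region_iso_code(doc: dict) -> Optional[str]:
    """Best-effort ISO 3166-2 code from a GeoIP-enriched document (sketch)."""
    geoip = doc.get("geoip", {})
    # Ingest GeoIP processor (v6.5+): the code is provided directly.
    if geoip.get("region_iso_code"):
        return geoip["region_iso_code"]
    # Logstash geoip filter: compose "<country>-<region>" (assumed separator "-").
    country, region = geoip.get("country_code2"), geoip.get("region_code")
    if country and region:
        return f"{country}-{region}"
    return None


if __name__ == "__main__":
    print(region_iso_code({"geoip": {"country_code2": "JP", "region_code": "13"}}))  # JP-13
```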
