Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add region ISO code to GeoIP Ingest plugin #31669

Merged
merged 1 commit into from
Jul 20, 2018
Merged

Add region ISO code to GeoIP Ingest plugin #31669

merged 1 commit into from
Jul 20, 2018

Conversation

nickpeihl
Copy link
Member

The @elastic/kibana-gis team is providing vector layer boundaries of country subdivisions for Kibana region maps. These subdivision layers can be joined to data in Elasticsearch via the ISO 3166-2 code. This PR adds the ISO 3166-2 region code to IP addresses when using Ingest Geoip plugin.

With this enhancement, users will be able to drill down their IP logs into country subdivisions. The screenshot below is an example of a Kibana dashboard showing IP address logs from Canadian provinces, German states, and Irish counties.

screen shot 2018-06-27 at 16 43 12

For example, this is a snip of a vector layer for a state in Germany.

...
{ 
    "type": "Feature",
    "id": "Q64",
    "properties": {
      "iso_3166_2": "DE-BE",
      "label_en": "Berlin",
      "label_de": "Berlin"
    },
   "geometry": {
      ...
    }
}
...

A single IP log document ingested with this PR in Elasticsearch may look like this. Where the only change is the geoip.region_iso_code is added. This can be joined to the Germany States vector layer in a Kibana region map as seen in the screenshot above.

...
{
  "found": true,
  "_index": "my_index",
  "_type": "my_type",
  "_id": "my_id",
  "_version": 1,
  "_source": {
    "ip": "2.16.6.0",
    "geoip": {
      "continent_name": "Europe",
      "country_iso_code": "DE",
      "location": { "lat": 52.486, "lon": 13.3271 },
      "region_iso_code": "DE-BE", 
      "region_name": "Berlin"
    }
    ...
  }
}
...

Thanks for your consideration.

@jasontedor jasontedor added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Jun 29, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left one minor comment.

StringJoiner regionIsoCode = new StringJoiner("-");
regionIsoCode.add(countryIso);
regionIsoCode.add(subdivisionIso);
geoData.put("region_iso_code", regionIsoCode.toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just: countryIso + "-" + subdivisionIso ?

@nickpeihl
Copy link
Member Author

Thanks @martijnvg. I've simplified it as you suggested.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions.

@@ -73,6 +73,10 @@
"location": {
"type": "geo_point"
},
"region_iso_code": {
"ignore_above": 1024,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why ignore_above is set on this or the other iso code field, but it should not be needed (for city/region name it might make sense since they could be particularly long depending on the language).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I will change this.

String countryIso = country.getIsoCode();
String subdivisionIso = subdivision.getIsoCode();
if (countryIso != null && subdivisionIso != null) {
String regionIsoCode = countryIso + "-" + subdivisionIso;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of combining these and creating our own unique "region" iso code (which I don't think is actually a thing?) can we expose the subdivision by itself? The rest of the geoip data is exposed directly from the database without alteration or manipulation, and it would be nice to keep that simplicity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concatenated code is a real ISO code. For example, here are the Canadian provinces. For some reason the subdivision code exposed by the GeoLite database nixes the country code portion of the full code (AB for Alberta, Canada instead of CA-AB).

Including the full region ISO code would be imperative to inner join to boundary layers in Kibana region maps. This is because the full ISO code (ex. 'IE-C' for Connacht, Ireland, 'ES-C' for A Coruña, Spain) is distinct between countries whereas the portion exposed by the GeoLite database (ex. 'C') is not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickpeihl agreed, the concatenation is a real code and uniquely identifies an administrative region within a country. It would be very useful to have this exposed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info, I see the specification now. Can you please add a comment that this is an ISO 3166-2 code?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for removing ignore_above. I think all the uses of ignore_above should be reviewed there (separately from this PR) as I see some others that probably don't need it (eg remote_ip, which probably also should be an ip type instead of keyword).

This looks good, with the exception that the docs for this plugin should be updated to note this new field being available.

String countryIso = country.getIsoCode();
String subdivisionIso = subdivision.getIsoCode();
if (countryIso != null && subdivisionIso != null) {
String regionIsoCode = countryIso + "-" + subdivisionIso;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info, I see the specification now. Can you please add a comment that this is an ISO 3166-2 code?

@nickpeihl
Copy link
Member Author

Hi all. Is there anything else that might hold this up? I've updated the docs based on feedback by @rjernst.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay @nickpeihl . This looks good.

@nickpeihl nickpeihl merged commit ac63408 into elastic:master Jul 20, 2018
martijnvg added a commit that referenced this pull request Jul 21, 2018
* es/master: (23 commits)
  Switch full-cluster-restart to new style Requests (#32140)
  [DOCS] Clarified that you must remove X-Pack plugin when upgrading from pre-6.3. (#32016)
  Remove BouncyCastle dependency from runtime (#32193)
  INGEST: Extend KV Processor (#31789) (#32232)
  INGEST: Make a few Processors callable by Painless (#32170)
  Add region ISO code to GeoIP Ingest plugin (#31669)
  [Tests] Remove QueryStringQueryBuilderTests#toQuery class assertions (#32236)
  Make sure that field aliases count towards the total fields limit. (#32222)
  Switch rolling restart to new style Requests (#32147)
  muting failing test for internal auto date histogram to avoid failure before fix is merged
  MINOR: Remove unused `IndexDynamicSettings` (#32237)
  Fix multi level nested sort (#32204)
  Enhance Parent circuit breaker error message (#32056)
  [ML] Use default request durability for .ml-state index (#32233)
  Remove indices stats timeout from monitoring docs
  Rename ranking evaluation response section (#32166)
  Dependencies: Upgrade to joda time 2.10 (#32160)
  Remove aliases resolution limitations when security is enabled (#31952)
  Ensure that field aliases cannot be used in multi-fields. (#32219)
  TESTS: Check for Netty resource leaks (#31861)
  ...
ruflin added a commit to ruflin/beats that referenced this pull request Jul 23, 2018
In elastic/elasticsearch#31669 the field `region_iso_code` was added to the geoip processor. Because of this test broke with the most recent release of Elasticsearch as the events contain an undocumented field.
tsg pushed a commit to elastic/beats that referenced this pull request Jul 23, 2018
In elastic/elasticsearch#31669 the field `region_iso_code` was added to the geoip processor. Because of this test broke with the most recent release of Elasticsearch as the events contain an undocumented field.
tsg pushed a commit to elastic/beats that referenced this pull request Jul 24, 2018
* Fix breaking change in monitoring data (#7563)

The prefix for the stats metrics was metrics but renamed to `stats` by accident as the name is now auto generated. This reverts this change.

Closes #7562

* Add http.request.mehod to Kibana log filset (#7607)

Take `http.request.method` from ECS and apply it to the Kibana fileset.

Additional logs are added to the example log files.

* Fix rename log message (#7614)

Instead of the from field the to field was logged.

* Add tests to verify template content (#7606)

We recently started to move fields.yml into the Golang binary to be used internally. To make sure the loading important and loading of all the data into the binary works as expected for Metricbeat, this adds some basic tests. Related to #7605.

* Basic support of ES GC metrics for jvm9 (#7628)

GC log format for JVM9 is more detailed than for JVM8.

Differences and possible improvements:
* To get cpu_times.* a corellation between log lines is required.
* Some GC metrics are available in jvm8 are not in jvm9
  (class_unload_time_sec, weak_refs_processing_time_sec, ...)
* heap.used_kb is empty, but it can be calculated as young_gen.used_kb +
  old_gen.size_kb
* GC phase times are logged in miliseconds vs seconds in jvm8

* Improve fields.yml generator of modules (#7533)

From now on when a user provides a type hint in an Ingest pipeline, it's added to the generated `fields.yml` instead of guessing.

Closes #7472

* Fix filebeat registry meta being nil vs empty (#7632)

Filebeat introduces a meta field to registry entries in 6.3.1. The meta field is used to distuingish different log streams in docker files. For other input types the meta field must be null. Unfortunately the input loader did initialize the meta field with an empty dictionary. This leads to failing matches of old and new registry entries. Due to the match failing, old entries will not be removed, and filebeat will handle all files as new files on startup (old logs are send again).

Users will observe duplicate entries in the reigstry file. One entry with "meta": null and one entry with "meta": {}. The entry with "meta": {} will be used by filebeat. The null-entry will not be used by filebeat, but is kept in the registry file, cause it has now active owner (yet).

Improvements provided by this PR:

* when matching states consider an empty map and a null-map to be equivalent
* update input loader to create a null map for old state -> registry entries will be compatible on upgrade
* Add checks in critical places replacing an empty map with a null-map
* Add support to fix registry entries on load. states from corrupted 6.3.1 files will be merged into one single state on load 
* introduce unit tests for loading different registry formats
* introduce system tests validating output and registry when upgrading filebeat from an older version

Closes: #7634

* Heartbeat Job Validation + addition of libbeat/mapval (#7587)

This commit seeks to establish a pattern for testing heartbeat jobs. It currently tests the HTTP and TCP jobs. It also required some minor refactors of those tasks for HTTP/TCP.

To do this, it made sense to validate event maps with a sort of schema library. I couldn't find one that did exactly what I wanted here, so I wrote one called mapval. That turned out to be a large undertaking, and is now the majority of this commit. Further tests need to be written, but this commit is large enough as is.

One of the nicest things about the heartbeat architecture is the dialer chain behavior. It should be the case that any validated protocol using TCP (e.g. HTTP, TCP, Redis, etc.) has the exact same tcp metadata.

To help make testing these properties easy mapval lets users compose portions of a schema into a bigger one. In other words, you can say "An HTTP response should be a TCP response, with the standard monitor data added in, and also the special HTTP fields". Even having only written a handful of tests this has uncovered some inconsistencies there, where TCP jobs have a hostname, but HTTP ones do not.

* Only fetch shard metrics from master node (#7635)

This PR makes it so that the `elasticsearch/shard` metricset only fetches information from the Elasticsearch node if that node is the master node.

* Create (X-Pack Monitoring) stats metricset for Kibana module (#7525)

This PR takes the `stats` metricset of the `kibana` Metricbeat module and makes it ship documents to `.monitoring-kibana-6-mb-%{YYYY.MM.DD}` indices, while preserving the current format/mapping expected by docs in these indices. This will ensure that current consumers of the data in these indices, viz. the X-Pack Monitoring UI and the Telemetry shipping module in Kibana, will continue to work as-is.

* Add kubernetes specs for auditbeat file integrity monitoring (#7642)

* Release the rename processor as GA

* Fix log message for Kibana beta state (#7631)

From copy paste Kafka was in the log message instead of Kibana.

* Clean up experimental and beta messages (#7659)

Sometimes the old logging mechanism was used. If all use the new one it is easier to find all the entries. In addition some messages were inconsistent.

* Release raid and socket metricset from system module as GA (#7658)

* Release raid and socket metricset from system module as GA

* remove raid metricset title

* Update geoip config docs (#7640)

* Document  breaking change in monitoring shcema

Situation:

* Edit breaking changes statement about monitoring schema changes (#7666)

* Marking Elasticsearch module and its metricsets as beta (#7662)

This PR marks the `elasticsearch` module and all its 8 existing metricsets all as `beta`. Previously only 
2 metricsets were marked as `beta` with the remaining 6 marked as `experimental`.

* Increase kafka version in tests to 1.1.1 (#7655)

* Add missing mongodb status fields (#7613)

Add `locks`, `global_locks`, `oplatencies` and `process` fields to `status` metricset of MongoDB module.

* Remove outdated vendor information. (#7676)

* Fix Filebeat tests with new region_iso_code field (#7678)

In elastic/elasticsearch#31669 the field `region_iso_code` was added to the geoip processor. Because of this test broke with the most recent release of Elasticsearch as the events contain an undocumented field.

* Fix duplicated module headers (#7650)

* Fix duplicated module headers

Closes #7643

* fix metricset titles for munin and kvm

* fix imssing kubernetes apiserver metricset doc

* remove headers from modules / metricset generator and clean up traefik title

* Release munin and traefik module as beta. (#7660)

* Release munin and treafik module as beta.

* fixes to munin module

* Report k8s pct metrics from enrichment process (#7677)

Instead of doing it from the `state_container`. Problem with the
previous approach is that `state_container` metricset is not run in all
nodes, but from a single point. Making performance metrics not available
in all cases.

With this new approach, the enriching process will also collect
performance metrics, so they should be available everywhere where the
module is run.

* Fix misspell in Beats repo (#7679)

Running `make misspell`.

* Update sarama (kafka client) to 1.17 (#7665)

- Update Sarama to 1.17. The Sarama testsuite tests kafka versions between 0.11 and 1.1.0.
- Update compatible versions in output docs
- Add compression_level setting for gzip compression

* Update github.com/OneOfOne/xxhash to fix mips

* Update boltdb to use github.com/coreos/bbolt fork

Closes #6052

* Generate fields.yml using Mage (#7670)

Make will now delegate to mage for generating fields.yml. Make will check if the mage command exists and go install it if not. The FIELDS_FILE_PATH make variable is not longer used because the path(s) are specified in magefile.go.

This allows fields.yml to be generated on Windows using Mage. The CI scripts for Windows have been updated so that fields.yml is generated for all Beats during testing.

This also adds a make.bat in each directory where building occurs to give Windows
users a starting point.

Some fixes were made to the generators because:
- rsync was excluding important source files contained in a directory
  named "build"
- the generated project needed to be `git init` before running certain
  magefile targets that detect project's root dir and import path.

* Update go-ucfg to 0.6.1 (#7599)

Update fixes config unpacking if users overwrite settings from CLI, with
missing values. When using `-E key=` (e.g. in scripts defining potential
empty defaults via env variables like `-E key=${MYVALUE}`), an untyped
`nil`-values was inserted into the config. This untyped value will make
Unpack fail for most typed settings.

* Docs: Add deprecation check for dashboard loading. (#7675)

For APM Server the recommended way of loading dashboards and Kibana index pattern will be through the Kibana UI from 6.4 on. Since the docs are based on the libbeat docs we need to add a deprecation flag for dashboard and index pattern related documentation.

relates to elastic/apm-server#1142

* Update expected filebeat module files for geoip change
@nickpeihl nickpeihl deleted the geoip-region-iso branch August 30, 2018 15:34
@nickpeihl nickpeihl restored the geoip-region-iso branch August 30, 2018 16:00
@nickpeihl nickpeihl added v6.5.0 and removed v6.4.0 labels Aug 30, 2018
dnhatn added a commit that referenced this pull request Sep 1, 2018
* 6.x:
  Mute test watcher usage stats output
  [Rollup] Fix FullClusterRestart test
  TEST: Disable soft-deletes in ParentChildTestCase
  TEST: Disable randomized soft-deletes settings
  Integrates soft-deletes into Elasticsearch (#33222)
  drop `index.shard.check_on_startup: fix` (#32279)
  Fix AwaitsFix issue number
  Mute SmokeTestWatcherWithSecurityIT testsi
  [DOCS] Moves ml folder from x-pack/docs to docs (#33248)
  TEST: mute more SmokeTestWatcherWithSecurityIT tests
  [DOCS] Move rollup APIs to docs (#31450)
  [DOCS] Rename X-Pack Commands section (#33005)
  Fixes SecurityIntegTestCase so it always adds at least one alias (#33296)
  TESTS: Fix Random Fail in MockTcpTransportTests (#33061) (#33307)
  MINOR: Remove Dead Code from PathTrie (#33280) (#33306)
  Fix pom for build-tools (#33300)
  Lazy evaluate java9home (#33301)
  SQL: test coverage for JdbcResultSet (#32813)
  Work around to be able to generate eclipse projects (#33295)
  Different handling for security specific errors in the CLI. Fix for #33230 (#33255)
  [ML] Refactor delimited file structure detection (#33233)
  SQL: Support multi-index format as table identifier (#33278)
  Enable forbiddenapis server java9 (#33245)
  [MUTE] SmokeTestWatcherWithSecurityIT flaky tests
  Add region ISO code to GeoIP Ingest plugin (#31669) (#33276)
  Don't be strict for 6.x
  Update serialization versions for custom IndexMetaData backport
  Replace IndexMetaData.Custom with Map-based custom metadata (#32749)
  Painless: Fix Bindings Bug (#33274)
  SQL: prevent duplicate generation for repeated aggs (#33252)
  TEST: Mute testMonitorClusterHealth
  Fix serialization of empty field capabilities response (#33263)
  Fix nested _source retrieval with includes/excludes (#33180)
  [DOCS] TLS file resources are reloadable (#33258)
  Watcher: Ensure TriggerEngine start replaces existing watches (#33157)
  Ignore module-info in jar hell checks (#33011)
  Fix docs build after #33241
  [DOC] Repository GCS ADC not supported (#33238)
  Upgrade to latest Gradle 4.10  (#32801)
  Fix/30904 cluster formation part2 (#32877)
  Move file-based discovery to core (#33241)
  HLRC: add client side RefreshPolicy (#33209)
  [Kerberos] Add unsupported languages for tests (#33253)
  Watcher: Reload properly on remote shard change (#33167)
  Fix classpath security checks for external tests. (#33066)
  [Rollup] Only allow aggregating on multiples of configured interval (#32052)
  Added deprecation warning for rescore in scroll queries (#33070)
  Apply settings filter to get cluster settings API (#33247)
  [Rollup] Re-factor Rollup Indexer into a generic indexer for re-usability   (#32743)
  HLRC: create base timed request class (#33216)
  HLRC: Use Optional in validation logic (#33104)
  Painless: Add Bindings (#33042)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants