Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename geoip to location #50

Closed
ruflin opened this issue Jul 17, 2018 · 15 comments
Closed

Rename geoip to location #50

ruflin opened this issue Jul 17, 2018 · 15 comments
Labels

Comments

@ruflin
Copy link
Member

ruflin commented Jul 17, 2018

The current geoip fields (https://github.com/elastic/ecs#geoip) are inspired by the geoip ingest processor form Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/plugins/6.2/using-ingest-geoip.html). In recent discussion around geo fields it became clear not all geo information is coming necessarly from an ip address and there is also geo / location information which does not necessarly can be extracted from an ip.

I would like to suggest to rename geoip fields to location. I was also thinking of renaming it to geo but I think location is more specific.

@ruflin ruflin added the discuss label Jul 17, 2018
ruflin added a commit to ruflin/ecs that referenced this issue Jul 17, 2018
There have been recently several discussions around source, destination and connection recently, especially in elastic#9. The conclusion from my side is that source and destination normally belongs to a connection and we actually miss a connection prefix. Also some information from network like `forward_ip` more belong to a connection.

An additional change I made to source and destination is that they both contain now a host prefix. All the fields in source and destination also exist in `host`. The host prefix can be reused here too. This makes ECS very predictable that every time `host.*` shows up it will contain the same fields. Also source and destination could contain additional data like the location, see elastic#50 for more details.

The connection fields now look as following:

```
| Field  | Description  | Type  | Multi Field  | Example  |
|---|---|---|---|---|
| <a name="connection.destination.host.ip"></a>`connection.destination.host.ip`  | IP address of the destination.<br/>Can be one or multiple IPv4 or IPv6 addresses.  | ip  |   |   |
| <a name="connection.destination.host.name"></a>`connection.destination.host.name`  | Hostname of the destination.  | keyword  |   |   |
| <a name="connection.destination.host.port"></a>`connection.destination.host.port`  | Port of the destination.  | long  |   |   |
| <a name="connection.destination.host.mac"></a>`connection.destination.host.mac`  | MAC address of the destination.  | keyword  |   |   |
| <a name="connection.destination.host.domain"></a>`connection.destination.host.domain`  | Destination domain.  | keyword  |   |   |
| <a name="connection.destination.host.subdomain"></a>`connection.destination.host.subdomain`  | Destination subdomain.  | keyword  |   |   |
| <a name="connection.source.host.ip"></a>`connection.source.host.ip`  | IP address of the source.<br/>Can be one or multiple IPv4 or IPv6 addresses.  | ip  |   |   |
| <a name="connection.source.host.name"></a>`connection.source.host.name`  | Hostname of the source.  | keyword  |   |   |
| <a name="connection.source.host.port"></a>`connection.source.host.port`  | Port of the source.  | long  |   |   |
| <a name="connection.source.host.mac"></a>`connection.source.host.mac`  | MAC address of the source.  | keyword  |   |   |
| <a name="connection.source.host.domain"></a>`connection.source.host.domain`  | Source domain.  | keyword  |   |   |
| <a name="connection.source.host.subdomain"></a>`connection.source.host.subdomain`  | Source subdomain.  | keyword  |   |   |
| <a name="connection.direction"></a>`connection.direction`  | Direction of the network traffic.<br/>Recommended values are:<br/>  * inbound<br/>  * outbound<br/>  * unknown  | keyword  |   | `inbound`  |
| <a name="connection.forwarded_ip"></a>`connection.forwarded_ip`  | Host IP address when the source IP address is the proxy.  | ip  |   | `192.1.1.2`  |
```

I opened a PR to discuss this instead of an issue as it will allow us to discuss the high level parts as comment but also details directly in the code.
ruflin added a commit to ruflin/ecs that referenced this issue Jul 17, 2018
There have been recently several discussions around source, destination and connection recently, especially in elastic#9. The conclusion from my side is that source and destination normally belongs to a connection and we actually miss a connection prefix. Also some information from network like `forward_ip` more belong to a connection then network.

An additional change I made to source and destination is that they both contain now a host prefix. All the fields in source and destination also exist in `host`. The host prefix can be reused here too. This makes ECS very predictable that every time `host.*` shows up it will contain the same fields. Also source and destination could contain additional data like the location, see elastic#50 for more details.

The connection fields now look as following:

| Field  | Description  | Type  |
|---|---|---|---|---|
| <a name="connection.destination.host.ip"></a>`connection.destination.host.ip`  | IP address of the destination.<br/>Can be one or multiple IPv4 or IPv6 addresses.  | ip  |
| <a name="connection.destination.host.name"></a>`connection.destination.host.name`  | Hostname of the destination.  | keyword  |
| <a name="connection.destination.host.port"></a>`connection.destination.host.port`  | Port of the destination.  | long  |
| <a name="connection.destination.host.mac"></a>`connection.destination.host.mac`  | MAC address of the destination.  | keyword  |
| <a name="connection.destination.host.domain"></a>`connection.destination.host.domain`  | Destination domain.  | keyword  |
| <a name="connection.destination.host.subdomain"></a>`connection.destination.host.subdomain`  | Destination subdomain.  | keyword  |
| <a name="connection.source.host.ip"></a>`connection.source.host.ip`  | IP address of the source.<br/>Can be one or multiple IPv4 or IPv6 addresses.  | ip  |
| <a name="connection.source.host.name"></a>`connection.source.host.name`  | Hostname of the source.  | keyword  |
| <a name="connection.source.host.port"></a>`connection.source.host.port`  | Port of the source.  | long  |
| <a name="connection.source.host.mac"></a>`connection.source.host.mac`  | MAC address of the source.  | keyword  |
| <a name="connection.source.host.domain"></a>`connection.source.host.domain`  | Source domain.  | keyword  |
| <a name="connection.source.host.subdomain"></a>`connection.source.host.subdomain`  | Source subdomain.  | keyword  |
| <a name="connection.direction"></a>`connection.direction`  | Direction of the network traffic.<br/>Recommended values are:<br/>  * inbound<br/>  * outbound<br/>  * unknown  | keyword  |
| <a name="connection.forwarded_ip"></a>`connection.forwarded_ip`  | Host IP address when the source IP address is the proxy.  | ip  |

I opened a PR to discuss this instead of an issue as it will allow us to discuss the high level parts as comment but also details directly in the code.
@jordansissel
Copy link

+1 to renaming geoip to location. This would allow folks to place any where-style data in this set, such as: region, cloud platform, datacenter, rack, building number, address, zip code, etc.

Not all where have "geo" or "ip" concepts, so I think location matches much more nicely.

@ave19
Copy link

ave19 commented Jul 19, 2018

i like it!

@ave19
Copy link

ave19 commented Jul 19, 2018

Wait a minute... how bad is it that the name location is also mentioned by the geo point type? It seems that the name location is not required by geo point, but it's in the docs that way. Would we end up with source.location.location[.lat,.lon]?

@ruflin
Copy link
Member Author

ruflin commented Jul 19, 2018

What about location.geo.lat / .lon? Longitude and latitude are geo information I would say. We definitively need to rename it. Suggestions welcome.

@jordansissel
Copy link

location.geo.lat

+1 for geo

@ruflin
Copy link
Member Author

ruflin commented Jul 20, 2018

An other interesting twist I just found out is that in Elasticsearch the geo information can be either passed as lat and lon in two fields, as a string or as an array: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html

Not sure how we should deal with it but we could have also geo.hash for the geohash, geo.point for the point array. geohas could also be used for the string option.

@willemdh
Copy link
Contributor

willemdh commented Jul 26, 2018

Personally I'm a fan of short fieldnames. Longer fieldnames tend to mess up datatables etc. So my preference goes to geo above location.

About https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html

I'm not a gis expert, but the mapping in the example says:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

So couldn't we just do something like:

"geo": {
  "properties": {
    "city_name": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "continent_name": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "country_iso_code": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "location": {
      "type": "geo_point"
    },
    "region_name": {
      "ignore_above": 1024,
      "type": "keyword"
    }
  }
}

which could cover all 5 geo point expressions?

@ruflin
Copy link
Member Author

ruflin commented Jul 26, 2018

So you are basically proposing geo.location instead of location.geo.

Good point about the location field that we can just set it to geo_point and have ES worry about what it will contain.

@willemdh
Copy link
Contributor

willemdh commented Jul 26, 2018

So you are basically proposing geo.location instead of location.geo

Something like that yes. Another possibility could be:

"geo": {
  "properties": {
    "city_name": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "continent_name": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "country_iso_code": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "point": {
      "type": "geo_point"
    },
    "region_name": {
      "ignore_above": 1024,
      "type": "keyword"
    }
  }
}

@ruflin These are all very high level discussions with multiple possible solutions. There should be a vote of some kind with an uneven number of people for decisions which could have a breaking impact. I'm quite sure I'm not seeing all possible implications of choosing location vs geo as root object. I just have a personal feeling towards keeping field names asap (as short as possible ;) ).

Someone with real GIS experience should jump in on this one and share his (or her) opinion.

@ruflin
Copy link
Member Author

ruflin commented Jul 27, 2018

I think more important then keeping file names short is to keep them descriptive and self explanatory. Shortening names can still be a UI feature for example.

Good idea about pinging people with more GIS experience. I just did that internally.

@nickpeihl
Copy link
Member

My preference is geo and geo.location. "Geo" is a common designation in the GIS industry (Geospatial, GeoJSON, Geographic Information Systems). And, imo, it's a better specifier as a top level element than location since "location" may also be used in non-spatial contexts (URLs, files on disk).

@thomasneirynck
Copy link

thomasneirynck commented Jul 27, 2018

+1 for geo.*, for same reasons as @nickpeihl

@ave19
Copy link

ave19 commented Jul 27, 2018 via email

ruflin added a commit to ruflin/ecs that referenced this issue Jul 30, 2018
Renaming the `geoip` prefix to `geo` based on the discussion in elastic#50
@ruflin
Copy link
Member Author

ruflin commented Jul 30, 2018

I opened #58 based on the discussion about with geo.location.

ruflin added a commit to ruflin/ecs that referenced this issue Aug 2, 2018
Renaming the `geoip` prefix to `geo` based on the discussion in elastic#50
andrewkroh pushed a commit that referenced this issue Aug 3, 2018
* Rename geoip.* to geo.* fields

Renaming the `geoip` prefix to `geo` based on the discussion in #50

* fix IP and rename geoip.yml to geo.yml
@ruflin
Copy link
Member Author

ruflin commented Aug 8, 2018

Closing as #58 was merged.

@ruflin ruflin closed this as completed Aug 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants