Remove src/dst.hostname, rename url.hostname to url.domain. #175

Merged
merged 7 commits into from
Nov 7, 2018
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -29,6 +29,8 @@ All notable changes to this project will be documented in this file based on the
* Rename `event.version` to `ecs.version`. #169
* Remove the `http` field set temporarily. #171
* Remove the `user_agent` field set temporarily. #172
+* Rename `url.hostname` to `url.domain`. #175
+* Remove `source.hostname` and `destination.hostname`. #175

### Bugfixes

10 changes: 4 additions & 6 deletions README.md
@@ -129,7 +129,6 @@ Destination fields describe details about the destination of a packet/event.
| Field | Description | Level | Type | Example |
|---|---|---|---|---|
| <a name="destination.ip"></a>destination.ip | IP address of the destination.<br/>Can be one or multiple IPv4 or IPv6 addresses. | core | ip | |
-| <a name="destination.hostname"></a>destination.hostname | Hostname of the destination. | core | keyword | |
| <a name="destination.port"></a>destination.port | Port of the destination. | core | long | |
| <a name="destination.mac"></a>destination.mac | MAC address of the destination. | core | keyword | |
| <a name="destination.domain"></a>destination.domain | Destination domain. | core | keyword | |
@@ -347,28 +346,27 @@ The service fields describe the service for or from which the data was collected

## <a name="source"></a> Source fields

-Source fields describe details about the source of the event.
+Source fields describe details about the destination of a packet/event.


| Field | Description | Level | Type | Example |
|---|---|---|---|---|
| <a name="source.ip"></a>source.ip | IP address of the source.<br/>Can be one or multiple IPv4 or IPv6 addresses. | core | ip | |
-| <a name="source.hostname"></a>source.hostname | Hostname of the source. | core | keyword | |
| <a name="source.port"></a>source.port | Port of the source. | core | long | |
| <a name="source.mac"></a>source.mac | MAC address of the source. | core | keyword | |
| <a name="source.domain"></a>source.domain | Source domain. | core | keyword | |


## <a name="url"></a> URL fields

-URL fields provide a complete URL, with scheme, host, and path. The URL object can be reused in other prefixes, such as `host.url.*` for example. Keep the structure consistent whenever you use URL fields.
+URL fields provide a complete URL, with scheme, host, and path.


| Field | Description | Level | Type | Example |
|---|---|---|---|---|
-| <a name="url.original"></a>url.original | Full original url. The field is stored as keyword. | extended | keyword | `https://elastic.co:443/search?q=elasticsearch#top` |
+| <a name="url.original"></a>url.original | Full original url. The field is stored as keyword. | extended | keyword | `https://www.elastic.co:443/search?q=elasticsearch#top` |
| <a name="url.scheme"></a>url.scheme | Scheme of the request, such as "https".<br/>Note: The `:` is not part of the scheme. | extended | keyword | `https` |
-| <a name="url.hostname"></a>url.hostname | Hostname of the request, such as "elastic.co".<br/>In some cases a URL may refer to an IP and/or port directly, without a domain name. In this case, the IP address would go to the `hostname` field. | extended | keyword | `elastic.co` |
+| <a name="url.domain"></a>url.domain | Domain of the request, such as "www.elastic.co".<br/>In some cases a URL may refer to an IP and/or port directly, without a domain name. In this case, the IP address would go to the `domain` field. | extended | keyword | `www.elastic.co` |
Member

Is this still the case? Should IP addresses be stored in the domain field?
I think sources that don't differentiate names from IPs are quite frequent when parsing logs, and having IPs stored in a field called domain can be a bit confusing.

Contributor

@jsoriano the url.domain field should generally not be populated with an IP address, however, when an event or log indicates that an IP address was used in a URL as the host/hostname, such as in http://15.73.192.108/index.html, then the string 15.73.192.108 would be entered into the url.domain field. This is useful, for example, for a security analyst to know that a user or system is attempting to bypass the DNS infrastructure to access a web resource. In some cases, this could be an indicator of malware activity. So searching through all values of url.domain and looking for values that have the form of an IP address can be a useful analysis.
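
As a rough illustration of the analysis described above, the following Python sketch flags `url.domain` values that have the form of an IP address. The function names `extract_domain` and `domain_is_ip_literal` are hypothetical, and the check relies on Python's standard `ipaddress` and `urllib.parse` modules rather than on anything ECS itself provides.

```python
import ipaddress
from urllib.parse import urlsplit


def extract_domain(original_url: str) -> str:
    """Return the host part of a URL, as it would be stored in url.domain."""
    # urlsplit().hostname drops the port and any brackets around IPv6 literals.
    return urlsplit(original_url).hostname or ""


def domain_is_ip_literal(url_domain: str) -> bool:
    """Return True when a url.domain value is an IPv4 or IPv6 literal."""
    # Tolerate bracketed IPv6 literals such as "[::1]".
    candidate = url_domain.strip("[]")
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return False
    return True


# Hypothetical sample values, reusing the URLs mentioned in this PR.
urls = [
    "https://www.elastic.co:443/search?q=elasticsearch#top",
    "http://15.73.192.108/index.html",
]
suspicious = [u for u in urls if domain_is_ip_literal(extract_domain(u))]
print(suspicious)  # ['http://15.73.192.108/index.html']
```

In a real deployment the same test would more likely be expressed as a query against the stored `url.domain` values; the in-memory list here only keeps the sketch self-contained.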

Contributor Author

@webmat webmat Nov 9, 2018

@jsoriano We were in a tough spot and couldn't actually call it host, as this concept of "name or IP" is generally called. In ECS, host is a field set that can be reused in multiple places.

Just to be clear, the concept of "hostname" is often misused in exactly the same way. The definition of a "host" is "IP address or hostname" (check out the RFC survey I summarized in #166).

An IP address doesn't belong in a "hostname" either. Using "hostname" this way is a mistake that's more common, though, so it doesn't stick out quite as much when people see it.

Member

Thanks for the explanations. I see the problems and the reasoning, but using domain to store IPs still sounds a bit weird to me. In any case, I don't have a better proposal if we cannot use host, but I'm afraid this can also be a bit confusing for users.
Maybe after moving some Beats modules to ECS and playing a bit more with it I'll see it in a different way :)

Will it be the same for source and destination? Maybe we should add some explanations about this also there.

Contributor Author

@jsoriano Under source and destination, we actually have both domain and ip. I've started one such transition for the Traefik module in elastic/beats#9005, which outlines a difference in approach.

Most of the modules dealt with this indirectly and named a keyword field something like remote_ip and attempted to do GeoIP on it. In other words, the duality of this value being either an IP or a hostname was not addressed. Hopefully IN's GeoIP just fails gracefully when it encounters a hostname (I haven't checked). So I think in most of the modules right now, the lie is the other way around ;-) The remote_ip field can contain either an IP or a hostname, the field is keyword to support this, which ultimately means you can't do CIDR searches on it & so on.

The approach I took in the PR linked above is to add a tiny bit of logic to the module in this conversion. The ambiguous value is stored in a custom field, then via Grok I try to grab an IP out of the field, and store that in source.ip. If that grok fails, the fallback is to grab whatever's there and store it in source.domain. This happens here
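
The fallback described above can be sketched roughly as follows. This is not the module's actual pipeline: it swaps the Grok pattern for Python's standard `ipaddress` parser, and the function names `classify_remote_host` and `convert` are hypothetical. The custom field name `traefik.access.remote_ip` is the one mentioned in this thread.

```python
import ipaddress


def classify_remote_host(value: str) -> dict:
    """Map an ambiguous "IP or hostname" value to source.ip or source.domain."""
    try:
        # If the value parses as an IP, store it in the ip-typed field.
        return {"source.ip": str(ipaddress.ip_address(value))}
    except ValueError:
        # Otherwise fall back to storing whatever is there as the domain.
        return {"source.domain": value}


def convert(event: dict) -> dict:
    """Add source.* fields while leaving the custom field untouched."""
    ambiguous = event["traefik.access.remote_ip"]
    return {**event, **classify_remote_host(ambiguous)}


print(convert({"traefik.access.remote_ip": "10.0.0.7"}))
print(convert({"traefik.access.remote_ip": "proxy.example.com"}))
```

Keeping the original custom field alongside the derived one means no information is lost if the classification turns out to be wrong.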

Member

I see, but having hosts in two fields can be a source of problems:

  • Queries on aggregations of both fields will be more complicated, and they can be a common use case (for example to do a visualization that shows top N hosts, including ips and domains).
  • Every module is obliged to implement logic like this and to use more complicated queries to comply with ECS, even if the author doesn't care much whether a host is an IP or a domain. They could use a custom field instead, but I think this is going to be a common use case, so it would be nice for ECS to cover it.

We could have a field for "ambiguous" hosts in addition to domain and ip, so all cases can be covered. Modules that try to parse the host as domain or ip could still keep the original in the ambiguous field, modules that don't care can use the less-featured ambiguous keyword field.
Even if ambiguous and not fully compliant with standards this field can still be common and useful.
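
To make the first point concrete, here is a minimal Python sketch of a "top N hosts" count when the host may live in either `source.ip` or `source.domain`. It works on in-memory dictionaries with flattened field names, which is an assumption for illustration only; the function name `top_hosts` and the sample events are hypothetical.

```python
from collections import Counter


def top_hosts(events, n=3):
    """Count hosts across source.ip and source.domain and return the top n."""
    counts = Counter()
    for event in events:
        # Each event is expected to carry one of the two fields; prefer the IP.
        host = event.get("source.ip") or event.get("source.domain")
        if host:
            counts[host] += 1
    return counts.most_common(n)


events = [
    {"source.ip": "10.0.0.7"},
    {"source.domain": "proxy.example.com"},
    {"source.ip": "10.0.0.7"},
    {"source.domain": "proxy.example.com"},
    {"source.ip": "10.0.0.7"},
    {},  # an event with neither field is skipped
]
print(top_hosts(events, 2))  # [('10.0.0.7', 3), ('proxy.example.com', 2)]
```

With a single ambiguous field the coalescing step inside the loop would disappear, which is the simplification being argued for here.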

Contributor Author

Yes, I agree that having a field for the ambiguous information would be useful.

It used to be hostname, which was an incorrect name, as demonstrated in #166 (see the RFC survey in the issue body).

And we're back to square 1: the "correct" name according to the RFCs is "host" which is defined as either an IP or a hostname. And we don't want to use "host" because it will conflict with the current host field set, if people want to enrich their stream by nesting known host information there.

My best name for the ambiguous field for now is host_address. But we haven't gotten together yet to plan the next steps towards ECS Beta 2.

Note also that the simplest way to avoid this problem now is to store the ambiguous "host" in a custom field. To get back to my Traefik module example, I'm leaving the incorrectly named traefik.access.remote_ip field there, with the exact same meaning as before.

| <a name="url.port"></a>url.port | Port of the request, such as 443. | extended | integer | `443` |
| <a name="url.path"></a>url.path | Path of the request, such as "/search". | extended | keyword | |
| <a name="url.query"></a>url.query | The query field describes the query string of the request, such as "q=elasticsearch".<br/>The `?` is excluded from the query string. If a URL contains no `?`, there is no query field. If there is a `?` but no query, the query field exists with an empty string. The `exists` query can be used to differentiate between the two cases. | extended | keyword | |
29 changes: 8 additions & 21 deletions fields.yml
@@ -252,12 +252,6 @@

 Can be one or multiple IPv4 or IPv6 addresses.

-- name: hostname
-level: core
-type: keyword
-description: >
-Hostname of the destination.
-
 - name: port
 level: core
 type: long
@@ -1069,7 +1063,8 @@
 title: Source
 group: 2
 description: >
-Source fields describe details about the source of the event.
+Source fields describe details about the destination of a
+packet/event.
 type: group
 fields:

@@ -1081,12 +1076,6 @@

 Can be one or multiple IPv4 or IPv6 addresses.

-- name: hostname
-level: core
-type: keyword
-description: >
-Hostname of the source.
-
 - name: port
 level: core
 type: long
@@ -1108,9 +1097,7 @@
 - name: url
 title: URL
 description: >
-URL fields provide a complete URL, with scheme, host, and path. The URL
-object can be reused in other prefixes, such as `host.url.*` for
-example. Keep the structure consistent whenever you use URL fields.
+URL fields provide a complete URL, with scheme, host, and path.
 type: group
 fields:

@@ -1119,7 +1106,7 @@
 type: keyword
 description: >
 Full original url. The field is stored as keyword.
-example: https://elastic.co:443/search?q=elasticsearch#top
+example: https://www.elastic.co:443/search?q=elasticsearch#top

 - name: scheme
 level: extended
@@ -1130,15 +1117,15 @@
 Note: The `:` is not part of the scheme.
 example: https

-- name: hostname
+- name: domain
 level: extended
 type: keyword
 description: >
-Hostname of the request, such as "elastic.co".
+Domain of the request, such as "www.elastic.co".

 In some cases a URL may refer to an IP and/or port directly, without a
-domain name. In this case, the IP address would go to the `hostname` field.
-example: elastic.co
+domain name. In this case, the IP address would go to the `domain` field.
+example: www.elastic.co

 - name: port
 level: extended
6 changes: 2 additions & 4 deletions schema.csv
@@ -22,7 +22,6 @@ container.labels,object,extended,
container.name,keyword,extended,
container.runtime,keyword,extended,docker
destination.domain,keyword,core,
-destination.hostname,keyword,core,
destination.ip,ip,core,
destination.mac,keyword,core,
destination.port,long,core,
@@ -110,13 +109,12 @@ service.state,keyword,core,
service.type,keyword,core,elasticsearch
service.version,keyword,core,3.2.4
source.domain,keyword,core,
-source.hostname,keyword,core,
source.ip,ip,core,
source.mac,keyword,core,
source.port,long,core,
+url.domain,keyword,extended,www.elastic.co
url.fragment,keyword,extended,
-url.hostname,keyword,extended,elastic.co
-url.original,keyword,extended,https://elastic.co:443/search?q=elasticsearch#top
+url.original,keyword,extended,https://www.elastic.co:443/search?q=elasticsearch#top
url.password,keyword,extended,
url.path,keyword,extended,
url.port,integer,extended,443
6 changes: 0 additions & 6 deletions schemas/destination.yml
@@ -16,12 +16,6 @@

 Can be one or multiple IPv4 or IPv6 addresses.

-- name: hostname
-level: core
-type: keyword
-description: >
-Hostname of the destination.
-
 - name: port
 level: core
 type: long
9 changes: 2 additions & 7 deletions schemas/source.yml
@@ -3,7 +3,8 @@
 title: Source
 group: 2
 description: >
-Source fields describe details about the source of the event.
+Source fields describe details about the destination of a
+packet/event.
 type: group
 fields:

@@ -15,12 +16,6 @@

 Can be one or multiple IPv4 or IPv6 addresses.

-- name: hostname
-level: core
-type: keyword
-description: >
-Hostname of the source.
-
 - name: port
 level: core
 type: long
14 changes: 6 additions & 8 deletions schemas/url.yml
@@ -2,9 +2,7 @@
 - name: url
 title: URL
 description: >
-URL fields provide a complete URL, with scheme, host, and path. The URL
-object can be reused in other prefixes, such as `host.url.*` for
-example. Keep the structure consistent whenever you use URL fields.
+URL fields provide a complete URL, with scheme, host, and path.
 type: group
 fields:

@@ -13,7 +11,7 @@
 type: keyword
 description: >
 Full original url. The field is stored as keyword.
-example: https://elastic.co:443/search?q=elasticsearch#top
+example: https://www.elastic.co:443/search?q=elasticsearch#top

 - name: scheme
 level: extended
@@ -24,15 +22,15 @@
 Note: The `:` is not part of the scheme.
 example: https

-- name: hostname
+- name: domain
 level: extended
 type: keyword
 description: >
-Hostname of the request, such as "elastic.co".
+Domain of the request, such as "www.elastic.co".

 In some cases a URL may refer to an IP and/or port directly, without a
-domain name. In this case, the IP address would go to the `hostname` field.
-example: elastic.co
+domain name. In this case, the IP address would go to the `domain` field.
+example: www.elastic.co

 - name: port
 level: extended
12 changes: 2 additions & 10 deletions template.json
@@ -128,10 +128,6 @@
"ignore_above": 1024,
"type": "keyword"
},
-"hostname": {
-"ignore_above": 1024,
-"type": "keyword"
-},
"ip": {
"type": "ip"
},
@@ -537,10 +533,6 @@
"ignore_above": 1024,
"type": "keyword"
},
-"hostname": {
-"ignore_above": 1024,
-"type": "keyword"
-},
"ip": {
"type": "ip"
},
@@ -559,11 +551,11 @@
},
"url": {
"properties": {
-"fragment": {
+"domain": {
"ignore_above": 1024,
"type": "keyword"
},
-"hostname": {
+"fragment": {
"ignore_above": 1024,
"type": "keyword"
},