WARNING: THIS IS WORK IN PROGRESS

Elastic Common Schema (ECS)

The Elastic Common Schema (ECS) provides a common data model for ingesting data into Elasticsearch. Having a common schema allows you to correlate data from sources like logs and metrics, or IT operations analytics and security analytics.

ECS is still under development and backward compatibility is not guaranteed. Any feedback on the general structure, missing fields, or existing fields is appreciated. For contributions please read the Contributing Guide.

The current version of ECS is 0.1.0.

Fields

List of available ECS fields.

Base fields

The base set contains all fields which are on the top level without a namespace.

These are fields which are common across all types of events.

Field Description Type Multi Field Example
@timestamp Timestamp when the event was created.
For log events this is expected to be when the event was generated and not when it was read.
Timestamp is a required field and must exist in all events.
date 2016-05-23T08:05:34.853Z
tags Tags is a list of keywords which are used to tag each event. keyword ["production", "env2"]
labels Labels is an object which contains key/value pairs.
Labels can be used to add additional meta information to events. Labels should not contain nested objects, and all values are stored as keyword.
An example usage is the docker and k8s labels.
object {key1: value1, key2: value2}
message For log events the message field contains the log message.
In other use cases the message field can be used to concatenate together different values which are then freely searchable. Or if multiple messages exist they can be combined here into one message.
text Hello World
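
As an illustration of the base fields, here is a minimal sketch in Python of an event that carries only @timestamp, tags, labels, and message. The helper name make_base_event and the sample values are assumptions made for this example; ECS itself only defines the field names and types.

```python
import json
from datetime import datetime, timezone


def make_base_event(message, tags=None, labels=None, timestamp=None):
    """Build a dict holding only the ECS base fields (hypothetical helper)."""
    ts = (timestamp or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Labels must stay flat: no nested objects, every value stored as a string.
    flat_labels = {}
    for key, value in (labels or {}).items():
        if isinstance(value, (dict, list)):
            raise ValueError(f"label {key!r} must not contain nested values")
        flat_labels[key] = str(value)
    return {
        "@timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "tags": list(tags or []),
        "labels": flat_labels,
        "message": message,
    }


event = make_base_event(
    "Hello World",
    tags=["production", "env2"],
    labels={"key1": "value1", "key2": 2},
    timestamp=datetime(2016, 5, 23, 8, 5, 34, 853000, tzinfo=timezone.utc),
)
print(json.dumps(event, indent=2))
```

Rejecting nested label values up front is one possible design choice; an implementation could also flatten them instead.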

Agent fields

The agent fields contain the data about the agent/client/shipper that created the event.

As an example, in the case of Beats for logs the agent.name is filebeat. In the case of APM it is the agent running in the app / service. The agent information does not change if data is sent through a queuing system like Kafka or Redis, or through processing systems like Logstash or APM Server.

Field Description Type Multi Field Example
agent.version Agent version. keyword 6.0.0-rc2
agent.name Name of the agent.
keyword filebeat
agent.id Unique identifier of this agent if one exists.
In the case of Beats this would be beat.id.
keyword 8a4f500d
agent.ephemeral_id Ephemeral identifier of this agent if one exists.
Unlike agent.id, this id normally changes across restarts.
keyword 8a4f500f

Cloud fields

All fields related to the cloud or infrastructure the events are coming from.

If Metricbeat runs on an EC2 host and fetches data from that host, the cloud info is expected to describe this machine. If Metricbeat runs outside the cloud on a remote machine and fetches data from a service running in the cloud, the cloud info is expected to describe the machine on which the service is running.

Field Description Type Multi Field Example
cloud.provider Name of the cloud provider. Example values are ec2, gce, or digitalocean. keyword ec2
cloud.availability_zone Availability zone in which this host is running. keyword us-east-1c
cloud.region Region in which this host is running. keyword us-east-1
cloud.instance.id Instance ID of the host machine. keyword i-1234567890abcdef0
cloud.instance.name Instance name of the host machine. keyword
cloud.machine.type Machine type of the host machine. keyword t2.medium

Container fields

Container fields are used for meta information about the specific container the information is coming from. This should help to correlate data based on containers from any runtime.

Field Description Type Multi Field Example
container.runtime Runtime managing this container. keyword docker
container.id Unique container id. keyword
container.image.name Name of the image the container was built on. keyword
container.image.tag Container image tag. keyword
container.name Container name. keyword
container.labels Image labels. object

Destination fields

Destination fields describe details about the destination of a packet/event.

Field Description Type Multi Field Example
destination.ip IP address of the destination.
This can be one or multiple IPv4 or IPv6 addresses.
ip
destination.hostname Hostname of the destination. keyword
destination.port Port of the destination. long
destination.mac MAC address of the destination. keyword
destination.domain Destination domain. keyword
destination.subdomain Destination subdomain. keyword

Device fields

Device fields are used to give additional information about the device that the information is coming from.

This could be a firewall, network device, etc.

Field Description Type Multi Field Example
device.mac MAC address of the device. keyword
device.ip IP address of the device. ip
device.hostname Hostname of the device. keyword
device.vendor Device vendor information. text
device.version Device version. keyword
device.serial_number Device serial number. keyword
device.timezone.offset.sec Timezone offset of the host in seconds.
Number of seconds relative to UTC. In case the offset is -01:30 the value will be -5400.
long -5400
device.type The type of the device the data is coming from.
There is no predefined list of device types. Some examples are endpoint, firewall, ids, ips, proxy.
keyword firewall
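
The offset arithmetic described for device.timezone.offset.sec (and likewise for host.timezone.offset.sec) can be sketched as follows. The function name offset_to_seconds and the accepted input format ("+HH:MM" or "-HH:MM") are assumptions for this example.

```python
def offset_to_seconds(offset):
    """Convert a UTC offset such as "-01:30" or "+02:00" to signed seconds."""
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    return sign * (int(hours) * 3600 + int(minutes) * 60)


print(offset_to_seconds("-01:30"))  # -5400, matching the example in the table
```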

Error fields

Error namespace

This can be used to represent all kinds of errors. It can be for errors that happen while fetching events or if the event itself contains an error.

Field Description Type Multi Field Example
error.id Unique identifier for the error. keyword
error.message Error message. text
error.code Error code describing the error. keyword

Event fields

The event fields are used for context information about the data itself.

Field Description Type Multi Field Example
event.id Unique ID to describe the event. keyword 8a4f500d
event.category Event category.
This can be a user defined category.
keyword metrics
event.type A type given to this kind of event which can be used for grouping.
This is normally defined by the user.
keyword nginx-stats-metrics
event.module Name of the module this data is coming from.
This information is coming from the modules used in Beats or Logstash.
keyword mysql
event.dataset Name of the dataset.
The concept of a dataset (fileset / metricset) is used in Beats as a subset of modules. It contains the information which is currently stored in metricset.name and metricset.module or fileset.name.
keyword stats
event.severity Severity describes the severity of the event. What the different severity values mean can vary widely between use cases. It's up to the implementer to make sure severities are consistent across events. long 7
event.raw Raw text message of entire event to be used to demonstrate log integrity. keyword Sep 19 08:26:10 host CEF:0|Security| threatmanager|1.0|100| worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2spt=1232
event.hash Hash (perhaps logstash fingerprint) of raw field to be able to demonstrate log integrity. keyword 123456789012345678901234567890ABCD
event.version The ECS version this event adheres to.
This field should be provided as part of each event to make it possible to detect to which ECS version an event belongs.
event.version is a required field and must exist in all events.
The current version is 0.1.0.
keyword 0.1.0
event.duration Duration of the event in nanoseconds. long
event.created event.created contains the date when the event was created.
This timestamp is distinct from @timestamp in that @timestamp contains the processed timestamp. For logs the two can differ, because the time recorded in the log line and the time the line is read (for example by Filebeat) are not identical. @timestamp must contain the timestamp extracted from the log line, and event.created the time when the log line was read. The same can apply to packet capturing, where @timestamp contains the timestamp extracted from the network packet and event.created the time when the event was created.
In case the two timestamps are identical, @timestamp should be used.
date
event.risk_score Risk score value of the event. float
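
The relationship between @timestamp, event.created, event.raw, and event.hash can be sketched in Python as below. The helper name log_event and the sample log line and times are made up for illustration, and SHA-256 is an assumption: the schema does not name a hash function.

```python
import hashlib
from datetime import datetime, timezone


def log_event(raw_line, line_time, read_time):
    """Build @timestamp and a few event.* fields for one log line (hypothetical helper)."""
    event = {
        "raw": raw_line,
        # Assumption: SHA-256 stands in for whatever fingerprint is actually used.
        "hash": hashlib.sha256(raw_line.encode("utf-8")).hexdigest(),
        "version": "0.1.0",
    }
    # event.created is only added when it differs from @timestamp.
    if read_time != line_time:
        event["created"] = read_time.isoformat()
    return {"@timestamp": line_time.isoformat(), "event": event}


line_time = datetime(2018, 9, 19, 8, 26, 10, tzinfo=timezone.utc)  # parsed from the line
read_time = datetime(2018, 9, 19, 8, 26, 12, tzinfo=timezone.utc)  # when the shipper read it
doc = log_event("Sep 19 08:26:10 host sshd: session opened", line_time, read_time)
print(doc["@timestamp"], doc["event"]["created"])
```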

File fields

File attributes.

Field Description Type Multi Field Example
file.path The path to the file. text
file.path.raw The path to the file. This is a non-analyzed field that is useful for aggregations. keyword 1
file.target_path The target path for symlinks. text
file.target_path.raw The target path for symlinks. This is a non-analyzed field that is useful for aggregations. keyword 1
file.extension The file extension.
This should allow easy filtering by file extensions.
keyword png
file.type The file type (file, dir, or symlink). keyword
file.device The device. keyword
file.inode The inode representing the file in the filesystem. keyword
file.uid The user ID (UID) or security identifier (SID) of the file owner. keyword
file.owner The file owner's username. keyword
file.gid The primary group ID (GID) of the file. keyword
file.group The primary group name of the file. keyword
file.mode The mode of the file in octal representation. keyword 416
file.size The file size in bytes (field is only added when type is file). long
file.mtime The last modified time of the file (time when content was modified). date
file.ctime The last change time of the file (time when metadata was changed). date

Geoip fields

Geoip fields are used for geo information for an IP address.

The conversion to geoip information can be done by the Elasticsearch geoip plugin.

Field Description Type Multi Field Example
geoip.continent_name The name of the continent. keyword
geoip.country_iso_code Country ISO code. keyword
geoip.location The longitude and latitude. geo_point
geoip.region_name The region name. keyword
geoip.city_name The city name. keyword

Host fields

All fields related to a host. A host can be a physical machine, a virtual machine, and also a Docker container.

Normally the host information is related to the machine on which the event was generated / collected but also can be used differently if needed.

Field Description Type Multi Field Example
host.timezone.offset.sec Timezone offset of the host in seconds.
Number of seconds relative to UTC. In case the offset is -01:30 the value will be -5400.
long -5400
host.name host.name is the hostname of the host.
It can contain what hostname returns on Unix systems, the fully qualified domain name or also a name specified by the user. It is up to the sender to decide which value to use.
keyword
host.id Unique host id.
As hostname is not always unique, this often can be configured by the user. An example here is the current usage of beat.name.
keyword
host.ip Host ip address. ip
host.mac Host mac address. keyword
host.type This is the type of the host.
For cloud providers this can be the machine type, like t2.medium. It can also be vm or container, for example, or something user defined.
keyword
host.os.platform Operating system platform (e.g. centos, ubuntu, windows). keyword darwin
host.os.name Operating system name. keyword Mac OS X
host.os.family OS family (e.g. redhat, debian, freebsd, windows). keyword debian
host.os.version Operating system version. keyword 10.12.6
host.architecture Operating system architecture. keyword x86_64

Kubernetes fields

Kubernetes fields are used for meta information about k8s. This should help to correlate data coming out of k8s setups.

Field Description Type Multi Field Example
kubernetes.pod.name Kubernetes pod name keyword
kubernetes.namespace Kubernetes namespace keyword
kubernetes.labels Kubernetes labels map object
kubernetes.annotations Kubernetes annotations map object
kubernetes.container.name Kubernetes container name. This name is unique within the pod only. It is different from the underlying container name (container.name in ECS). keyword

Log fields

Fields which are specific to log events.

Field Description Type Multi Field Example
log.level Log level of the log event.
Some examples are WARN, ERR, INFO.
keyword ERR
log.line Line number the log event was collected from. long 18
log.offset Offset of the beginning of the log event. long 12

Network fields

All fields related to network data.

Field Description Type Multi Field Example
network.protocol Network protocol name. keyword http
network.direction Direction of the network traffic.
The recommended values are:
* inbound
* outbound
* unknown
keyword inbound
network.forwarded_ip forwarded_ip indicates the host IP address when the source IP address is the proxy. ip 192.1.1.2
network.inbound.bytes Network inbound bytes. long 184
network.inbound.packets Network inbound packets. long 12
network.outbound.bytes Network outbound bytes. long 184
network.outbound.packets Network outbound packets. long 12

Organization fields

The organization namespace can be used to enrich data with information about which organization the data belongs to.

This can be useful if data stored in the same index should sometimes be filtered or organized by one or multiple organizations.

Field Description Type Multi Field Example
organization.name Organization name. text
organization.id Unique identifier for the organization. keyword

Process fields

These fields contain information about a process.

If metrics information is collected for a process and a process id / name shows up in a log message, these fields should help to correlate the two. It is expected that process.pid will often also stay in the metric itself and only be copied to the global field for correlation.

Field Description Type Multi Field Example
process.args Process arguments.
May be filtered to protect sensitive information.
keyword ['-l', 'user', '10.0.0.16']
process.name Process name.
This is sometimes also known as program name or similar.
keyword ssh
process.pid Process id. long
process.ppid Process parent id. long
process.title Process title.
The proctitle, often the same as process name.
keyword

Service fields

The service fields describe the service for / from which the data was collected.

If logs or metrics are collected from Redis, service.name would be redis. This makes it possible to find and correlate logs for a specific service, and even for a specific version with service.version.

Field Description Type Multi Field Example
service.id Unique identifier of the running service.
This id should uniquely identify this service. This makes it possible to correlate logs and metrics for one specific service. For example in case of issues with one redis instance, it's possible to filter on the id to see metrics and logs for this single instance.
keyword d37e5ebfe0ae6c4972dbe9f0174a1637bb8247f6
service.name Name of the service data is collected from.
The name can be used to group logs and metrics together from one service and correlate them.
keyword elasticsearch
service.type Service type. keyword
service.state Current state of the service. keyword
service.version Version of the service the data was collected from.
This makes it possible to look at a data set for only a specific version of a service.
keyword 3.2.4
service.ephemeral_id Ephemeral identifier of this service if one exists.
Unlike service.id, this id normally changes across restarts.
keyword 8a4f500f

Source fields

Source fields describe details about the source the event is coming from.

Field Description Type Multi Field Example
source.ip IP address of the source.
This can be one or multiple IPv4 or IPv6 addresses.
ip
source.hostname Hostname of the source. keyword
source.port Port of the source. long
source.mac MAC address of the source. keyword
source.domain Source domain. keyword
source.subdomain Source subdomain. keyword

URL fields

A complete URL, with scheme, host, and path.

The URL object can be reused in other prefixes like host.url.* for example. It is important that whenever URL is used that the same structure is used.

url.href is a multi field, which means the data is stored as keyword in url.href and as text in url.href.analyzed. The advantage is that a query against only part of the URL still works without having to split the URL into all its parts at ingest time.
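
A mapping fragment for such a multi field could look roughly like the following, written here as a Python dict that serializes to JSON. This is a sketch based on the Elasticsearch `fields` mapping parameter; the exact request body it is embedded in depends on the Elasticsearch version and is not shown.

```python
import json

# url.href is indexed as keyword, with an extra analyzed sub-field of type text.
url_href_mapping = {
    "properties": {
        "url": {
            "properties": {
                "href": {
                    "type": "keyword",
                    "fields": {"analyzed": {"type": "text"}},
                }
            }
        }
    }
}

print(json.dumps(url_href_mapping, indent=2))
```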

Based on whatwg URL definition: whatwg/url#337

Field Description Type Multi Field Example
url.href href contains the full url. The field is stored as keyword.
href is also indexed as an analyzed multi field, so the analyzed text can be queried through href.analyzed.
keyword https://elastic.co:443/search?q=elasticsearch#top
url.href.analyzed text 1
url.protocol The protocol of the request, e.g. "https:". keyword
url.hostname The hostname of the request, e.g. "example.com".
For correlation this field can be copied into the host.name field.
keyword
url.port The port of the request, e.g. 443. keyword
url.pathname The path of the request, e.g. "/search". text
url.pathname.raw The url path. This is a non-analyzed field that is useful for aggregations. keyword 1
url.search The search describes the query string of the request, e.g. "q=elasticsearch". text
url.search.raw The url search part. This is a non-analyzed field that is useful for aggregations. keyword 1
url.hash The hash of the request URL, e.g. "top". keyword
url.username The username of the request. keyword
url.password The password of the request. keyword
url.extension The url extension field contains the extension of the file associated with the url.
A simple example is http://localhost/logo.png where the extension would be png. There can also be more complex cases like http://localhost/content?asset=logo.png&token=XYZ where the extension could also be png but depends on the implementation.
The extension field should be left out if the extension is not defined.
keyword png
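
As a sketch of how the url.* fields relate to a full URL, the following Python function splits a URL with the standard library and maps the parts onto the field names in the table. The helper name url_fields is an assumption, and so is the choice to derive url.extension from the path only.

```python
import posixpath
from urllib.parse import urlsplit


def url_fields(href):
    """Split a URL into the url.* fields listed above (hypothetical helper)."""
    parts = urlsplit(href)
    fields = {
        "href": href,
        "protocol": parts.scheme + ":",  # the table's example includes the colon
        "hostname": parts.hostname,
        "pathname": parts.path,
    }
    if parts.port is not None:
        fields["port"] = str(parts.port)  # url.port is typed as keyword
    if parts.query:
        fields["search"] = parts.query
    if parts.fragment:
        fields["hash"] = parts.fragment
    if parts.username:
        fields["username"] = parts.username
    if parts.password:
        fields["password"] = parts.password
    # Assumption: only the path is inspected; the extension is left out if undefined.
    extension = posixpath.splitext(parts.path)[1].lstrip(".")
    if extension:
        fields["extension"] = extension
    return {"url": fields}


print(url_fields("https://elastic.co:443/search?q=elasticsearch#top"))
```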

User fields

The user fields are used to describe user information as part of the event.

All fields in user can have one or multiple entries. If a user has more than one id, an array with the ids must be provided.

Field Description Type Multi Field Example
user.id One or multiple unique identifiers of the user. keyword
user.name Name of the user.
As the field is a keyword, the field will not be tokenized.
keyword
user.email User email address. keyword
user.hash Unique user hash to correlate information for a user in anonymized form.
This is useful in case user.id or user.name cannot be used because it contains confidential information.
keyword

User agent fields

The user_agent fields normally come from a browser request.

They commonly show up in web service logs and are derived from the parsed user agent string.

Field Description Type Multi Field Example
user_agent.raw Unparsed version of the user_agent. text
user_agent.device The name of the physical device. keyword
user_agent.version Version of the physical device. keyword
user_agent.major The major version of the user agent. long
user_agent.minor The minor version of the user agent. long
user_agent.patch The patch version of the user agent. keyword
user_agent.name The name of the user agent. keyword Chrome
user_agent.os.name The name of the operating system. keyword
user_agent.os.version Version of the operating system. keyword
user_agent.os.major The major version of the operating system. long
user_agent.os.minor The minor version of the operating system. long

Use cases

Below are some examples that demonstrate how ECS fields can be applied to specific use cases.

Implementing ECS

Adhere to ECS

The following rules apply if an event is to adhere to ECS:

  • The document MUST have the @timestamp field.
  • The data type defined for an ECS field MUST be used.
  • It SHOULD have the field event.version to define which version of ECS it uses.

To make the most out of ECS as many fields as possible should be mapped to ECS.
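
The adherence rules above can be checked mechanically. Below is a minimal sketch in Python, assuming events are nested dicts; the names lookup and check_adherence are hypothetical, and the mapping from ECS types to Python types (keyword/text to str, long to int) is an assumption supplied by the caller.

```python
def lookup(event, dotted_field):
    """Return the value at a dotted path in a nested dict, or None if absent."""
    node = event
    for part in dotted_field.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_adherence(event, expected_types):
    """Return a list of rule violations for one event (hypothetical helper)."""
    problems = []
    if lookup(event, "@timestamp") is None:
        problems.append("MUST: @timestamp is missing")
    for field, expected in expected_types.items():
        value = lookup(event, field)
        if value is not None and not isinstance(value, expected):
            problems.append(f"MUST: {field} should be {expected.__name__}")
    if lookup(event, "event.version") is None:
        problems.append("SHOULD: event.version is missing")
    return problems


types = {"message": str, "source.port": int}
event = {"message": "Hello World", "source": {"port": "443"}}
for problem in check_adherence(event, types):
    print(problem)
```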

Rules

ECS follows the following writing and naming rules for the fields. The goal of these rules is to make the fields easy to remember and have a guide when new fields are added.

Often events will contain additional fields besides ECS. These can follow the same naming and writing rules but don't have to.

Writing

  • All fields must be lower case
  • No special characters except _
  • Words are combined through underscore

Naming

  • Use present tense unless field describes historical information.
  • Use singular and plural names properly to reflect the field content. For example, use requests_per_sec rather than request_per_sec.
  • Organise the prefixes from general to specific to allow grouping fields into objects with a prefix like host.*.
  • Avoid stuttering of words. If part of the field name is already in the prefix, do not repeat it. Example: host.host_ip should be host.ip.
  • Fields must be prefixed except for the base fields. For example all host fields are prefixed with host.. See dot notation in FAQ for more details.
  • Do not use abbreviations (few exceptions like ip exist)
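
Some of the writing and naming rules above lend themselves to an automated check. The following Python sketch covers the character rules and the stuttering rule only; the function name name_problems is hypothetical, and allowing digits in field names is an assumption, since the rules do not mention them.

```python
import re

# Assumption: digits are allowed alongside lower-case letters and underscores.
_SEGMENT = re.compile(r"[a-z0-9_]+")


def name_problems(field):
    """Return rule violations for one dotted field name (hypothetical helper)."""
    if field == "@timestamp":  # base field that is exempt from the character rule
        return []
    problems = []
    segments = field.split(".")
    for segment in segments:
        if not _SEGMENT.fullmatch(segment):
            problems.append(f"'{segment}' must be lower case with no special characters except _")
    # Stuttering: a word in a later segment repeats one of the prefixes before it.
    for index, segment in enumerate(segments[1:], start=1):
        words = segment.split("_")
        for prefix in segments[:index]:
            if prefix in words:
                problems.append(f"'{segment}' repeats the prefix '{prefix}'")
    return problems


for name in ["host.ip", "host.host_ip", "Host.IP"]:
    print(name, name_problems(name))
```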

About ECS

Scope

The Elastic Common Schema defines a common set of document fields (and their respective field names) to be used in event messages stored in Elasticsearch as part of any logging or metrics use case of the Elastic Stack, including IT operations analytics and security analytics.

Goals

The ECS has the following goals:

  • Correlate data between metrics, logs and APM
  • Correlate data coming from the same machines / hosts
  • Correlate data coming from the same service

Priority on which fields are added is based on these goals.

Benefits

The benefits to a user adopting these fields and names in their clusters are:

  • Ability to simply correlate data from different data sources
  • Improved ability to remember commonly used field names (since there is only a single set, not a set per data source)
  • Improved ability to deduce unremembered field names (since the field naming follows a small number of rules with few exceptions)
  • Ability to re-use analysis content (searches, visualizations, dashboards, alerts, reports, and ML jobs) across multiple data sources
  • Ability to use any future Elastic-provided analysis content in their environment without modifications

FAQ

Why is ECS using a dot notation instead of an underline notation?

There are two common formats on how keys are formatted when ingesting data into Elasticsearch:

  • Dot notation: user.firstname: Nicolas, user.lastname: Ruflin
  • Underline notation: user_firstname: Nicolas, user_lastname: Ruflin

In ECS the decision was made to use the dot notation and this entry is intended to share some background on this decision.

What is the difference between the two notations?

Ingesting user.firstname and user.lastname is identical to ingesting the following JSON:

{
  "user": {
    "firstname": "Nicolas",
    "lastname": "Ruflin"
  }
}

This means internally in Elasticsearch user is represented as an object datatype. In the case of the underline notation both are just string datatypes.

NOTE: ECS does not use the nested datatype, which represents an array of objects.
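
The equivalence between dotted keys and nested objects can be sketched in Python as a small expansion function. The name expand_dotted is hypothetical, and this is only an illustration of the idea, not how Elasticsearch implements it. The function raises an error when a key would have to be both a plain value and an object.

```python
def expand_dotted(flat):
    """Turn {"user.firstname": ...} into {"user": {"firstname": ...}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"'{part}' is already a plain value, not an object")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"'{leaf}' is already an object, not a plain value")
        node[leaf] = value
    return nested


print(expand_dotted({"user.firstname": "Nicolas", "user.lastname": "Ruflin"}))
```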

Advantages of dot notation

The advantage of the dot notation is that on the Elasticsearch side each prefix is an object. Each object can have parameters that control how the fields inside it are treated, for example whether they should be indexed or whether mappings may be extended. In the context of ECS this makes it possible, for example, to disable dynamic property creation for certain prefixes.
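
As a sketch of a per-object setting, the mapping fragment below (a Python dict that serializes to JSON) turns off dynamic field creation for one prefix by using the Elasticsearch `dynamic` mapping parameter. Choosing host as the prefix and listing only host.name are assumptions for illustration; ECS does not prescribe this mapping.

```python
import json

# Fields under host.* that are not listed in "properties" are not added to the mapping.
host_mapping = {
    "properties": {
        "host": {
            "dynamic": False,
            "properties": {
                "name": {"type": "keyword"},
            },
        }
    }
}

print(json.dumps(host_mapping, indent=2))
```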

On the ingest side of Elasticsearch it becomes simpler, for example, to drop complete objects with the remove processor instead of selecting each key inside them. It does not require prior knowledge of which keys will end up in the object.

On the event producing side like in Beats it simplifies the creation of the events as on the code side each object can be treated as an object (or struct in Golang as an example) which makes constructing and modifying each part of the final event easier.

Disadvantage of dot notation

In Elasticsearch each key can only have one type. So if user is an object, the same index cannot also contain user as type keyword, as in {"user": "nicolas ruflin"}. This can be an issue in certain datasets.

For the ECS data itself this is not an issue as all fields are predefined.

What if I already use the underline notation?

It's not a problem to mix the underline notation with the ECS dot notation. They can coexist in the same document as long as there are no conflicts.

What if I have fields that conflict with ECS?

Assuming you already have a field user but ECS uses user as an object, you can use the rename processor at ingest time to rename your field either to the matching ECS field or, if your field does not match ECS, to user.value.
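
The effect of such a rename can be illustrated with a plain-Python stand-in. This is not the rename processor itself, only a sketch of the transformation it is meant to achieve; the function name resolve_user_conflict and its ecs_target parameter are hypothetical.

```python
def resolve_user_conflict(doc, ecs_target=None):
    """Move a plain `user` value into the `user` object (hypothetical helper).

    ecs_target names the ECS sub-field to use, for example "name";
    without it the value is stored under user.value.
    """
    value = doc.get("user")
    if value is None or isinstance(value, dict):
        return doc  # nothing to rename: user is absent or already an object
    renamed = {key: val for key, val in doc.items() if key != "user"}
    renamed["user"] = {ecs_target or "value": value}
    return renamed


print(resolve_user_conflict({"user": "nicolas ruflin"}))
print(resolve_user_conflict({"user": "nicolas ruflin"}, ecs_target="name"))
```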