Elastic Common Schema (ECS)

WARNING: THIS IS WORK IN PROGRESS

The Elastic Common Schema (ECS) provides a common data model for data ingested into Elasticsearch. Having a common schema allows you to correlate data from sources like logs and metrics or IT operations analytics and security analytics.

ECS is still under development and backward compatibility is not guaranteed. Any feedback on the general structure, missing fields, or existing fields is appreciated. For contributions please read the Contributing Guide.

The current version of ECS is 0.1.0.

Fields
Use cases
Implementing ECS
About ECS

Fields

List of available ECS fields.

Base fields
Agent fields
Cloud fields
Container fields
Destination fields
Device fields
Error fields
Event fields
File fields
Geoip fields
Host fields
Kubernetes fields
Log fields
Network fields
Organization fields
Process fields
Service fields
Source fields
URL fields
User fields
User agent fields

Base fields

The base set contains all fields which are on the top level without a namespace.

These are fields which are common across all types of events.

Field	Description	Type	Example
`@timestamp`	Timestamp when the event was created. For log events this is expected to be when the event was generated and not when it was read. Timestamp is a required field and must exist in all events.	date	`2016-05-23T08:05:34.853Z`
`tags`	Tags is a list of keywords which are used to tag each event.	keyword	`["production", "env2"]`
`labels`	Labels is an object which contains key/value pairs. Labels can be used to add additional meta information to events. Labels should not contain nested objects, and all values are stored as keyword. An example usage is the docker and k8s labels.	object	`{key1: value1, key2: value2}`
`message`	For log events the message field contains the log message. In other use cases the message field can be used to concatenate together different values which are then freely searchable. Or if multiple messages exist they can be combined here into one message.	text	`Hello World`
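
As a minimal sketch of the base fields, the following Python builds an event that carries only `@timestamp`, `message`, `tags`, and `labels`. The helper name `make_base_event` and its parameters are hypothetical and not part of ECS; the timestamp format simply mirrors the example value in the table.

```python
from datetime import datetime, timezone


def make_base_event(message, tags=None, labels=None, created_at=None):
    """Build a dict that holds only the ECS base fields (hypothetical helper)."""
    created_at = created_at or datetime.now(timezone.utc)
    timestamp = created_at.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    event = {
        # @timestamp is required; rendered in UTC like the table example.
        "@timestamp": timestamp.replace("+00:00", "Z"),
        "message": message,
    }
    if tags:
        event["tags"] = list(tags)
    if labels:
        for key, value in labels.items():
            # Labels should not contain nested objects.
            if isinstance(value, (dict, list)):
                raise ValueError(f"label {key!r} must not be nested")
        # All label values are stored as keyword, so they are kept as strings.
        event["labels"] = {key: str(value) for key, value in labels.items()}
    return event
```

Calling `make_base_event("Hello World", tags=["production", "env2"])` returns a dict that can be serialized with `json.dumps` before it is sent to Elasticsearch.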

Agent fields

The agent fields contain data about the agent/client/shipper that created the event.

For example, when Beats collects logs, agent.name is filebeat. In the case of APM, it is the agent running in the app / service. The agent information does not change if data is sent through a queuing system like Kafka or Redis, or through processing systems like Logstash or APM Server.

Field	Description	Type	Example
`agent.version`	Agent version.	keyword	`6.0.0-rc2`
`agent.name`	Name of the agent.	keyword	`filebeat`
`agent.id`	Unique identifier of this agent if one exists. In the case of Beats this would be beat.id.	keyword	`8a4f500d`
`agent.ephemeral_id`	Ephemeral identifier of this agent if one exists. Unlike `agent.id`, this id normally changes across restarts.	keyword	`8a4f500f`

Cloud fields

All fields related to the cloud or infrastructure the events are coming from.

If Metricbeat runs on an EC2 host and fetches data from that host, the cloud fields are expected to describe this machine. If Metricbeat runs outside the cloud on a remote machine and fetches data from a service running in the cloud, the cloud fields are expected to describe the machine on which the service is running.

Field	Description	Type	Example
`cloud.provider`	Name of the cloud provider. Example values are ec2, gce, or digitalocean.	keyword	`ec2`
`cloud.availability_zone`	Availability zone in which this host is running.	keyword	`us-east-1c`
`cloud.region`	Region in which this host is running.	keyword	`us-east-1`
`cloud.instance.id`	Instance ID of the host machine.	keyword	`i-1234567890abcdef0`
`cloud.instance.name`	Instance name of the host machine.	keyword
`cloud.machine.type`	Machine type of the host machine.	keyword	`t2.medium`

Container fields

Container fields are used for meta information about the specific container the information is coming from. This should help to correlate data based on containers from any runtime.

Field	Description	Type	Example
`container.runtime`	Runtime managing this container.	keyword	`docker`
`container.id`	Unique container id.	keyword
`container.image.name`	Name of the image the container was built on.	keyword
`container.image.tag`	Container image tag.	keyword
`container.name`	Container name.	keyword
`container.labels`	Image labels.	object

Destination fields

Destination fields describe details about the destination of a packet/event.

Field	Description	Type
`destination.ip`	IP address of the destination. This can be one or multiple IPv4 or IPv6 addresses.	ip
`destination.hostname`	Hostname of the destination.	keyword
`destination.port`	Port of the destination.	long
`destination.mac`	MAC address of the destination.	keyword
`destination.domain`	Destination domain.	keyword
`destination.subdomain`	Destination subdomain.	keyword

Device fields

Device fields are used to give additional information about the device that the information is coming from.

This could be a firewall, network device, etc.

Field	Description	Type	Example
`device.mac`	MAC address of the device.	keyword
`device.ip`	IP address of the device.	ip
`device.hostname`	Hostname of the device.	keyword
`device.vendor`	Device vendor information.	text
`device.version`	Device version.	keyword
`device.serial_number`	Device serial number.	keyword
`device.timezone.offset.sec`	Timezone offset of the device in seconds. Number of seconds relative to UTC. In case the offset is -01:30 the value will be -5400.	long	`-5400`
`device.type`	The type of the device the data is coming from. There is no predefined list of device types. Some examples are `endpoint`, `firewall`, `ids`, `ips`, `proxy`.	keyword	`firewall`
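
The `timezone.offset.sec` value is plain arithmetic on the UTC offset. The sketch below shows the conversion for offsets written as `+HH:MM` or `-HH:MM`; the function name `utc_offset_seconds` and the accepted input format are assumptions made for illustration.

```python
import re


def utc_offset_seconds(offset):
    """Convert a UTC offset string such as '-01:30' into signed seconds."""
    match = re.fullmatch(r"([+-])(\d{2}):(\d{2})", offset)
    if match is None:
        raise ValueError(f"unsupported offset format: {offset!r}")
    sign = -1 if match.group(1) == "-" else 1
    hours, minutes = int(match.group(2)), int(match.group(3))
    # One hour is 3600 seconds, one minute is 60 seconds.
    return sign * (hours * 3600 + minutes * 60)


print(utc_offset_seconds("-01:30"))  # -5400
```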

Error fields

Error namespace

This can be used to represent all kinds of errors. It can be for errors that happen while fetching events or if the event itself contains an error.

Field	Description	Type
`error.id`	Unique identifier for the error.	keyword
`error.message`	Error message.	text
`error.code`	Error code describing the error.	keyword

Event fields

The event fields are used for context information about the data itself.

Field	Description	Type	Example
`event.id`	Unique ID to describe the event.	keyword	`8a4f500d`
`event.category`	Event category. This can be a user defined category.	keyword	`metrics`
`event.type`	A type given to this kind of event which can be used for grouping. This is normally defined by the user.	keyword	`nginx-stats-metrics`
`event.module`	Name of the module this data is coming from. This information is coming from the modules used in Beats or Logstash.	keyword	`mysql`
`event.dataset`	Name of the dataset. The concept of a `dataset` (fileset / metricset) is used in Beats as a subset of modules. It contains the information which is currently stored in metricset.name and metricset.module or fileset.name.	keyword	`stats`
`event.severity`	Severity describes the severity of the event. What the different severity values mean can differ considerably between use cases. It's up to the implementer to make sure severities are consistent across events.	long	`7`
`event.raw`	Raw text message of entire event to be used to demonstrate log integrity.	keyword	`Sep 19 08:26:10 host CEF:0\|Security\| threatmanager\|1.0\|100\| worm successfully stopped\|10\|src=10.0.0.1 dst=2.1.2.2spt=1232`
`event.hash`	Hash (perhaps logstash fingerprint) of raw field to be able to demonstrate log integrity.	keyword	`123456789012345678901234567890ABCD`
`event.version`	Version of ECS that the event adheres to. This field should be provided with each event so that the ECS version an event belongs to can be detected. event.version is a required field and must exist in all events. The current version is 0.1.0.	keyword	`0.1.0`
`event.duration`	Duration of the event in nanoseconds.	long
`event.created`	Date when the event was created. This timestamp is distinct from @timestamp in that @timestamp contains the processed timestamp. For logs, the two can differ because the timestamp in the log line and the time the event is read, for example by Filebeat, are not identical. `@timestamp` must contain the timestamp extracted from the log line, and event.created the time when the log line is read. The same could apply to packet capturing, where @timestamp contains the timestamp extracted from the network packet and event.created the time when the event was created. If the two timestamps are identical, only @timestamp should be used.	date
`event.risk_score`	Risk score value of the event.	float
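
The relationship between `@timestamp` and `event.created` can be sketched as follows. This is an illustration only: it assumes a log line of the form `<ISO 8601 timestamp with offset> <message>`, and the names `to_ecs_time` and `event_from_log_line` are hypothetical.

```python
from datetime import datetime, timezone


def to_ecs_time(moment):
    """Render a timezone-aware datetime like the @timestamp example value."""
    text = moment.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


def event_from_log_line(line, read_at):
    """Build an event from '<timestamp> <message>' (assumed line format)."""
    raw_timestamp, message = line.split(" ", 1)
    logged_at = datetime.fromisoformat(raw_timestamp)
    # @timestamp carries the time extracted from the log line itself.
    event = {"@timestamp": to_ecs_time(logged_at), "message": message}
    # event.created carries the time the line was read; it is only
    # added when it differs from @timestamp.
    if read_at != logged_at:
        event["event"] = {"created": to_ecs_time(read_at)}
    return event
```

A shipper would pass the current time as `read_at`, for example `datetime.now(timezone.utc)`.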

File fields

File attributes.

Field	Description	Type	Multi Field	Example
`file.path`	The path to the file.	text
`file.path.raw`	The path to the file. This is a non-analyzed field that is useful for aggregations.	keyword	1
`file.target_path`	The target path for symlinks.	text
`file.target_path.raw`	The target path for symlinks. This is a non-analyzed field that is useful for aggregations.	keyword	1
`file.extension`	The file extension. This should allow easy filtering by file extensions.	keyword		`png`
`file.type`	The file type (file, dir, or symlink).	keyword
`file.device`	The device.	keyword
`file.inode`	The inode representing the file in the filesystem.	keyword
`file.uid`	The user ID (UID) or security identifier (SID) of the file owner.	keyword
`file.owner`	The file owner's username.	keyword
`file.gid`	The primary group ID (GID) of the file.	keyword
`file.group`	The primary group name of the file.	keyword
`file.mode`	The mode of the file in octal representation.	keyword		`416`
`file.size`	The file size in bytes (field is only added when `type` is `file`).	long
`file.mtime`	The last modified time of the file (time when content was modified).	date
`file.ctime`	The last change time of the file (time when metadata was changed).	date

Geoip fields

Geoip fields are used for geo information for an IP address.

The conversion to geoip information can be done by the Elasticsearch geoip plugin.

Field	Description	Type
`geoip.continent_name`	The name of the continent.	keyword
`geoip.country_iso_code`	Country ISO code.	keyword
`geoip.location`	The longitude and latitude.	geo_point
`geoip.region_name`	The region name.	keyword
`geoip.city_name`	The city name.	keyword

Host fields

All fields related to a host. A host can be a physical machine, a virtual machine, and also a Docker container.

Normally the host information is related to the machine on which the event was generated / collected but also can be used differently if needed.

Field	Description	Type	Example
`host.timezone.offset.sec`	Timezone offset of the host in seconds. Number of seconds relative to UTC. In case the offset is -01:30 the value will be -5400.	long	`-5400`
`host.name`	host.name is the hostname of the host. It can contain what `hostname` returns on Unix systems, the fully qualified domain name or also a name specified by the user. It is up to the sender to decide which value to use.	keyword
`host.id`	Unique host id. As hostname is not always unique, this often can be configured by the user. An example here is the current usage of `beat.name`.	keyword
`host.ip`	Host ip address.	ip
`host.mac`	Host mac address.	keyword
`host.type`	Type of the host. For cloud providers this can be the machine type like `t2.medium`. It can also be `vm` or `container`, for example, or something user defined.	keyword
`host.os.platform`	Operating system platform (e.g. centos, ubuntu, windows).	keyword	`darwin`
`host.os.name`	Operating system name.	keyword	`Mac OS X`
`host.os.family`	OS family (e.g. redhat, debian, freebsd, windows).	keyword	`debian`
`host.os.version`	Operating system version.	keyword	`10.12.6`
`host.architecture`	Operating system architecture.	keyword	`x86_64`

Kubernetes fields

Kubernetes fields are used for meta information about k8s. This should help to correlate data coming out of k8s setups.

Field	Description	Type
`kubernetes.pod.name`	Kubernetes pod name	keyword
`kubernetes.namespace`	Kubernetes namespace	keyword
`kubernetes.labels`	Kubernetes labels map	object
`kubernetes.annotations`	Kubernetes annotations map	object
`kubernetes.container.name`	Kubernetes container name. This name is unique within the pod only; it differs from the underlying container name (container.name in ECS).	keyword

Log fields

Fields which are specific to log events.

Field	Description	Type	Example
`log.level`	Log level of the log event. Some examples are `WARN`, `ERR`, `INFO`.	keyword	`ERR`
`log.line`	Line number the log event was collected from.	long	`18`
`log.offset`	Offset of the beginning of the log event.	long	`12`

Network fields

All fields related to network data.

Field	Description	Type	Example
`network.protocol`	Network protocol name.	keyword	`http`
`network.direction`	Direction of the network traffic. The recommended values are: * inbound * outbound * unknown	keyword	`inbound`
`network.forwarded_ip`	forwarded_ip indicates the host IP address when the source IP address is the proxy.	ip	`192.1.1.2`
`network.inbound.bytes`	Network inbound bytes.	long	`184`
`network.inbound.packets`	Network inbound packets.	long	`12`
`network.outbound.bytes`	Network outbound bytes.	long	`184`
`network.outbound.packets`	Network outbound packets.	long	`12`

Organization fields

The organization namespace can be used to enrich data with information about the organization the data belongs to.

This can be useful if data stored in the same index should sometimes be filtered or organized by one or multiple organizations.

Field	Description	Type	Multi Field	Example
`organization.name`	Organization name.	text
`organization.id`	Unique identifier for the organization.	keyword

Process fields

These fields contain information about a process.

If metrics information is collected for a process and a process id / name shows up in a log message, these fields should help to correlate the two. It is expected that process.pid will often also stay in the metric itself and only be copied to the global field for correlation.

Field	Description	Type	Example
`process.args`	Process arguments. May be filtered to protect sensitive information.	keyword	`['-l', 'user', '10.0.0.16']`
`process.name`	Process name. This is sometimes also known as program name or similar.	keyword	`ssh`
`process.pid`	Process id.	long
`process.ppid`	Process parent id.	long
`process.title`	Process title. The proctitle, often the same as process name.	keyword

Service fields

The service fields describe the service for / from which the data was collected.

If logs or metrics are collected from Redis, service.name would be redis. This makes it possible to find and correlate logs for a specific service, and even for a specific version with service.version.

Field	Description	Type	Example
`service.id`	Unique identifier of the running service. This id should uniquely identify this service. This makes it possible to correlate logs and metrics for one specific service. For example in case of issues with one redis instance, it's possible to filter on the id to see metrics and logs for this single instance.	keyword	`d37e5ebfe0ae6c4972dbe9f0174a1637bb8247f6`
`service.name`	Name of the service data is collected from. The name can be used to group logs and metrics together from one service and correlate them.	keyword	`elasticsearch`
`service.type`	Service type.	keyword
`service.state`	Current state of the service.	keyword
`service.version`	Version of the service the data was collected from. This makes it possible to look at a data set only for a specific version of a service.	keyword	`3.2.4`
`service.ephemeral_id`	Ephemeral identifier of this service if one exists. Unlike `service.id`, this id normally changes across restarts.	keyword	`8a4f500f`

Source fields

Source fields describe details about the source the event is coming from.

Field	Description	Type
`source.ip`	IP address of the source. This can be one or multiple IPv4 or IPv6 addresses.	ip
`source.hostname`	Hostname of the source.	keyword
`source.port`	Port of the source.	long
`source.mac`	MAC address of the source.	keyword
`source.domain`	Source domain.	keyword
`source.subdomain`	Source subdomain.	keyword

URL fields

A complete URL, with scheme, host, and path.

The URL object can be reused in other prefixes like host.url.* for example. Whenever the URL object is used, the same structure must be used.

url.href is a multi field, which means the data is stored as keyword in url.href and as text in url.href.analyzed. The advantage is that a query against only a part of the URL still works without having to split the URL into all its parts at ingest time.

Based on whatwg URL definition: whatwg/url#337

Field	Description	Type	Multi Field	Example
`url.href`	href contains the full url. The field is stored as keyword. `href` is a multi field, so the analyzed version can be accessed through `href.analyzed` in queries.	keyword		`https://elastic.co:443/search?q=elasticsearch#top`
`url.href.analyzed`		text	1
`url.protocol`	The protocol of the request, e.g. "https:".	keyword
`url.hostname`	The hostname of the request, e.g. "example.com". For correlation, this field can be copied into the `host.name` field.	keyword
`url.port`	The port of the request, e.g. 443.	keyword
`url.pathname`	The path of the request, e.g. "/search".	text
`url.pathname.raw`	The url path. This is a non-analyzed field that is useful for aggregations.	keyword	1
`url.search`	The search describes the query string of the request, e.g. "q=elasticsearch".	text
`url.search.raw`	The url search part. This is a non-analyzed field that is useful for aggregations.	keyword	1
`url.hash`	The hash of the request URL, e.g. "top".	keyword
`url.username`	The username of the request.	keyword
`url.password`	The password of the request.	keyword
`url.extension`	The url extension field contains the extension of the file associated with the url. A simple example is `http://localhost/logo.png` where the extension would be `png`. There can also be more complex cases like `http://localhost/content?asset=logo.png&token=XYZ` where the extension could also be `png` but depends on the implementation. The `extension` field should be left out if the extension is not defined.	keyword		`png`
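
As a rough illustration of how a full URL maps onto these fields, the sketch below splits the example `href` with Python's standard `urllib.parse`. The function name `url_to_ecs` is hypothetical, and taking the extension only from the path is one possible implementation choice, not something ECS prescribes.

```python
import posixpath
from urllib.parse import urlsplit


def url_to_ecs(href):
    """Split a URL into the url.* fields listed above (illustrative only)."""
    parts = urlsplit(href)
    url = {"href": href}
    if parts.scheme:
        # The table example writes the protocol with a trailing colon.
        url["protocol"] = parts.scheme + ":"
    if parts.hostname:
        url["hostname"] = parts.hostname
    if parts.port is not None:
        # url.port is typed as keyword, so it is kept as a string here.
        url["port"] = str(parts.port)
    if parts.path:
        url["pathname"] = parts.path
        extension = posixpath.splitext(parts.path)[1].lstrip(".")
        if extension:
            # The extension field is left out when it is not defined.
            url["extension"] = extension
    if parts.query:
        url["search"] = parts.query
    if parts.fragment:
        url["hash"] = parts.fragment
    if parts.username:
        url["username"] = parts.username
    if parts.password:
        url["password"] = parts.password
    return {"url": url}
```

For `https://elastic.co:443/search?q=elasticsearch#top` this yields `protocol` `https:`, `hostname` `elastic.co`, `port` `443`, `pathname` `/search`, `search` `q=elasticsearch`, and `hash` `top`.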

User fields

The user fields are used to describe user information as part of the event.

All fields in user can have one or multiple entries. If a user has more than one id, an array with the ids must be provided.

Field	Description	Type
`user.id`	One or multiple unique identifiers of the user.	keyword
`user.name`	Name of the user. As the field is a keyword, the field will not be tokenized.	keyword
`user.email`	User email address.	keyword
`user.hash`	Unique user hash to correlate information for a user in anonymized form. This is useful in case `user.id` or `user.name` cannot be used because it contains confidential information.	keyword

User agent fields

The user_agent fields normally come from a browser request.

They commonly show up in web service logs, derived from the parsed user agent string.

Field	Description	Type	Example
`user_agent.raw`	Unparsed version of the user_agent.	text
`user_agent.device`	The name of the physical device.	keyword
`user_agent.version`	Version of the physical device.	keyword
`user_agent.major`	The major version of the user agent.	long
`user_agent.minor`	The minor version of the user agent.	long
`user_agent.patch`	The patch version of the user agent.	keyword
`user_agent.name`	The name of the user agent.	keyword	`Chrome`
`user_agent.os.name`	The name of the operating system.	keyword
`user_agent.os.version`	Version of the operating system.	keyword
`user_agent.os.major`	The major version of the operating system.	long
`user_agent.os.minor`	The minor version of the operating system.	long

Use cases

Below are some examples that demonstrate how ECS fields can be applied to specific use cases.

Implementing ECS

Adhere to ECS

The following rules apply to an event that is meant to adhere to ECS:

The document MUST have the @timestamp field.
The data type defined for an ECS field MUST be used.
It SHOULD have the field event.version to define which version of ECS it uses.

To make the most out of ECS as many fields as possible should be mapped to ECS.
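
The MUST and SHOULD rules above can be turned into a small check. The following is a sketch under stated assumptions: the names `lookup` and `check_ecs_adherence` are hypothetical, fields may be written either as dotted keys or as nested objects, and the data type rule is not covered because it would need the full schema.

```python
def lookup(event, dotted_key):
    """Find a field written either as a dotted key or as nested objects."""
    if dotted_key in event:
        return event[dotted_key]
    node = event
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_ecs_adherence(event):
    """Return (errors, warnings) for the presence rules listed above."""
    errors, warnings = [], []
    # MUST: every document has the @timestamp field.
    if lookup(event, "@timestamp") is None:
        errors.append("@timestamp is missing")
    # SHOULD: event.version states which ECS version is used.
    if lookup(event, "event.version") is None:
        warnings.append("event.version is missing")
    return errors, warnings
```

An event with a timestamp but no version, such as `{"@timestamp": "2016-05-23T08:05:34.853Z"}`, produces no errors and one warning.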

Rules

ECS follows the following writing and naming rules for the fields. The goal of these rules is to make the fields easy to remember and have a guide when new fields are added.

Often events will contain additional fields besides ECS. These can follow the same naming and writing rules but don't have to.

Writing

All fields must be lower case
No special characters except _
Words are combined through underscore
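
The writing rules lend themselves to a simple pattern check. The sketch below is one possible reading of them: the function name `follows_writing_rules` is hypothetical, digits are assumed to be allowed, and the dot is treated as the prefix separator rather than as a special character. The predefined base field `@timestamp` does not pass this check because of its `@`.

```python
import re

# One name segment: lower case letters, digits (an assumption), and underscores.
_SEGMENT = re.compile(r"[a-z0-9_]+")


def follows_writing_rules(field_name):
    """Check each dot-separated segment against the writing rules."""
    return all(_SEGMENT.fullmatch(segment) for segment in field_name.split("."))


print(follows_writing_rules("user_agent.os.name"))  # True
print(follows_writing_rules("Host.IP"))             # False
print(follows_writing_rules("http-status"))         # False
```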

Naming

Use present tense unless field describes historical information.
Use singular and plural names properly to reflect the field content. For example, use requests_per_sec rather than request_per_sec.
Organise the prefixes from general to specific to allow grouping fields into objects with a prefix like host.*.
Avoid stuttering of words. If part of the field name is already in the prefix, do not repeat it. Example: host.host_ip should be host.ip.
Fields must be prefixed except for the base fields. For example all host fields are prefixed with host.. See dot notation in FAQ for more details.
Do not use abbreviations (few exceptions like ip exist)
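
The stuttering rule can be checked mechanically. As a minimal sketch (the function name `find_stutter` is hypothetical, and only the last name segment is compared against its prefixes), the following reports words that repeat a word already present in the prefix:

```python
def find_stutter(field_name):
    """Return words in the last segment that already occur in a prefix."""
    *prefixes, leaf = field_name.split(".")
    prefix_words = {word for prefix in prefixes for word in prefix.split("_")}
    return [word for word in leaf.split("_") if word in prefix_words]


print(find_stutter("host.host_ip"))  # ['host']
print(find_stutter("host.ip"))       # []
```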

About ECS

Scope

The Elastic Common Schema defines a common set of document fields (and their respective field names) to be used in event messages stored in Elasticsearch as part of any logging or metrics use case of the Elastic Stack, including IT operations analytics and security analytics.

Goals

The ECS has the following goals:

Correlate data between metrics, logs and APM
Correlate data coming from the same machines / hosts
Correlate data coming from the same service

Priority on which fields are added is based on these goals.

Benefits

The benefits to a user adopting these fields and names in their clusters are:

Ability to simply correlate data from different data sources
Improved ability to remember commonly used field names (since there is only a single set, not a set per data source)
Improved ability to deduce unremembered field names (since the field naming follows a small number of rules with few exceptions)
Ability to re-use analysis content (searches, visualizations, dashboards, alerts, reports, and ML jobs) across multiple data sources
Ability to use any future Elastic-provided analysis content in their environment without modifications

FAQ

Why is ECS using a dot notation instead of an underline notation?

There are two common formats on how keys are formatted when ingesting data into Elasticsearch:

Dot notation: user.firstname: Nicolas, user.lastname: Ruflin
Underline notation: user_firstname: Nicolas, user_lastname: Ruflin

In ECS the decision was made to use the dot notation and this entry is intended to share some background on this decision.

What is the difference between the two notations?

When ingesting user.firstname and user.lastname it is identical to ingesting the following JSON:

"user": {
  "firstname": "Nicolas",
  "lastname": "Ruflin"
}

This means internally in Elasticsearch user is represented as an object datatype. In the case of the underline notation both are just string datatypes.
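
To make the equivalence concrete, the sketch below expands dotted keys into nested objects in plain Python. It only illustrates the idea and is not how Elasticsearch implements it; the function name `expand_dotted_keys` is hypothetical.

```python
def expand_dotted_keys(flat):
    """Turn {'user.firstname': ...} into {'user': {'firstname': ...}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for parent in parents:
            node = node.setdefault(parent, {})
            if not isinstance(node, dict):
                # A key cannot be both a plain value and an object.
                raise ValueError(f"{parent!r} is already used as a plain value")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{leaf!r} is already used as an object")
        node[leaf] = value
    return nested


print(expand_dotted_keys({"user.firstname": "Nicolas", "user.lastname": "Ruflin"}))
# {'user': {'firstname': 'Nicolas', 'lastname': 'Ruflin'}}
```

Keys in underline notation such as `user_firstname` contain no dot, so they pass through unchanged as top-level values.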

NOTE: ECS does not use the nested datatype, which is an array of objects.

Advantages of dot notation

The advantage of the dot notation is that on the Elasticsearch side each prefix is an object. Each object can have parameters on how fields inside the object should be treated, for example whether they should be indexed or whether mappings should be extended. In the context of ECS this makes it possible, for example, to disable dynamic property creation for certain prefixes.

On the ingest side of Elasticsearch it becomes simpler to, for example, drop complete objects with the remove processor instead of selecting each key inside them. No prior knowledge of which keys will end up in the object is required.

On the event producing side like in Beats it simplifies the creation of the events as on the code side each object can be treated as an object (or struct in Golang as an example) which makes constructing and modifying each part of the final event easier.

Disadvantage of dot notation

In Elasticsearch each key can only have one type. So if user is an object it's not possible to have in the same index user as type keyword like {"user": "nicolas ruflin"}. This can be an issue in certain datasets.

For the ECS data itself this is not an issue as all fields are predefined.

What if I already use the underline notation?

It's not a problem to mix the underline notation with the ECS dot notation. They can coexist in the same document as long as there are no conflicts.

What if I have fields that conflict with ECS?

Assuming you already have a field user but ECS uses user as an object, you can use the rename processor at ingest time to rename your field to the matching ECS field, or to user.value if your field does not match ECS.
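
The effect of such a rename can be sketched in plain Python. This mimics the outcome on a single document only and is not the rename processor itself; the function name `move_conflicting_user` is hypothetical.

```python
def move_conflicting_user(document):
    """Move a plain 'user' value to user.value so that user can be an object."""
    user = document.get("user")
    if user is None or isinstance(user, dict):
        # Nothing to do: the field is absent or already an object.
        return document
    renamed = dict(document)
    renamed["user"] = {"value": user}
    return renamed


print(move_conflicting_user({"user": "nicolas ruflin"}))
# {'user': {'value': 'nicolas ruflin'}}
```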
