WARNING: THIS IS WORK IN PROGRESS
The Elastic Common Schema (ECS) is used to provide a common data model when ingesting data into Elasticsearch. Having a common schema allows you to correlate data from different sources, such as logs and metrics, or IT operations analytics and security analytics.
ECS is still under development and backward compatibility is not guaranteed. Any feedback on the general structure, missing fields, or existing fields is appreciated. For contributions please read the Contributing Guide.
The current version of ECS is 0.1.0.
List of available ECS fields.
- Base fields
- Agent fields
- Cloud fields
- Container fields
- Destination fields
- Device fields
- Error fields
- Event fields
- File fields
- Geoip fields
- Host fields
- Kubernetes fields
- Log fields
- Network fields
- Organization fields
- Process fields
- Service fields
- Source fields
- URL fields
- User fields
- User agent fields
The base set contains all fields which are on the top level without a namespace.
These are fields which are common across all types of events.
The agent fields contain data about the agent/client/shipper that created the event.
For example, in the case of Beats for logs, `agent.name` is `filebeat`. In the case of APM, it is the agent running in the app/service. The agent information does not change if data is sent through a queuing system like Kafka or Redis, or through processing systems like Logstash or APM Server.
All fields related to the cloud or infrastructure the events are coming from.
If Metricbeat is running on an EC2 host and fetches data from its host, the cloud info is expected to contain the data about this machine. If Metricbeat runs outside the cloud on a remote machine and fetches data from a service running in the cloud, the cloud info is expected to describe the machine on which the service is running.
Container fields are used for meta information about the specific container the information is coming from. This should help to correlate data based on containers from any runtime.
Destination fields describe details about the destination of a packet/event.
Device fields are used to give additional information about the device that the information is coming from.
This could be a firewall, network device, etc.
Error namespace
This can be used to represent all kinds of errors: both errors that happen while fetching events and errors contained in the event itself.
Field | Description | Type | Multi Field | Example |
---|---|---|---|---|
error.id | Unique identifier for the error. | keyword | | |
error.message | Error message. | text | | |
error.code | Error code describing the error. | keyword | | |
The event fields are used for context information about the data itself.
File attributes.
Geoip fields are used for geo information about an IP address.
The conversion to geoip information can be done by the Elasticsearch geoip plugin.
All fields related to a host. A host can be a physical machine, a virtual machine, or a Docker container.
Normally the host information is related to the machine on which the event was generated/collected, but it can also be used differently if needed.
Kubernetes fields are used for meta information about k8s. This should help to correlate data coming out of k8s setups.
Fields which are specific to log events.
All fields related to network data.
The organization namespace can be used to enrich data with information about the organization the data belongs to.
This can be useful if data stored in the same index should sometimes be filtered or organized by one or more organizations.
Field | Description | Type | Multi Field | Example |
---|---|---|---|---|
organization.name | Organization name. | text | | |
organization.id | Unique identifier for the organization. | keyword | | |
These fields contain information about a process.
If metrics information is collected for a process and a process id/name shows up in a log message, these fields should help to correlate the two. It is expected that `process.pid` will often also remain in the metric itself and only be copied to the global field for correlation.
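As a rough illustration of that correlation, the sketch below groups log events and metric documents that share the same `process.pid`. It assumes documents are plain Python dicts in nested (object) form; the function name `correlate_by_pid` and the `cpu_pct` metric key are hypothetical and not part of ECS.

```python
from collections import defaultdict

def correlate_by_pid(logs, metrics):
    """Hypothetical helper: pair log events and metric docs by process.pid."""
    by_pid = defaultdict(lambda: {"logs": [], "metrics": []})
    for kind, docs in (("logs", logs), ("metrics", metrics)):
        for doc in docs:
            pid = doc.get("process", {}).get("pid")
            if pid is not None:
                by_pid[pid][kind].append(doc)
    # Keep only the pids that appear on both sides.
    return {pid: v for pid, v in by_pid.items() if v["logs"] and v["metrics"]}

if __name__ == "__main__":
    logs = [{"process": {"pid": 42}, "message": "worker started"}]
    metrics = [{"process": {"pid": 42}, "cpu_pct": 0.25}]  # "cpu_pct" is made up
    print(correlate_by_pid(logs, metrics))
```

In a real cluster the same grouping would typically be done with a query or aggregation on `process.pid`; the in-memory version only shows why a shared field name makes the join trivial.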
The service fields describe the service for / from which the data was collected.
If logs or metrics are collected from Redis, `service.name` would be `redis`. This makes it possible to find and correlate logs for a specific service, and even a specific version with `service.version`.
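A minimal sketch of such a lookup, assuming the Elasticsearch query DSL and a `keyword`-style mapping for both fields: `service_query` is a hypothetical helper that builds a `bool` filter on `service.name` and, optionally, `service.version`. The version string `5.0.0` is only an example value.

```python
def service_query(name, version=None):
    """Hypothetical helper: build a query DSL body filtering on ECS service fields."""
    filters = [{"term": {"service.name": name}}]
    if version is not None:
        filters.append({"term": {"service.version": version}})
    return {"query": {"bool": {"filter": filters}}}

# All Redis events, and only those from one (example) Redis version.
all_redis = service_query("redis")
redis_5 = service_query("redis", "5.0.0")
```

Because logs and metrics share the same field names, the same body can be run against either kind of index.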
Source fields describe details about the source the event is coming from.
A complete URL, with scheme, host, and path.
The URL object can be reused under other prefixes, for example `host.url.*`. Whenever a URL is used, the same structure must be used.
`url.href` is a multi field, which means the data is stored both as keyword in `url.href` and as text in `url.href.analyzed`. The advantage is that a query against only part of the URL still works, without having to split the URL into all its parts at ingest time.
Based on the WHATWG URL definition: whatwg/url#337
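The multi field described above can be sketched as an Elasticsearch mapping fragment, written here as a Python dict. This is an illustration assuming the standard `fields` mapping parameter, not the official ECS template; the two query bodies and the example URL are hypothetical.

```python
import json

# url.href is indexed as keyword; the "analyzed" sub-field indexes the same value as text.
url_mapping = {
    "properties": {
        "url": {
            "properties": {
                "href": {
                    "type": "keyword",
                    "fields": {"analyzed": {"type": "text"}},
                }
            }
        }
    }
}

# Exact match needs the full string on the keyword field ...
exact = {"query": {"term": {"url.href": "https://www.elastic.co/products"}}}
# ... while a match query on the text sub-field can hit a single part of the URL.
partial = {"query": {"match": {"url.href.analyzed": "products"}}}

print(json.dumps(url_mapping, indent=2))
```

Which parts of a URL become separately searchable depends on the analyzer applied to the text sub-field.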
The user fields are used to describe user information as part of the event.
All fields in user can have one or multiple entries. If a user has more then one id, an array with the ids must be provided.
The user_agent fields normally come from a browser request.
They commonly show up in web service logs, populated from the parsed user agent string.
Below are some examples that demonstrate how ECS fields can be applied to specific use cases.
The following rules apply if an event is to adhere to ECS:
- The document MUST have the `@timestamp` field.
- The data type defined for an ECS field MUST be used.
- It SHOULD have the field `event.version` to define which version of ECS it uses.
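A minimal sketch of checking the first and third rules on a single document, assuming the document is a Python dict that may use either nested objects or dotted keys. The function name is hypothetical. The data type rule cannot be verified without the ECS field definitions, so the sketch skips it.

```python
def check_ecs_rules(doc):
    """Hypothetical checker: return a list of rule violations (empty if none found)."""
    problems = []
    if "@timestamp" not in doc:
        problems.append("MUST: missing @timestamp")
    event = doc.get("event")
    has_version = "event.version" in doc or (isinstance(event, dict) and "version" in event)
    if not has_version:
        problems.append("SHOULD: missing event.version")
    return problems

print(check_ecs_rules({"@timestamp": "2018-06-01T00:00:00Z", "event": {"version": "0.1.0"}}))
```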
To get the most out of ECS, map as many fields as possible to ECS.
ECS follows the writing and naming rules below for its fields. The goal of these rules is to make the fields easy to remember and to provide guidance when new fields are added.
Often events will contain additional fields besides ECS. These can follow the same naming and writing rules, but don't have to.
Writing
- All fields must be lower case
- No special characters except `_`
- Words are combined through underscore
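The writing rules can be approximated with a regular expression applied to each dot-separated part of a field name. This is a sketch under two assumptions: digits are allowed (the rules do not say), and dots only separate prefixes. Note that the base field `@timestamp` would fail this check, so treat it as a known exception. The function name is hypothetical.

```python
import re

# One dot-separated part: lowercase words (letters/digits) joined by single underscores.
_PART = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")

def follows_writing_rules(field):
    """Hypothetical check of the ECS writing rules for one field name."""
    return all(_PART.fullmatch(part) for part in field.split("."))

for name in ("host.ip", "requests_per_sec", "Host.IP", "user-agent"):
    print(name, follows_writing_rules(name))
```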
Naming
- Use present tense unless the field describes historical information.
- Use singular and plural names properly to reflect the field content. For example, use `requests_per_sec` rather than `request_per_sec`.
- Organise the prefixes from general to specific to allow grouping fields into objects with a prefix like `host.*`.
- Avoid stuttering of words. If part of the field name is already in the prefix, do not repeat it. Example: `host.host_ip` should be `host.ip`.
- Fields must be prefixed except for the base fields. For example, all `host` fields are prefixed with `host.`. See the dot notation entry in the FAQ for more details.
- Do not use abbreviations (a few exceptions like `ip` exist).
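The stuttering rule lends itself to a simple heuristic: a word should not reappear in a later part of the name once a prefix has used it. The sketch below is a hypothetical linter helper, not an official tool, and it will also flag legitimate repeats if a real field ever needs one.

```python
def stutters(field):
    """Heuristic: True if a word from an earlier prefix is repeated, e.g. host.host_ip."""
    seen = set()
    for part in field.split("."):
        words = part.split("_")
        if seen.intersection(words):
            return True
        seen.update(words)
    return False

def destutter(field):
    """Drop words already used by an earlier prefix: host.host_ip -> host.ip."""
    seen, out = set(), []
    for part in field.split("."):
        words = part.split("_")
        kept = [w for w in words if w not in seen]
        seen.update(words)
        if kept:
            out.append("_".join(kept))
    return ".".join(out)

print(stutters("host.host_ip"), destutter("host.host_ip"))
```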
The Elastic Common Schema defines a common set of document fields (and their respective field names) to be used in event messages stored in Elasticsearch as part of any logging or metrics use case of the Elastic Stack, including IT operations analytics and security analytics.
The ECS has the following goals:
- Correlate data between metrics, logs and APM
- Correlate data coming from the same machines / hosts
- Correlate data coming from the same service
Which fields are added first is prioritized based on these goals.
The benefits to a user adopting these fields and names in their clusters are:
- Ability to simply correlate data from different data sources
- Improved ability to remember commonly used field names (since there is only a single set, not a set per data source)
- Improved ability to deduce unremembered field names (since the field naming follows a small number of rules with few exceptions)
- Ability to re-use analysis content (searches, visualizations, dashboards, alerts, reports, and ML jobs) across multiple data sources
- Ability to use any future Elastic-provided analysis content in their environment without modifications
There are two common formats for keys when ingesting data into Elasticsearch:
- Dot notation: `user.firstname: Nicolas`, `user.lastname: Ruflin`
- Underline notation: `user_firstname: Nicolas`, `user_lastname: Ruflin`
In ECS the decision was made to use the dot notation and this entry is intended to share some background on this decision.
What is the difference between the two notations?
When ingesting `user.firstname` and `user.lastname`, it is identical to ingesting the following JSON:
```json
"user": {
  "firstname": "Nicolas",
  "lastname": "Ruflin"
}
```
This means that internally in Elasticsearch, `user` is represented as an object datatype. With the underline notation, both fields are just string datatypes.
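The equivalence between dotted keys and nested objects can be sketched in a few lines. `expand_dots` is a hypothetical helper that approximates how dotted field names end up as objects; it also refuses a key that would need to be both a value and an object, which is the same conflict Elasticsearch runs into.

```python
def expand_dots(flat):
    """Expand dotted keys into nested objects: "user.firstname" -> {"user": {"firstname": ...}}."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # e.g. both "user" and "user.firstname" were given
                raise ValueError(f"{part!r} is used both as a value and as an object")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{leaf!r} is used both as an object and as a value")
        node[leaf] = value
    return nested

print(expand_dots({"user.firstname": "Nicolas", "user.lastname": "Ruflin"}))
print(expand_dots({"user_firstname": "Nicolas", "user_lastname": "Ruflin"}))
```

The underline keys contain no dots, so they pass through unchanged as two top-level string fields.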
NOTE: ECS does not use the nested datatype, which is used for arrays of objects.
Advantages of dot notation
The advantage of the dot notation is that on the Elasticsearch side, each prefix is an object. Each object can have parameters that control how the fields inside it are treated, for example whether they should be indexed or whether the mapping may be extended. In the context of ECS, this makes it possible, for example, to disable dynamic property creation for certain prefixes.
On the ingest side of Elasticsearch, it makes it simpler, for example, to drop complete objects with the remove processor instead of selecting each key inside them. This does not require prior knowledge of which keys will end up in the object.
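As a sketch, an ingest pipeline body using the remove processor only needs the object name, written here as a Python dict (it could be sent with `PUT _ingest/pipeline/<id>`). `simulate_remove` is a hypothetical local approximation of the processor's effect on a nested document, useful for seeing that every key under `user` disappears without being listed.

```python
# Pipeline body: one remove processor that drops the whole "user" object.
pipeline = {
    "description": "Drop the complete user object",
    "processors": [{"remove": {"field": "user"}}],
}

def simulate_remove(doc, field):
    """Hypothetical local stand-in for the remove processor (mutates and returns doc)."""
    *parents, leaf = field.split(".")
    node = doc
    for part in parents:
        node = node[part]
    del node[leaf]
    return doc

event = {"user": {"firstname": "Nicolas", "lastname": "Ruflin"}, "message": "login"}
print(simulate_remove(event, pipeline["processors"][0]["remove"]["field"]))
```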
On the event-producing side, as in Beats, it simplifies the creation of events, because in code each object can be treated as an object (or a struct in Golang, for example), which makes constructing and modifying each part of the final event easier.
Disadvantage of dot notation
In Elasticsearch, each key can only have one type. So if `user` is an object, it is not possible to also have `user` as type `keyword` in the same index, as in `{"user": "nicolas ruflin"}`. This can be an issue in certain datasets.
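To spot this kind of clash before indexing, one could scan sample documents and record whether each path is used as an object or as a value. The helpers below are a hypothetical sketch; as a simplification, lists (including arrays of objects) are treated as plain values.

```python
def path_kinds(doc, prefix=""):
    """Yield (path, kind) pairs, where kind is "object" or "value"."""
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            yield path, "object"
            yield from path_kinds(value, path + ".")
        else:
            yield path, "value"

def find_conflicts(docs):
    """Return the paths used as an object in one document and as a value in another."""
    kinds, conflicts = {}, set()
    for doc in docs:
        for path, kind in path_kinds(doc):
            if kinds.setdefault(path, kind) != kind:
                conflicts.add(path)
    return conflicts

print(find_conflicts([{"user": {"firstname": "Nicolas"}}, {"user": "nicolas ruflin"}]))
```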
For the ECS data itself this is not an issue as all fields are predefined.
What if I already use the underline notation?
It's not a problem to mix the underline notation with the ECS dot notation. They can coexist in the same document as long as there are no conflicts.
What if I have fields that conflict with ECS?
Assuming you already have a field `user` but ECS uses `user` as an object, you can use the rename processor at ingest time to rename your field either to the matching ECS field or, if your field does not match ECS, to `user.value`.
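A minimal sketch of that fix: the pipeline body uses the rename processor's `field` and `target_field` options, and `simulate_rename` is a hypothetical local approximation that removes the source key before setting the target, so the scalar `user` can become the object `user` with a `value` key. The sketch only handles a top-level source field.

```python
# Pipeline body: move a scalar "user" out of the way of the ECS user object.
pipeline = {
    "description": "Rename a conflicting scalar user field",
    "processors": [{"rename": {"field": "user", "target_field": "user.value"}}],
}

def simulate_rename(doc, field, target_field):
    """Hypothetical stand-in for the rename processor (top-level source field only)."""
    value = doc.pop(field)  # remove the source first, freeing the name
    *parents, leaf = target_field.split(".")
    node = doc
    for part in parents:
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise ValueError(f"{part!r} is not an object")
    if leaf in node:
        raise ValueError(f"target {target_field!r} already exists")
    node[leaf] = value
    return doc

print(simulate_rename({"user": "nicolas ruflin", "message": "login"}, "user", "user.value"))
```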