
Add event.ingested as the ingest timestamp #582

Merged (6 commits) on Nov 19, 2019

Conversation

cwurm commented Oct 9, 2019

Especially in a security use case, the ingest timestamp is important. Unlike @timestamp, which contains the timestamp of when the event originally occurred (e.g. a process was started), event.ingested is meant to contain the timestamp of when an event arrived in the central data store (usually Elasticsearch).

This is an important timestamp to have since event and ingest can be far apart for various reasons:

  • An endpoint device (e.g. a laptop) is running and collecting data, but is unable to send it (e.g. because it is used offline for a while, such as on a plane).
  • The ingestion pipeline is backed up, or temporarily broken (e.g. too many events are coming in, so backpressure is applied through the system and events are spooled to disk in a messaging queue like Redis or Kafka, to be processed at a later time when the peak is over).
  • An attacker has gained control over a machine and has changed the system time to the distant past or future to hide security events from this machine from the usual scrutiny (e.g. queries and dashboards looking at the last N minutes/hours of data only).

Having the ingest timestamp in addition to the event timestamp makes it possible to ensure that all data is processed as it arrives in Elasticsearch (e.g. by a scheduled query on the ingest timestamp).
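For illustration, such a scheduled query could be little more than a range filter on the new field. A minimal sketch (the logs-* index pattern and the 15-minute window are placeholders, not anything defined in this PR):

GET logs-*/_search
{
  "query": {
    "range": {
      "event.ingested": {
        "gte": "now-15m"
      }
    }
  }
}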

It will also make it possible to analyze the relation between the event and ingest timestamps, e.g. to find events where the event timestamp is after the ingest timestamp, or significantly before it. This is useful both for security purposes (e.g. to find attackers manipulating system time) and for operational ones (e.g. there might be a problem in the ingest pipeline, an endpoint might be misconfigured, or an NTP server is having a bad day).

The easiest way to fill the ingest timestamp is to use an ingest processor in Elasticsearch like this (docs):

"set": {
  "field": "event.ingested",
  "value": "{{_ingest.timestamp}}"
}
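
For context, here is what a complete pipeline around that processor could look like. This is only a sketch: the pipeline name add-event-ingested is a hypothetical example, not something defined by this PR.

PUT _ingest/pipeline/add-event-ingested
{
  "description": "Stamp event.ingested with the time the document is indexed",
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

Documents can then be sent through it explicitly (?pipeline=add-event-ingested on the index request) or by pointing an index's index.default_pipeline setting at it.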

This PR adds event.ingested as a core field, the expectation being that it should be filled by all data sources (especially security/auditing data sources), so that queries which depend on seeing all data (especially infosec queries) can rely on it.

@willemdh (Contributor)

@cwurm So, together with event.created, in a normal situation this would be the order of the timestamps:

@timestamp => This is the date/time extracted from the event
event.created => Contains the date/time when the event was first read by an agent, or by your pipeline.
event.ingested => Contains the timestamp when an event arrived in Elasticsearch

Just some thoughts about a Logstash pipeline I'm using. What if an event only has second granularity? As the source of those events spawns a huge amount of events, I need to use the @timestamp generated by Elasticsearch, which has millisecond granularity, and not the timestamp in the original event. Do you also consider this auto-generated @timestamp to be put in event.ingested? Or in event.created?

cwurm (Author) commented Oct 11, 2019

@timestamp => This is the date/time extracted from the event
event.created => Contains the date/time when the event was first read by an agent, or by your pipeline.
event.ingested => Contains the timestamp when an event arrived in Elasticsearch

@willemdh Correct. Out of these three, I personally see @timestamp and event.ingested as essential, event.created maybe less so. As it stands, few data sources seem to fill event.created (I see it in Winlogbeat, Filebeat suricata.eve, Filebeat googlecloud.vpcflow).

Are you collecting it? Do you see a lot of value in it?

Just some thoughts about a Logstash pipeline I'm using. What if an event only has second granularity? As the source of those events spawns a huge amount of events, I need to use the @timestamp generated by Elasticsearch, which has millisecond granularity, and not the timestamp in the original event. Do you also consider this auto-generated @timestamp to be put in event.ingested? Or in event.created?

I'm not sure I follow. What do you mean by "second granularity"?

willemdh (Contributor) commented Oct 11, 2019

@cwurm

Are you collecting it? Do you see a lot of value in it?

Actually I do use event.created in multiple custom-made pipelines, but maybe not always in the way it was meant to be used.

With second granularity I mean that the log source is sometimes not able to log the timestamp with milliseconds, for example:

2019-10-11 14:56:23

When that is the case with high-speed logging sources, I need to use the @timestamp created at ingestion by Elasticsearch, otherwise all events within a second have the same @timestamp.
This is the case, for example, in our F5 BIG-IP syslog:

<134>Oct 11 15:02:00 slot1/f5extpr info tmm[12871]: Authplatform 2.0 (Auth_Platform_2_dispatch Policy Selection) 1570794527 10.22.61.82:50607 Data Group List lookup for 'ap.stad.be' yielded 'weak' as result

So where do I put the Oct 11 15:02:00 then?

I can't put it in @timestamp as I put the auto-generated timestamp (with millisecond granularity) there. But I still need it, so I created f5.timestamp. An ECS place to put the original log timestamp would be useful too, imho.

Also, what if I am able to use the raw log timestamp, but I'd like to keep the auto-generated @timestamp? For example, for debugging the latency between the timestamp the log was created and the timestamp it was finally inserted into Elasticsearch. When playing around with Logstash workers for high-speed logging sources (such as a perimeter Palo Alto firewall), that can be quite useful.

Sorry if I was unclear. Imho there are even more usable timestamps and we should have a place for all of them.

log.timestamp => original raw log timestamp

log.syslog.timestamp => original raw log syslog timestamp (yes, I do have multiple log sources which have 2 timestamps, such as Brocade syslog and ESXi)

Example:

<190>Oct 11 14:21:57 MYSASWITCH raslogd: AUDIT, 2019/10/11-14:21:56 (CEST), [SRM-1001], INFO, RAS, NONE/root/NONE/None/CLI, ad_0/BrocadeG620/CHASSIS, 8.2.1a, , , , , , , System is alive.

In the above case, Oct 11 14:21:57 would go into log.syslog.timestamp and 2019/10/11-14:21:56 (CEST) would go into log.timestamp.

event.created => Contains the date/time when the event was first read by an agent, or by your pipeline. But what if, for example, Filebeat is sending to a Logstash pipeline? I can imagine reasons you'd want to know when Filebeat picked it up, but also when Logstash started the pipeline.

event.ingested => Contains the timestamp when an event arrived in Elasticsearch

Imho, having an ECS timestamp for every piece of the ingestion process would clear up confusion for everyone.

Grtz

cwurm (Author) commented Oct 14, 2019

@willemdh Thanks for the details!

When that is the case with high-speed logging sources, I need to use the @timestamp created at ingestion by Elasticsearch, otherwise all events within a second have the same @timestamp.

Makes sense. I think that's exactly where event.ingested would be used. @timestamp can be from the original event (even if not precise), then the user can choose between looking at event time (where several events might share the same timestamp because of missing granularity) or ingestion time (which should be more unique - though still not 100% in a high-frequency data source with sub-millisecond differences between subsequent events).

event.created => Contains the date/time when the event was first read by an agent, or by your pipeline. But what if, for example, Filebeat is sending to a Logstash pipeline? I can imagine reasons you'd want to know when Filebeat picked it up, but also when Logstash started the pipeline.

I can see how it could be useful to know when each event reached each step of the ingest pipeline. I think it would be hard to define these fields so they work well in most cases: many ingestion pipelines have many steps, e.g. Beat -> Logstash -> Kafka -> Logstash -> Elasticsearch is a lot of timestamps.

In the "standard" case, I think @timestamp for the original event time and event.ingested for the time when it was indexed into Elasticsearch will often be enough. Any other timestamp fields filled somewhere during ingestion can be added as custom fields on a case-by-case basis as needed.

@willemdh (Contributor)

Adding event.ingested is definitely needed, please continue with this PR. :)

But I still believe there should be an official syslog timestamp field. It's the only missing field to match all syslog-related info, now that we have finally cleared up the syslog priority / facility / severity. So something like log.syslog.timestamp.

I'd make a new GitHub issue for that, but if no one else cares then maybe I shouldn't.. 😃

@jordansissel

What if an event only has second granularity? As the source of those events spawns a huge amount of events, I need to use the @timestamp generated by Elasticsearch

In this scenario, @timestamp is the same as event.created because you are not using the original log's declared timestamp.

@willemdh (Contributor)

@jordansissel Not necessarily, as event.created is described as the time when the event was first read by an agent or by a pipeline. Some Beats modules add the field, but I'm not adding it in my Logstash pipeline with 'now', so the ingest timestamp is my best option. @timestamp would be equal to event.ingested (if it existed yet)...

cwurm (Author) commented Nov 6, 2019

@webmat The PR to fill event.ingested in Libbeat is progressing (elastic/beats#14001), so it would be good to get feedback on this soon.

webmat (Contributor) left a comment

Thanks for the PR @cwurm, and as usual thanks for the detailed thoughts @willemdh :-) An additional issue to discuss original timestamp granularity some more is welcome; thanks for bringing that to our attention.

Adding event.ingested is definitely needed, please continue with this PR

Agreed, I don't think there are any outstanding issues with the PR, other than my comment below.

In a surprising turn of events, Christoph will be out for a little bit, and he asked me to take over. So I'm providing feedback on a PR that I will finish up myself 😂

I'll see if I have access to push to Christoph's branch; I think not. I may have to start from this PR and create a new one.

      short: Ingest timestamp
      description: >
        Time when the event was ingested. This is different from `@timestamp`
        which is when the event originally occurred.
Contributor

I think we should be a little more explicit about the relationship with event.created here as well.

Here's what I'm thinking:

      short: Timestamp when an event arrived in the central data store
      description: >
        Timestamp when an event arrived in the central data store.

        This is different from `@timestamp`, which is when the event originally occurred.
        It's also different from `event.created`, which is meant to capture the first time an agent saw the event.

        In normal conditions, assuming no tampering, the timestamps should chronologically look like this: `@timestamp` < `event.created` < `event.ingested`.

Just like for @timestamp and event.created, the description is pretty verbose in order to clarify the relationships. I think we should fast-track this PR, and we may not necessarily need to adjust the @timestamp and event.created descriptions for now.
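
As an illustration of why that ordering is worth spelling out: a violation of it (e.g. @timestamp later than event.ingested, which could point to a tampered clock) could be flagged with a script query. This is only a sketch, with an arbitrary logs-* index pattern, and it assumes both fields are mapped as date and present on every document:

GET logs-*/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "doc['@timestamp'].value.toInstant().toEpochMilli() > doc['event.ingested'].value.toInstant().toEpochMilli()"
          }
        }
      }
    }
  }
}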

webmat (Contributor) commented Nov 13, 2019

@andrewkroh With Christoph out, can I elect you to double-check me on finishing up his PR? I think this is ready to merge if there's nothing major sticking out.

This relates to the Beats PR elastic/beats#14001, so cc @tsg as a reviewer on the Beats PR.

webmat requested review from andrewkroh and removed the review request for MikePaquette (Nov 13, 2019, 21:48)
ruflin (Member) left a comment

LGTM

@@ -276,3 +277,18 @@

This is mainly useful if you use more than one system that assigns
risk scores, and you want to see a normalized value across all systems.

- name: ingested
  level: core
Member

extended?

Contributor

I would like to introduce it directly as core. From the feedback we've seen, this timestamp is considered more useful than event.created.

A few reasons why that is so:

  • It's another system's timestamp, which can help detect tampering of the clock on the monitored machine
  • It can also be used to detect slowdowns in the overall pipeline, assuming no tampering

PRs such as elastic/beats#14001 could also help populate it broadly and reliably, without having to revisit all modules or all Beats.

If you have strong feelings and would really prefer to start by introducing it as extended, I can go with that in order to get this in quickly. But I think it would send the wrong message with regard to this timestamp's importance vs event.created.

Contributor

I'll wait for your response on this, and would like to merge this tomorrow if possible

Member

SGTM
