[meta] Add support for pipeline details in ECS #940

Open
webmat opened this issue Aug 18, 2020 · 4 comments

@webmat
Contributor

webmat commented Aug 18, 2020

We'd like to define how to capture pipeline details in ECS.

Pipelines can come in many shapes:

  • Agent to Elasticsearch
  • All the way from syslog => agent => Logstash => queue => Logstash => Elasticsearch ingest node => Elasticsearch

We'd like to define an approach, or at least provide guidance, on how to capture information about various kinds of pipelines. The information folks usually want to track falls into a few categories, usually across each step of their pipeline:

  • Technology (product, version)
  • Host name, address
  • Timing: plain timestamps at each step, processing duration per step
  • Pipeline name that processed the event
  • Pipeline error handling (where should information be populated within an ECS event?)
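To make the wish list concrete, here's a minimal sketch (Python) of one step recording the categories above, assuming a hypothetical pipeline.steps array; none of these field names exist in ECS:

from datetime import datetime, timezone

def record_pipeline_step(event, name, product, version, host):
    """Append one processing-step record to a hypothetical pipeline.steps
    array on the event (all field names here are invented, not ECS)."""
    event.setdefault("pipeline", {}).setdefault("steps", []).append({
        "name": name,        # pipeline name that processed the event
        "product": product,  # technology: product
        "version": version,  # technology: version
        "host": host,        # host name or address
        # plain timestamp at this step; per-step duration can be derived later
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return event

event = record_pipeline_step({}, "syslog-ingest", "logstash", "7.9.0", "ls-edge-01")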

Past discussions around this:

#8, #40, #76, #154, #315, #453, #700, #730, #1027, #1059

@dainperkins
Contributor

Adding agent.ip, and maybe even agent.hostname, seems like a good plan to allow full identification of, e.g., the host Filebeat is running on in a syslog or API-pull scenario.

While I've used agent to describe Logstash (e.g. for NetFlow/syslog data), it seems like another field set to identify a "collector", and possibly a "queue", might make sense for end-to-end descriptions of all entities involved in data ingest?

Though that would not easily handle the dual-Logstash setup with an agent (assuming the agent => Logstash hop is necessary), which would suggest an array of hosts/IPs?
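For illustration only, one hypothetical shape for that dual-Logstash chain, where every hop appends itself to an ordered array (all field names here are invented, not ECS):

event = {
    "pipeline": {
        "hops": [  # ordered: first hop to last
            {"type": "agent",     "name": "filebeat-01",   "ip": "10.0.0.5"},
            {"type": "collector", "name": "logstash-edge", "ip": "10.0.1.7"},
            {"type": "queue",     "name": "kafka-eu",      "ip": "10.0.2.9"},
            {"type": "collector", "name": "logstash-core", "ip": "10.0.3.3"},
        ]
    }
}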

@ypid-geberit
Contributor

ypid-geberit commented Sep 6, 2021

I am proposing that something like agent.config_version be added as well. Background: pipeline config should be tracked in git. When the parsing changes, a similar log event may then be parsed differently, with no apparent reason for the end user (in case they look at event.original and which fields are populated). It could be useful, at least when developing/testing new log types/pipelines, to communicate the config/pipeline version to (test) users. As the value, I would suggest the output of git rev-parse --short HEAD, for example.
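A small sketch of how an agent could stamp that value, assuming the pipeline config lives in a git checkout (agent.config_version is only a proposal, and the path below is made up):

import subprocess

def pipeline_config_version(repo_dir):
    """Short commit hash of the pipeline-config repo, per the proposal."""
    return subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()

# Stamp every outgoing event, e.g. {"agent": {"config_version": "1a2b3c4"}}.
event = {"agent": {"config_version": pipeline_config_version("/etc/logstash/conf.d")}}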

@rsk0

rsk0 commented Aug 12, 2022

Articulated Arbitrarily-Structured Pipeline Details

Regarding timing, and considering just the example of arrival timestamps...

It's very important to be able to see latency throughout logging infrastructure. ECS currently offers only three timestamps for this purpose: event occurrence (@timestamp), record picked up by the pipeline (event.created, a confusing name), and record received into the data store (event.ingested). This small set of fields is too coarse and inflexible to handle sophisticated or large-scale operations well.

In order to support arbitrary pipeline arrangements, what if we had two related arrays, one for juncture name and one for juncture arrival time?

@timestamp = 1660329591000
event.created = 1660329592000
foo.pipeline.junctures = [ "fluent bit source", "kafka amsterdam", "message classification processor", "logstash us-east-1" ]
foo.pipeline.arrivals = [ 1660329593000, 1660329594000, 1660329595000, 1660329596000 ]
event.ingested = 1660329597000

Alternatively, a list of objects.

@timestamp = 1660329591000
event.created = 1660329592000
foo.pipeline.times = [
  { "junction_name": "fluent bit source", "arrival_time": 1660329593000 },
  { "junction_name": "kafka amsterdam", "arrival_time": 1660329594000 },
  { "junction_name": "message classification processor", "arrival_time": 1660329595000 },
  { "junction_name": "logstash us-east-1", "arrival_time": 1660329595000 }
]
event.ingested = 1660329597000

Either way, I don't think Kibana would be able to visualize this data? Still, the information would be there for operators to use via other methods.
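One payoff of either shape is that per-hop latency falls out with a few lines of arithmetic. A sketch against the array-based example above (epoch milliseconds):

junctures = ["fluent bit source", "kafka amsterdam",
             "message classification processor", "logstash us-east-1"]
arrivals = [1660329593000, 1660329594000, 1660329595000, 1660329596000]

# Bracket the junctures with the existing ECS timestamps.
points = ([("event.created", 1660329592000)]
          + list(zip(junctures, arrivals))
          + [("event.ingested", 1660329597000)])

for (prev, prev_ts), (cur, cur_ts) in zip(points, points[1:]):
    print(f"{prev} -> {cur}: {cur_ts - prev_ts} ms")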

ECS's Values-Agnostic Philosophy

I think ECS development has so far shied away from specifying values, i.e. the content of fields, and I understand this is important for fostering adoption by staying open to various sources and implementations. (Let me know if I'm reading things correctly?) However, if that's a firm philosophical stance for ECS development, I suspect the issue of recording articulated pipeline details can't be well handled via ECS per se.

I think there's an interoperability cost if ECS doesn't at least make recommendations about values. As a specific example, my company is having to devise its own severity-level values, and we likely won't do a better job than numerous companies collaborating around ECS, and certainly can't be as effective at encouraging broad adoption of the scheme, and thus interoperability, as Elastic/ECS would be.

Non-binding recommendations about values could continue ECS's non-specificity tack, avoiding any block to adoption, while at the same time helping foster interoperability via a kind of "proto-standard". I'm thinking of something like RFC 2119 "MAY" / "OPTIONAL".

log.level: The textual severity level of the original event. It can be whatever you please, but -- as one non-binding possibility, no pressure -- it MAY be one of these values, with these meanings:
"trace": blah
"debug": blah
"information": blah
...
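To sketch what a "MAY"-level recommendation could enable in practice, here's a hedged normalization example (the target vocabulary and aliases are invented for illustration, not an actual ECS recommendation):

RECOMMENDED_LEVELS = {"trace", "debug", "information", "warning", "error", "critical"}

# Invented aliases from common vendor spellings to the recommended set.
VENDOR_ALIASES = {
    "informational": "information",  # e.g. syslog severity 6
    "info": "information",
    "warn": "warning",
    "err": "error",
    "fatal": "critical",
}

def normalize_log_level(raw):
    level = raw.strip().lower()
    if level in RECOMMENDED_LEVELS:
        return level
    # "MAY", not "MUST": fall back to the original value when unmapped.
    return VENDOR_ALIASES.get(level, level)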

Maybe, if ECS stewardship wants to maintain a firm stance on values agnosticism, we could benefit from a consortium of ECS-using companies developing a values recommendation addendum to ECS.

@rsk0

rsk0 commented Aug 17, 2023

Quoting the issue description: "Timing: plain timestamps at each step, processing duration per step"

Just a note that issue #1059 (roughly, "fix the event.ingested and event.created fields") rolls up into this issue -- to make sure those bugs get addressed when this one is.
