
[Logs+] Add pipeline that parses JSON log events into top-level fields #96083

Merged
merged 41 commits into elastic:main from json-parse-logs on May 23, 2023

Conversation

eyalkoren
Contributor

Closes #95522

Note for reviewer

I added validation for pipeline dependencies, similar to the validation we have for composable index templates, so that a pipeline can only be installed if the pipelines it refers to are already installed.
Due to the complexities of fully automated resolution of pipeline dependencies, I simplified the solution by letting each registry define the required dependencies manually. I think this makes sense, as the pipelines required by another pipeline are part of the specific information known to the declaring registry, similar to the pipeline's ID and its related file.

NOTE that the validation doesn't take the versions of pipeline dependencies into account: it is satisfied if the required pipelines are installed, based on their IDs, regardless of their version. I think this is a sufficient requirement, since it prevents race conditions and it doesn't affect the consistency of ingested data in rolling upgrades (meaning: if the referenced pipeline changes, ingested documents may be inconsistent either way).
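For illustration, this is roughly what a pipeline dependency looks like at the Elasticsearch level: the referring pipeline calls another pipeline via a pipeline processor, and the registry-level validation ensures that the referenced pipeline is installed first. The request below is a hypothetical sketch, not one of the PR's actual resource files.

# hypothetical referring pipeline; "logs@json-message" is its dependency and must already be installed
PUT _ingest/pipeline/my-logs-default-pipeline
{
  "processors": [
    {
      "pipeline": {
        "name": "logs@json-message"
      }
    }
  ]
}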

@eyalkoren eyalkoren requested review from jbaiera and felixbarny May 14, 2023 14:06
@eyalkoren eyalkoren self-assigned this May 14, 2023
@elasticsearchmachine elasticsearchmachine added labels Team:Data Management (Meta label for data/management team) and external-contributor (Pull request authored by a developer outside the Elasticsearch team) May 14, 2023
@felixbarny
Member

There shouldn't be parsing-related rejections.

Right, it's "just" mapping issues that could happen as a result of adding more fields, parsed out of the JSON message, to the mapping.

As long as we don't map anything explicitly to object types, does this PR increase risk related to subobjects: false?

What I meant is that the JSON may contain conflicting keys, such as "foo" and "foo.bar", or that some of the keys in the JSON may conflict with metadata fields added by Filebeat, such as "host" vs "host.name".
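For illustration only (a hypothetical event, not taken from the PR), a message like this would be problematic once expanded to top-level fields:

{"foo": 1, "foo.bar": 2, "host": "my-hostname"}

Here "foo" would have to be mapped both as a number and as an object (to hold "foo.bar"), and the string "host" clashes with the "host" object that Filebeat populates with fields like "host.name".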

Maybe this is what you mean, but we could for now already add the JSON pipeline with the registry and just not call it from the logs-*-* pipeline. So if users want to use it, they can opt in by calling it from the logs@custom pipeline.

It would be a trivial change to call the JSON pipeline by default once we have more safety measures in place.
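For illustration, opting in would look roughly like this: users define (or extend) the logs@custom pipeline so that it calls the JSON pipeline via a pipeline processor. The pipeline name below is the one settled on later in this thread; treat the exact definition as a sketch rather than the PR's documented usage.

PUT _ingest/pipeline/logs@custom
{
  "processors": [
    {
      "pipeline": {
        "name": "logs@json-message"
      }
    }
  ]
}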

@eyalkoren
Contributor Author

eyalkoren commented May 17, 2023

Maybe this is what you mean, but we could for now already add the JSON pipeline with the registry and just not call it from the logs-*-* pipeline. So if users want to use it, they can opt in by calling it from the logs@custom pipeline.

Yes, that is what I meant. For logs users and for anyone else who is interested in this capability.
EDIT: sorry, this is specific to logs atm, as it only looks for the message field. Maybe there's a way to generalize it somehow, but that's definitely out of scope at this stage.
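To make the scope concrete, a minimal sketch of a pipeline that only handles a JSON-formatted message field might look like the following. This is an assumption-laden illustration, not the exact pipeline added in this PR (the real definition lives in the PR's resource files).

PUT _ingest/pipeline/logs@json-message
{
  "description": "Sketch: parse a JSON-formatted message field into top-level fields",
  "processors": [
    {
      "json": {
        "field": "message",
        "add_to_root": true,
        "if": "ctx.message instanceof String && ctx.message.startsWith('{')",
        "on_failure": [
          {
            "set": {
              "field": "error.message",
              "value": "message field could not be parsed as JSON"
            }
          }
        ]
      }
    }
  ]
}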

eyalkoren and others added 2 commits May 17, 2023 07:05
Co-authored-by: Felix Barnsteiner <[email protected]>
Co-authored-by: Felix Barnsteiner <[email protected]>
@ruflin
Contributor

ruflin commented May 17, 2023

Maybe we can split this up into 2 phases. First, have this "json" pipeline in place. Think of it like the ECS templates that can easily be embedded and that we keep optimising. In a second phase, we can enable it by default. By then, the pipeline is also tested. Or instead, we might make it a config option on the data stream.

@@ -0,0 +1,48 @@
{
Contributor

We need to discuss the naming of this pipeline, because at the moment it would conflict with a pipeline for the dataset: json and namespace: pipeline. We need to come up with a non-conflicting naming convention for these global assets. Ideally, the component templates and pipelines follow the same logic. @kpollich @joshdover you might have ideas here too.

Contributor

What's the point of this being a separate pipeline at all? Are we going to be reusing it elsewhere?

Contributor Author

@joshdover I think this separation has just proved useful, given the concerns raised by @felixbarny: it now allows us to remove it from the default pipeline while still letting users easily opt in by calling it from the logs@custom pipeline.

Contributor Author

Demonstrated now in this PR's test

@eyalkoren
Contributor Author

eyalkoren commented May 17, 2023

I think I don't have any more input for this.
@ruflin @felixbarny let me know once you decide on the following, so I can bring this to completion:

  • do we merge it, or block until other safety valves are in place?
  • if merging, do we disable it by default for now?
  • how to name the pipeline?

@felixbarny
Member

do we merge it, or block until other safety valves are in place?
if merging, do we disable it by default for now?

Let's merge it but make it opt-in for now.

how to name the pipeline?

Taking inspiration from pipeline names used in the elastic/integrations repo, I'd suggest pipeline-json-message.

@eyalkoren
Contributor Author

The pipeline is now disabled by default, with an easy opt-in option, as the test shows.
So we are only left with a decision about the pipeline naming - @ruflin

Taking inspiration from pipeline names used in the elastic/integrations repo, I'd suggest pipeline-json-message.

I don't think it matches the other config files around it very well.

@felixbarny
Member

Because at the moment it would conflict with a pipeline for the dataset: json and namespace: pipeline.

I think that's more of a theoretical concern, isn't it? Package-provided pipelines have the structure <type>-<dataset>-<version>, so I don't see the conflict with logs-json-pipeline. But if we still want to avoid that association, we should probably prefix pipelines that we set up in ES with pipeline-, such as pipeline-logs-json or pipeline-json-message.

@ruflin
Contributor

ruflin commented May 22, 2023

I think that's more of a theoretical concern, isn't it?

Not sure, and that is why I'm not comfortable with it. All the current integrations have a version prefixed, but what about the integrations that are built in Kibana? I'd rather be safe on this one.

I expect that over the coming months we will keep adding more reusable assets to Elasticsearch, like ECS templates and other ingest pipelines. For a user, it should be very easy to understand that these are assets loaded by the system and globally available, ideally with easy-to-remember names / a convention. As this is the first asset of this kind to make it in, we should come up with the convention.

Here some thoughts:

  • We don't need the pipeline part in the name; it is already a pipeline.
  • Should we prefix with the type these things are for? logs@json? If for all signals, signals@json? {type}@{processor}
    • How do the ECS templates play into this? logs@ecs-core, ecs-core@signals. But this is broader? ecs@system, ecs@managed?
  • The above should work for assets both with and without a version identifier. For ECS mappings, I expect these to have a version, but some base templates / pipelines will not.

@eyalkoren
Contributor Author

As agreed offline: we'll change the name to logs@json-message and follow up with a proper definition of the built-in pipeline naming convention.

@eyalkoren eyalkoren requested review from ruflin and felixbarny May 22, 2023 14:32
@eyalkoren eyalkoren merged commit 7d57731 into elastic:main May 23, 2023
@eyalkoren eyalkoren deleted the json-parse-logs branch May 23, 2023 03:22
@felixbarny felixbarny changed the title [Logs+] Automatically parse JSON log events into top-level fields [Logs+] Add pipeline that parses JSON log events into top-level fields Jun 7, 2023
Labels
:Data Management/Data streams (Data streams and their lifecycles), >enhancement, external-contributor (Pull request authored by a developer outside the Elasticsearch team), Team:Data Management (Meta label for data/management team), v8.9.0
Development

Successfully merging this pull request may close these issues.

[Logs+] Add JSON parsing pipeline