[Logs+] Add pipeline that parses JSON log events into top-level fields #96083
Conversation
Right, it's "just" mapping issues that could happen as a result of adding more fields to the mapping that are parsed out of the JSON message.
What I meant is that the JSON may contain conflicting keys. Maybe this is what you mean, but we could for now already add the JSON pipeline with the registry and not call it from the default pipeline yet. It would be a trivial change to call the JSON pipeline by default once we have more safety measures in place.
Yes, that is what I meant. For logs users …
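To make the concern concrete, here is a hypothetical pair of JSON log events (not taken from this thread) where the same key arrives with incompatible types; once such keys are parsed into top-level fields, the second event would clash with the mapping created for the first:

```json
{"message": "login ok", "user": {"id": "42"}}
{"message": "login ok", "user": "42"}
```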
Maybe we can split this up into two phases. First, have this "json" pipeline in place; think of it like the ECS templates that can easily be embedded, and we keep optimising it. In a second phase, we can enable it by default. By then, the pipeline is also tested. Or instead, we might make it a config option on the data stream.
We need to discuss the naming of this pipeline, because at the moment it would conflict with a pipeline for the dataset `json` and namespace `pipeline`. We need to come up with a non-conflicting naming convention for these global assets. Ideally the component templates and pipelines follow the same logic. @kpollich @joshdover You might have ideas here too.
What's the point of this being a separate pipeline at all? Are we going to be reusing it elsewhere?
@joshdover I think this separation just proved to be useful with the concerns raised by @felixbarny - it will now allow us to remove it from the default pipeline, but still allow users to easily opt in by calling it from the `logs@custom` pipeline.
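For illustration, that opt-in could look roughly like the following, sent as the body of a `PUT _ingest/pipeline/logs@custom` request (a sketch only; `logs@json-pipeline` is a placeholder, since the final name of the JSON pipeline was still under discussion in this thread):

```json
{
  "processors": [
    {
      "pipeline": {
        "name": "logs@json-pipeline",
        "ignore_missing_pipeline": true
      }
    }
  ]
}
```

The `pipeline` processor simply delegates to the referenced pipeline, so removing the opt-in would just be a matter of deleting `logs@custom` again.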
Demonstrated now in this PR's test
I think I don't have any more input for this.
Let's merge it but make it opt-in for now.
Taking inspiration from pipeline names used in the elastic/integrations repo, I'd suggest …
The pipeline is now disabled by default, with an easy opt-in option, as the test shows.
I think it doesn't match very well with all the other config files around it.
I think that's more of a theoretical concern, isn't it? Package-provided pipelines have the structure …
Not sure, that is why I'm not comfortable with it. All the current integrations have a version prefixed, but what about the integrations that are built in Kibana? I'd rather be safe on this one. I expect that over the coming months we will keep adding more reusable assets to Elasticsearch, like ECS templates and other ingest pipelines. For a user, it should be very easy to understand that these are assets loaded by the system and globally available, ideally with easy-to-remember names and a consistent convention. As this is the first asset of this kind that makes it in, we should come up with the convention. Here are some thoughts: …
As agreed offline, we'll change to …
Closes #95522
Note for reviewer
I added validation for pipeline dependencies, similar to the validation we have for composable index templates, so that a pipeline can be installed only if the pipelines it refers to are already installed.
Due to the complexities related to fully-automated resolution of pipeline dependencies, I simplified the solution by letting each registry manually define the required dependencies. I think this makes sense, as the pipelines required by another pipeline are part of the specific information known by the declaring registry, similar to the pipeline's ID and its related file.
NOTE that the validation doesn't take the versions of pipeline dependencies into account, so it is satisfied if the required pipelines are installed, matched by their IDs, regardless of their versions. I think this is a sufficient requirement, since it prevents race conditions and it doesn't affect ingested-data consistency in rolling upgrades (meaning: if the referred pipeline changes, ingested documents may be inconsistent either way).
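For context, the dependency being validated is the kind created by a `pipeline` processor reference inside a registry-managed pipeline. A minimal sketch of such a reference (placeholder names, not the exact resources added in this PR):

```json
{
  "description": "Registry-managed pipeline that delegates to the JSON-parsing pipeline (placeholder name)",
  "processors": [
    {
      "pipeline": {
        "name": "logs@json-pipeline"
      }
    }
  ]
}
```

With the new validation, the registry would install a pipeline like this only once the pipeline it names (matched by ID, regardless of version) is already present.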