Dissect tag on parsing error #8751

Merged
merged 2 commits into from
Oct 30, 2018

Conversation

ph (Contributor) commented Oct 25, 2018

Previously, when a parsing error occurred, the event was returned untouched
and an error was logged. If you didn't look at your logs, you had no
idea that the tokenizer was not able to match your string.

Instead, when a parsing error occurs in the Dissect processor, we will now
add a tag named 'dissect_parsing_error' to the 'log.flags' field.
With that information, you are now able to reprocess your data or do
filtering on the UI.

Fixes: #8123
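
To make the behaviour described above concrete, here is a minimal sketch of flagging an event when tokenizing fails. It is not the code from this PR: the `event` map, the `tokenizer` signature, `errNoMatch`, the `process` and `addFlag` helpers, and the `dissect.` key prefix are all hypothetical stand-ins. Only the flag name `dissect_parsing_error` and the field name `log.flags` come from the description.

```go
package main

import "errors"

// The two names below come from the PR description; everything else in
// this file is a simplified, hypothetical stand-in for libbeat types.
const (
	flagField        = "log.flags"
	flagParsingError = "dissect_parsing_error"
)

// event is a flat map keyed by dotted field names (an assumption; the real
// processor works on libbeat's own event type).
type event map[string]interface{}

// tokenizer is a hypothetical signature for whatever splits the message.
type tokenizer func(msg string) (map[string]string, error)

// errNoMatch is a sample error a tokenizer could return.
var errNoMatch = errors.New("tokenizer could not match the message")

// addFlag appends flag to the string list stored under field, creating the
// list when it is missing and skipping duplicates.
func addFlag(e event, field, flag string) {
	flags, _ := e[field].([]string)
	for _, f := range flags {
		if f == flag {
			return
		}
	}
	e[field] = append(flags, flag)
}

// process runs the tokenizer on the "message" field. On failure the event
// keeps its original fields and only gains the parsing-error flag, so it
// can be found and reprocessed later.
func process(e event, tok tokenizer) event {
	msg, _ := e["message"].(string)
	fields, err := tok(msg)
	if err != nil {
		addFlag(e, flagField, flagParsingError)
		return e
	}
	for k, v := range fields {
		e["dissect."+k] = v
	}
	return e
}
```

With a tokenizer that returns `errNoMatch`, `process` leaves `message` as it was and sets `log.flags` to a list containing `dissect_parsing_error`, which is the value a query or UI filter would then match on.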

ph added the in progress, libbeat, and :Processors labels on Oct 25, 2018
ph force-pushed the fix/dissect-add-flags branch from 074371a to 97e3e19 on October 25, 2018 17:12
ph added the review label and removed the in progress label on Oct 25, 2018
CHANGELOG.asciidoc (outdated, resolved)
ruflin (Contributor) left a comment

Let's be consistent and call it flag everywhere in the code and docs.

@@ -27,6 +27,8 @@ import (
"github.com/elastic/beats/libbeat/processors"
)

const tagParsingError = "dissect_parsing_error"

Contributor

Can we also call it flag?

libbeat/processors/dissect/processor.go (outdated, resolved)
@@ -176,3 +176,59 @@ func TestFieldAlreadyExist(t *testing.T) {
})
}
}

func TestErrorTagging(t *testing.T) {

Contributor

Flag

ph (Contributor Author) commented Oct 26, 2018

@ruflin Now with more flags (tm) :)

@@ -24,6 +24,9 @@ import (
"github.com/elastic/beats/libbeat/common"
)

// FlagField fields used to keep information or errors when events are parsed.
const FlagField = "log.flags"

Contributor

I'm wondering if dissect should write its flags into log.flags or rather event.flags? The reason is that dissect is not only for logs but more generic.

Should have spotted this earlier.

Contributor

@webmat I think we need event.flags in the future in ECS.

Contributor Author

@ruflin Would it be the same for when an event is truncated?

Contributor

I did create elastic/ecs#100 a while ago for the log tag field. There is no issue yet for a more generic set of flags.

I agree with @ruflin that the dissect error should not be set on log.flags.

event.flags is a bit better. But I think this approach still mixes up pipeline & processing metadata with userland data (like the error discussion we had last week, @ruflin). The following idea hasn't been fleshed out yet, but I've been thinking we should introduce a section that's clearly about stuff that happened in the processing pipeline. E.g. pipeline.error, pipeline.tags (or flags), if someone wants to note down timings of each step in their pipeline, they'd do it under pipeline. as well, etc. However this will have to come after ECS 1.0/GA, so don't wait on this being defined for what needs to happen in Beats.

In the meantime, what I would suggest instead is to do what we've been doing for years, and add this dissect tag to tags directly, like Logstash does with _grok_parse_failure.

webmat (Contributor), Oct 29, 2018

And @ph, to answer your more recent question, I would consider the truncation to be userland information, about the log itself. So I do think having truncated right on log.flags makes sense.

This is the new field where the multiline tag is also being added, correct? (Sorry I haven't been following these developments very closely)

Contributor Author

I believe it's the same field, correct.

webmat (Contributor), Oct 29, 2018

Ok thanks for confirming. So my opinion for now is that flags that are descriptive of the log itself or the log entry should be added to log.flags, so multiline, truncated, as they are now.

Parsing flags like dissect_parsing_error, on the other hand, should be added to tags, until we define a more general place to put pipeline errors and details.
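
As a rough illustration of the split proposed in this comment (a proposal only, not necessarily what ends up in the code), the routing could be sketched as below. The `fieldFor` helper and the `descriptiveFlags` set are hypothetical, and treating every flag outside that set as a processing flag is an assumption made to keep the example short.

```go
package main

// Field names are taken from the discussion; the routing itself is a
// hypothetical sketch of the proposal, not behaviour shipped in Beats.
const (
	logFlagsField = "log.flags"
	tagsField     = "tags"
)

// descriptiveFlags lists flags that describe the log entry itself.
var descriptiveFlags = map[string]bool{
	"multiline": true,
	"truncated": true,
}

// fieldFor returns the field a flag would be appended to under the
// proposal: descriptive flags go to log.flags, and anything else (for
// example dissect_parsing_error) goes to tags. Treating all other flags
// as processing flags is an assumption of this sketch.
func fieldFor(flag string) string {
	if descriptiveFlags[flag] {
		return logFlagsField
	}
	return tagsField
}
```

Under this sketch, `fieldFor("truncated")` returns `log.flags` and `fieldFor("dissect_parsing_error")` returns `tags`, similar in spirit to how Logstash adds `_grok_parse_failure` to `tags`.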

Contributor

To not block this PR, let's go with log.flags for now. Let's open a more general discussion about where information from processing should go.

For tags in LS: We should probably also tackle this.

ph merged commit 8dbfed2 into elastic:master on Oct 30, 2018
ph added the needs_backport label on Oct 30, 2018
ph added a commit to ph/beats that referenced this pull request Oct 30, 2018
(cherry picked from commit 8dbfed2)
ph added the v6.6.0 label and removed the needs_backport label on Oct 30, 2018
ph added a commit that referenced this pull request Nov 2, 2018
(cherry picked from commit 8dbfed2)