Dissect tag on parsing error #8751

Merged: 2 commits, Oct 30, 2018
1 change: 1 addition & 0 deletions CHANGELOG.asciidoc
@@ -128,6 +128,7 @@ https://github.com/elastic/beats/compare/v6.4.0...master[Check the HEAD diff]
- Add Beats Central Management {pull}8559[8559]
- Allow Bus to buffer events in case listeners are not configured. {pull}8527[8527]
- Enable `host` and `cloud` metadata processors by default. {pull}8596[8596]
- Dissect will now flag event on parsing error. {pull}8751[8751]

*Auditbeat*

3 changes: 3 additions & 0 deletions libbeat/beat/event.go
@@ -24,6 +24,9 @@ import (
	"github.com/elastic/beats/libbeat/common"
)

// FlagField is the field used to keep information or errors when events are parsed.
const FlagField = "log.flags"
Contributor

I'm wondering if dissect should write its flags into log.flags or rather event.flags? The reason is that dissect is not only for logs but more generic.

I should have spotted this earlier.

Contributor

@webmat I think we need event.flags in the future in ECS.

Contributor Author

@ruflin Would it be the same for when an event is truncated?

Contributor

I did create elastic/ecs#100 a while ago for the log tag field. There is no issue yet for a more generic set of flags.

I agree with @ruflin that the dissect error should not be set on log.flags.

event.flags is a bit better. But I think this approach still mixes up pipeline and processing metadata with userland data (like the error discussion we had last week, @ruflin). The following idea hasn't been fleshed out yet, but I've been thinking we should introduce a section that's clearly about things that happened in the processing pipeline, e.g. pipeline.error or pipeline.tags (or flags). If someone wants to note down timings of each step in their pipeline, they'd do it under pipeline.* as well. However, this will have to come after ECS 1.0/GA, so don't wait on this being defined for what needs to happen in Beats.

In the meantime, what I would suggest instead is to do what we've been doing for years, and add this dissect tag to tags directly, like Logstash does with _grokparsefailure.

Contributor

@webmat, Oct 29, 2018

And @ph, to answer your more recent question, I would consider the truncation to be userland information, about the log itself. So I do think having truncated right on log.flags makes sense.

This is the new field where the multiline tag is also being added, correct? (Sorry I haven't been following these developments very closely)

Contributor Author

I believe it's the same field, correct.

Contributor

@webmat, Oct 29, 2018

Ok thanks for confirming. So my opinion for now is that flags that are descriptive of the log itself or the log entry should be added to log.flags, so multiline, truncated, as they are now.

Parsing flags like dissect_parsing_error, on the other hand, should be added to tags, until we define a more general place to put pipeline errors and details.

Contributor

To not block this PR, let's go with log.flags for now. Let's open a more general discussion about where information from processing should go.

For tags in LS: we should probably also tackle this.
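
For readers following the thread, the two placements under discussion can be sketched side by side. This is only an illustration: the helper names `flagOnLogFlags` and `flagOnTags` are hypothetical and not part of Beats, and plain Go maps stand in for `common.MapStr`. The first shape is what this PR merges; the second is the tags-based alternative suggested in review.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flagOnLogFlags (hypothetical helper) models the shape merged in this PR:
// the parsing failure is recorded under the nested log.flags field.
func flagOnLogFlags(message string) map[string]interface{} {
	return map[string]interface{}{
		"message": message,
		"log": map[string]interface{}{
			"flags": []string{"dissect_parsing_error"},
		},
	}
}

// flagOnTags (hypothetical helper) models the alternative raised in review:
// the failure is recorded in the top-level tags list.
func flagOnTags(message string) map[string]interface{} {
	return map[string]interface{}{
		"message": message,
		"tags":    []string{"dissect_parsing_error"},
	}
}

func main() {
	events := []map[string]interface{}{
		flagOnLogFlags("hello world"),
		flagOnTags("hello world"),
	}
	// Print each event as JSON so the two layouts can be compared.
	for _, ev := range events {
		out, err := json.Marshal(ev)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Either way the marker string is the same; only the field that a downstream query has to look at differs.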


// Event is the common event format shared by all beats.
// Every event must have a timestamp and provide encodable Fields in `Fields`.
// The `Meta`-fields can be used to pass additional meta-data to the outputs.
10 changes: 10 additions & 0 deletions libbeat/processors/dissect/processor.go
@@ -27,6 +27,8 @@ import (
	"github.com/elastic/beats/libbeat/processors"
)

const flagParsingError = "dissect_parsing_error"

type processor struct {
	config config
}
@@ -60,6 +62,14 @@ func (p *processor) Run(event *beat.Event) (*beat.Event, error) {

	m, err := p.config.Tokenizer.Dissect(s)
	if err != nil {
		if err := common.AddTagsWithKey(
			event.Fields,
			beat.FlagField,
			[]string{flagParsingError},
		); err != nil {
			return event, errors.Wrap(err, "cannot add new flag to the event")
		}

		return event, err
	}
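
The control flow in this hunk (flag the event, then still return the original parsing error) can be tried outside of libbeat with a rough standalone sketch. Everything here is an assumption made for illustration: `addFlag` is a hypothetical stand-in for `common.AddTagsWithKey`, `dissect` is a toy tokenizer that only splits on `" - "`, and plain maps replace `common.MapStr`. It is not the library's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// These mirror the constants introduced in the diff.
const (
	flagField        = "log.flags"
	flagParsingError = "dissect_parsing_error"
)

// addFlag (hypothetical) walks the dotted key through nested maps, creating
// them as needed, and appends flag to the string slice at the last segment.
func addFlag(fields map[string]interface{}, key, flag string) error {
	parts := strings.Split(key, ".")
	m := fields
	for _, p := range parts[:len(parts)-1] {
		next, ok := m[p]
		if !ok {
			child := map[string]interface{}{}
			m[p] = child
			m = child
			continue
		}
		child, ok := next.(map[string]interface{})
		if !ok {
			return fmt.Errorf("field %q is not an object", p)
		}
		m = child
	}
	last := parts[len(parts)-1]
	switch v := m[last].(type) {
	case nil:
		m[last] = []string{flag}
	case []string:
		m[last] = append(v, flag)
	default:
		return fmt.Errorf("field %q is not a list of strings", key)
	}
	return nil
}

// dissect (toy stand-in for the tokenizer) expects "<ok> - <rest>".
func dissect(msg string) (map[string]string, error) {
	parts := strings.SplitN(msg, " - ", 2)
	if len(parts) != 2 {
		return nil, errors.New("could not find delimiter")
	}
	return map[string]string{"ok": parts[0], "rest": parts[1]}, nil
}

// run flags the event when parsing fails and still returns the parse error,
// which is the behavior the hunk above adds to the processor.
func run(fields map[string]interface{}) error {
	msg, _ := fields["message"].(string)
	m, err := dissect(msg)
	if err != nil {
		if ferr := addFlag(fields, flagField, flagParsingError); ferr != nil {
			return fmt.Errorf("cannot add new flag to the event: %w", ferr)
		}
		return err
	}
	fields["dissect"] = m
	return nil
}

func main() {
	fields := map[string]interface{}{"message": "hello world"}
	err := run(fields)
	fmt.Println("error:", err)
	fmt.Println("log:", fields["log"])
}
```

Note that the caller sees both signals: the returned error and the flag left on the event, so the event can be shipped and later filtered on the flag.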

56 changes: 56 additions & 0 deletions libbeat/processors/dissect/processor_test.go
@@ -176,3 +176,59 @@ func TestFieldAlreadyExist(t *testing.T) {
		})
	}
}

func TestErrorFlagging(t *testing.T) {
	t.Run("when the parsing fails add a flag", func(t *testing.T) {
		c, err := common.NewConfigFrom(map[string]interface{}{
			"tokenizer": "%{ok} - %{notvalid}",
		})

		if !assert.NoError(t, err) {
			return
		}

		processor, err := newProcessor(c)
		if !assert.NoError(t, err) {
			return
		}

		e := beat.Event{Fields: common.MapStr{"message": "hello world"}}
		event, err := processor.Run(&e)

		if !assert.Error(t, err) {
			return
		}

		flags, err := event.GetValue(beat.FlagField)
		if !assert.NoError(t, err) {
			return
		}

		assert.Contains(t, flags, flagParsingError)
	})

	t.Run("when the parsing is successful do not add a flag", func(t *testing.T) {
		c, err := common.NewConfigFrom(map[string]interface{}{
			"tokenizer": "%{ok} %{valid}",
		})

		if !assert.NoError(t, err) {
			return
		}

		processor, err := newProcessor(c)
		if !assert.NoError(t, err) {
			return
		}

		e := beat.Event{Fields: common.MapStr{"message": "hello world"}}
		event, err := processor.Run(&e)

		if !assert.NoError(t, err) {
			return
		}

		_, err = event.GetValue(beat.FlagField)
		assert.Error(t, err)
	})
}
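
The two assertions used in these tests (the flag is present after a failure, and the field is absent after a success) amount to a dotted-key lookup followed by a membership check. A minimal sketch of that read side, assuming plain nested maps rather than `common.MapStr`; `getValue` and `hasFlag` are hypothetical helpers written for this illustration and are not the libbeat API.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errKeyNotFound = errors.New("key not found")

// getValue (hypothetical stand-in for event.GetValue) resolves a dotted key
// such as "log.flags" against nested maps.
func getValue(fields map[string]interface{}, key string) (interface{}, error) {
	var cur interface{} = fields
	for _, p := range strings.Split(key, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, errKeyNotFound
		}
		cur, ok = m[p]
		if !ok {
			return nil, errKeyNotFound
		}
	}
	return cur, nil
}

// hasFlag reports whether the string list stored at key contains flag.
// A missing key or a value of another type counts as "not flagged".
func hasFlag(fields map[string]interface{}, key, flag string) bool {
	v, err := getValue(fields, key)
	if err != nil {
		return false
	}
	flags, ok := v.([]string)
	if !ok {
		return false
	}
	for _, f := range flags {
		if f == flag {
			return true
		}
	}
	return false
}

func main() {
	failed := map[string]interface{}{
		"message": "hello world",
		"log": map[string]interface{}{
			"flags": []string{"dissect_parsing_error"},
		},
	}
	clean := map[string]interface{}{"message": "hello world"}

	fmt.Println(hasFlag(failed, "log.flags", "dissect_parsing_error"))
	fmt.Println(hasFlag(clean, "log.flags", "dissect_parsing_error"))
}
```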