-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix nagios parser to parse all perfdata and report info message. #5601
Conversation
@glinton, could you review please? |
Hi @danielnelson , @glinton is there any update on this? Is it being reviewed or any ETA? Regards, Sergey. |
I'll try to do a review later today |
plugins/inputs/exec/exec.go
Outdated
if err != nil { | ||
return nil, fmt.Errorf("exec: get exit code: %s", err) | ||
} | ||
nagios.AppendExitCode(&out, exitCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I like injecting the return code into the parser data, is there a reason this needs to be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the state
, service_output
, long_service_output
semantically belong together and should be a part of the same measurement.
The current parser interface doesn't allow to pass any extra information to the parser, whilst the service_output
and the long_service_output
are known during the parsing time only.
The parser will properly parse bytes for which the nagios.AppendExitCode(...)
wasn't called, it will assume the default state to be 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not my primary concern but wouldn't the state be set incorrectly depending on how the output ends? Perhaps we could add the state to the telegraf.Metric
after it has been returned from the parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Because we encode the state in the
exec
plugin, no matter how the output ends, the parser will always properly decode the state and strip those additional bytes.There might be an issue, when the parser is used somewhere else. And it is my bad, because I did not properly document the behavior of the
Parse
method. I will add the docs, but first, could you please comment on the next point: -
Appending the error code after we return from the parser is definitely possible. But will require me to do something like this:
// To return the exit code through the error handling mechanism.
//
type exitCode int
func (c exitCode) Error() string {}
// The callee changes
//
func (c CommandRunner) Run(
e *Exec,
command string,
acc telegraf.Accumulator,
) ([]byte, error) {
<...irrelevant...>
out = <output from the check>
if isNagiosParser {
exitCode, err := nagios.GetExitCode(nagiosErr)
if err != nil {
return nil, fmt.Errorf("exec: get exit code: %s", err)
}
return out.Bytes(), exitCode(exitCode)
}
return out.Bytes(), nil
}
// The caller changes
//
func (e *Exec) ProcessCommand(command string, acc telegraf.Accumulator, wg *sync.WaitGroup) {
defer wg.Done()
out, err := e.runner.Run(e, command, acc)
var exitCode int
if err != nil {
if v, ok := err.(exitCode); ok {
exitCode = int(v)
} else {
acc.AddError(err)
return
}
}
metrics, err := e.parser.Parse(out)
if err != nil {
acc.AddError(err)
return
}
if _, ok := e.parser.(*nagios.NagiosParser); ok {
// Find the metric with name == "nagios_state" and append the "state" field to it.
// Can be a convention that it is the last metric, or iterate through all and find it.
m := <somehow find the metric>
m.AddField("state", exitCode)
}
for _, metric := range metrics {
acc.AddFields(metric.Name(), metric.Fields(), metric.Tags(), metric.Time())
}
}
- I am relying on your expertise here. Which approach seem to be better for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielnelson, what is your take on this ^^?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be sure, neither solution is great, and if I could do it over then I would probably make nagios it's own plugin separate from exec and scrap the standalone parser.
Putting that aside though, since I don't think it is worth a breaking change... I would parse, then if this is the NagiosParser, get the error code and iterate over the metrics looking for the nagios_state
metric using metric.Name()
. You will also want to watch for the case where the state metric is not found, and add the field as a new metric.
One more thing, could you do me a favor and use this form for adding the metrics?
for _, metric := range metrics {
acc.AddMetric(metric)
}
Hi @danielnelson, commented and made some changes. Do you think it is good to go now? |
Hi @danielnelson, addressed your comments. One of the test builds fails though, it seems to be something finicky with the require library. Have you seen something similar before? There is a new version of it, I'll check if they have related changes there. |
I think the tests fails because the ordering in a map in Go is unspecified, so they can't be safely compared, try using the helper functions in |
Fixed the tests. But this is very weird. Golang's Please let me know what you think of the changes, @danielnelson. |
Looks great, thanks! |
Required for all PRs:
Associated README.md updated.Not needed.Example Nagios check output:
Old behavior:
New behavior:
Changes:
perfdata
in{/home, /var/log}
added.Fixes #5591.