Nagios parser returns not supported return codes and not enough information #11061

Closed
Sakerdotes opened this issue May 4, 2022 · 0 comments · Fixed by #11062
Labels
area/exec bug unexpected problem or unintended behavior


Sakerdotes commented May 4, 2022

Relevant telegraf.conf

[[inputs.exec]]
commands = ['/opt/checks/nagios_check.pl']
interval = '30s'
timeout = '15s'
name_suffix = '__nagios_check'
data_format = 'nagios'

Logs from Telegraf

2022-05-03T11:55:04Z E! [inputs.exec] Failed to add nagios state: exec: get exit code: exit status 127

System info

Telegraf 1.22.3, Oracle Linux Server 8.5

Docker

No response

Steps to reproduce

  1. Remove a dependency that the Nagios check needs in order to execute (which one depends on the check)
  2. Let telegraf execute the check

Expected behavior

Metric 1: => nagios_state__tcp,host=home1 service_output="/some/path/telegraf/commands/check_tcp: error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory",state=3i 1651661042000000000

Metric 2: => nagios_state__tcp,host=home1 service_output="fork/exec /some/path/telegraf/commands/check_tcp: permission denied",state=3i 1651665933000000000

Actual behavior

Metric 1: => nagios_state__tcp,host=home1 service_output="",state=127i 1651661431000000000
Metric 2: => nagios_state__tcp,host=home1 service_output="" 1651665481000000000

Additional info

1. Unsupported status codes are returned:

As seen under 'Actual behavior', status code 127 is returned. This is not a supported status code and should be converted to 3 (UNKNOWN). (https://nagios-plugins.org/doc/guidelines.html, Plugin Return Codes)
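A minimal sketch of the requested conversion, assuming the parser has the raw process exit code as an `int`. The helper name `nagiosState` is hypothetical and not part of Telegraf; it passes through the four codes defined by the Nagios plugin guidelines (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and maps everything else to 3.

```go
package main

import "fmt"

// Nagios plugin return codes as listed in the plugin guidelines.
const (
	stateOK       = 0
	stateWarning  = 1
	stateCritical = 2
	stateUnknown  = 3
)

// nagiosState is a hypothetical helper: supported return codes are kept,
// anything else (e.g. 126, 127, or -1 for a signal) becomes UNKNOWN.
func nagiosState(exitCode int) int {
	switch exitCode {
	case stateOK, stateWarning, stateCritical, stateUnknown:
		return exitCode
	default:
		return stateUnknown
	}
}

func main() {
	for _, code := range []int{0, 2, 3, 127} {
		fmt.Printf("exit %d -> state %d\n", code, nagiosState(code))
	}
}
```

With this mapping, the `state=127i` from Metric 1 would be reported as `state=3i`.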

2. Missing error information:

The error message from this check execution is nowhere to be found: neither in the error.log file nor in the service_output field of the metric. If the error is not of type ExitError, the message is thrown away during parsing.

3. The error can only be found in error.log:

The error can only be found in the Telegraf error log when it occurs. It should be shown in the service_output field of the metric itself.

Solution

The solution would be to return an UNKNOWN state in case of an error and to put the error message into the service_output field. Nothing needs to be logged to the Telegraf error.log, because an UNKNOWN state with proper information is a valid check result.

Sakerdotes added the bug unexpected problem or unintended behavior label May 4, 2022
Sakerdotes pushed a commit to SectorNordAG/telegraf that referenced this issue May 4, 2022
Sakerdotes pushed a commit to SectorNordAG/telegraf that referenced this issue May 12, 2022