inputs.ping - When one target url is not responding, it stops and doesn't ping the rest #6690

Closed
kawaiipantsu opened this issue Nov 20, 2019 · 4 comments · Fixed by #6743
Labels
bug unexpected problem or unintended behavior

kawaiipantsu commented Nov 20, 2019

Relevant telegraf.conf:

[[inputs.ping]]
   urls = [
       "target1",
       "target2",
       "target3",
       "target4",
       "target5",
       "target6",
       "target7",
   ]
   method = "exec"
   count = 2

System info:

# uname -a
Linux telegraf 4.14.138-rancher #1 SMP Sat Aug 10 11:25:46 UTC 2019 x86_64 GNU/Linux

# telegraf --version
Telegraf 1.12.5 (git: HEAD f09d4023)

Steps to reproduce:

Configure a long list of targets (urls), target1 through target7.
Now pull target3 down along with its NS record, so that the name no longer resolves. Telegraf's ping input will then only update target1 and target2 in InfluxDB.
The rest will be shown as down.

I have tested with "method" set to both native and exec.

Expected behavior:

Target 1 to 2 - up and metrics
Target 3 - down
Target 4 to 7 - up and metrics

Actual behavior:

Target 1 to 2 - up and metrics
Target 3 - down
Target 4 to 7 - down
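
To make the difference between the expected and actual behavior concrete, here is a minimal Python sketch. It is not Telegraf's code: it assumes the cause is a gather loop that returns on the first DNS lookup failure, and the names `ping_all_abort`, `ping_all_continue` and `fake_resolve` are hypothetical. As a simplification, a target counts as "up" as soon as its name resolves; no packets are sent.

```python
import socket
from typing import Callable, Dict, List


def ping_all_abort(targets: List[str], resolve: Callable[[str], str]) -> Dict[str, str]:
    """Assumed buggy pattern: the first DNS failure ends the whole pass."""
    results = {t: "down" for t in targets}
    for target in targets:
        try:
            resolve(target)
        except socket.gaierror:
            return results  # every later target is never probed
        results[target] = "up"
    return results


def ping_all_continue(targets: List[str], resolve: Callable[[str], str]) -> Dict[str, str]:
    """Expected pattern: a DNS failure only affects its own target."""
    results = {}
    for target in targets:
        try:
            resolve(target)
        except socket.gaierror:
            results[target] = "down"
            continue  # move on to the next target
        results[target] = "up"
    return results


def fake_resolve(host: str) -> str:
    """Hypothetical stand-in for a DNS lookup; target3 has no record."""
    if host == "target3":
        raise socket.gaierror("name not resolved")
    return "192.0.2.1"  # documentation-only address


if __name__ == "__main__":
    targets = [f"target{i}" for i in range(1, 8)]
    print("abort:   ", ping_all_abort(targets, fake_resolve))
    print("continue:", ping_all_continue(targets, fake_resolve))
```

With the aborting loop only target1 and target2 are reported as up, which matches the actual behavior above; the continuing loop reports everything except target3 as up.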

Additional info:

Also, it's irritating that when a host is not responding we don't set packet loss to 100%.
Just a side note :)

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Dec 3, 2019
@danielnelson danielnelson added this to the 1.12.7 milestone Dec 3, 2019
@danielnelson
Contributor

Fixing this for 1.12.7.

Unrelated to your bug, could you try switching to method = "native"? Our long-term plan is to remove the exec method once native has proven itself. On Linux you will need to modify some permissions to make it work.

> Also, it's irritating that when a host is not responding we don't set packet loss to 100%.

I'm assuming you mean when there is a DNS lookup error? This has been requested before, but since we aren't sending any ping packets when this error happens, I didn't want to set this value. Maybe it would make sense to have a processor that can set a default value if a field is unset.
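
The processor idea could look roughly like the following Python sketch. It is only an illustration, not an existing Telegraf processor: `apply_defaults` is a hypothetical helper, and the field names `result_code` and `percent_packet_loss` are assumed to match what the ping plugin emits.

```python
from typing import Any, Dict


def apply_defaults(fields: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of fields with defaults filled in for unset keys."""
    merged = dict(fields)
    for key, value in defaults.items():
        # A key that is missing or explicitly None counts as unset.
        if merged.get(key) is None:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Assumed shape of the metric written after a DNS lookup error.
    dns_failure = {"result_code": 1}
    print(apply_defaults(dns_failure, {"percent_packet_loss": 100.0}))

    # A field that was actually measured is left untouched.
    healthy = {"result_code": 0, "percent_packet_loss": 0.0}
    print(apply_defaults(healthy, {"percent_packet_loss": 100.0}))
```

Keeping the default in a separate step means the input itself never reports a loss figure for packets it did not send, while users who want a "100% loss" signal for unresolved hosts can opt in.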

@danielnelson danielnelson modified the milestones: 1.12.7, 1.13.0 Dec 3, 2019
@kawaiipantsu
Author

I tested your commit and yes, it works as it should. It now continues on a DNS error. Also, nice that you changed the tag; somehow "url" just seemed misleading.

@kawaiipantsu
Author

Hey, I have not had time to test it with native, sadly. We also had to get it working on the day I filed the issue, so I ended up coding a daemon in PHP that runs fping and does the same thing. Perhaps we will shift back to Telegraf, but for now this is working perfectly. I also expanded it to look at URLs and HTTP return codes.

Yeah, the default value would probably be a good idea, like "what to return if all fails", but for which metric? It could also just be that if all fails you at least set packet loss to 100%. This is what I have done in our own daemon, as we use that indicator as a way to express whether a host is down or unreachable too.

@danielnelson
Contributor

> Also, nice that you changed the tag; somehow "url" just seemed misleading.

Just to avoid confusion, I didn't do this. I agree that url is wrong but I left it for now to keep backwards compatibility.
