Random crash [inputs.ethtool] SIGSEGV: segmentation violation #11285
Comments
Interesting, is this a NAS device?
While the last message is about eth0, I do not believe the seg fault is from the ethtool plugin. The protoreflect library is used by the XPath parser. Could you please include the rest of your config? It would be good to identify which plugin is causing this to figure out what to do next.
This is a Pogoplug Mobile. The config is just the generic telegraf.conf from install, nothing fancy: just tracking regular CPU, memory usage, etc. to a remote InfluxDB, as I have done with many Linux servers before. The only 'difference' is that this time I am doing it on an ARM processor with only 128MB of RAM... :/ I tried dpkg-reconfigure on it but the result was the same. I have commented out some (network-related) lines in /etc/sysctl.d/sysctl.conf for now; so far it has been running for about 10 hours without issues.
And your output is InfluxDB? As of right now, I am not sure what we can do. As a next step, can you please try to narrow down which plugin is causing the crash?
Correct. My only output is InfluxDB.
And every time it crashes, the last entry always seems to be "[inputs.ethtool] Error in plugin: eth0 driver: operation not permitted". Hence I suspect it has something to do with the network; that's why I looked into sysctl.conf and commented out those network-related overrides.
ok
If this does ultimately fail, my request would be that you start reducing the number of input plugins: start with ethtool and see if the crash still reproduces. As I said above, I am not certain what our next step is from the project's side.
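A reduced configuration for this kind of bisection might look like the sketch below. It is only an illustration, not the reporter's actual config: the InfluxDB URL and database name are placeholders, and the choice of which inputs to keep is an assumption based on the "regular CPU, mem usage" description earlier in the thread.

```toml
# Hypothetical reduced telegraf.conf for narrowing down the crashing plugin.
# All values are placeholders; adjust them to the real setup.

[agent]
  interval = "10s"   # matches the 10s interval mentioned in the warnings
  debug = true       # more verbose logging while reproducing

[[outputs.influxdb]]
  urls = ["http://influxdb.example:8086"]  # placeholder host
  database = "telegraf"                    # placeholder database name

# Keep a small set of inputs enabled...
[[inputs.cpu]]
[[inputs.mem]]

# ...and disable the suspect first by commenting it out.
# [[inputs.ethtool]]
```

If the crash stops with ethtool disabled, re-enable it and remove other inputs one at a time; if it persists, the plugin is probably not the cause.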
Darn, it still crashed last night with the same error message. But this time it didn't crash right away.
Can you please grab The new stack is a different stack from the first crash. It is still not clear that this is specific to code in Telegraf and not either a) system related and/or b) related to go itself. Per golang/go#44096 there is a comment that "The call to runtime.addOneOpenDeferFrame in the traceback suggests corruption of the defer stack". Given you saw an improvement with I would suggest the following:
I don't have 'go' installed (?)
and 'uname -a'
I agree with you, this time the stack is different. Right now I have turned off swap completely, so let's see how long it lasts without crashing. As for the 'improvement', vm.swappiness is always 10. The only difference is that I am back to the default values for anything that's network-related.
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem; if not, please try posting this question in our Community Slack or Community Page. Thank you!
I met the same problem. :(
Relevant telegraf.conf
Logs from Telegraf
System info
Linux 5.13.6-kirkwood-tld-1 #1.0 PREEMPT Sat Jul 31 22:10:39 PDT 2021 armv5tel GNU/Linux // Telegraf 1.22.4 (git: HEAD acf6706)
Docker
No response
Steps to reproduce
Additional info
Expected behavior
It should not crash
Actual behavior
It crashes from within an hour to a few hours.
Additional info
In my example log above, it crashes at 21:40. Telegraf was started at 15:36.
There were only occasional warnings about "Collection took longer than expected; not complete after interval of 10s". The last log entry before the crash was at 17:00, and there was nothing in between.