panic: unexpected fault address 0x0 #1424
Hi, we had two more cases of this "unexpected fault address 0x0" crash.
The former was on the same box as the first one reported; the latter was on a different box. Assuming it is the same issue, this rules out HW problems. One host is running 0.13.1, the other is running 1.0.0-beta2. I haven't seen this type of crash on 0.13.0, so I suspect it was introduced in 0.13.1.
@zarnovican do you see any other error messages in your logs? And are you able to provide your configuration & any example metrics generated by exec?
Hi, @sparrc
No, there is nothing except the exception and plenty of stack traces. I don't know enough Go to tell which one caused it, so I always report the first one. Here is the full output.
Config is split into several files, here is the concatenated config:
BTW, I had to run
The
It's possible that this is related to #1432, which has been fixed in version 1.0.0-beta3. Do you think you could try that build and see if it resolves the issue?
@sparrc Both hosts were updated to 1.0.0-beta3. I'll comment on this ticket if it crashes again.
thank you @zarnovican!
It crashed again (on 1.0.0-beta3). This time, there was only one stack trace in the log. The message is different, but it may still be the same problem.
(The error from the redis input is unrelated, and it is fixed already, AFAIK.) @sparrc Is there anything that could be done to help you pinpoint the root cause? Like a core dump, or some debug mode?
That is a very peculiar panic... If I could provide you with a binary with the race detector built in, would you be able to use it? Important to note that this binary would perform a bit slower.
@sparrc no problem. Alternatively, I can build it myself if you would give me instructions.
@zarnovican do you have a Go environment? If so, it's just these three steps:
@sparrc the "race" build is deployed on two hosts. It seems to be working.
BTW, I built it with Go 1.5.x, if it makes any difference (it probably does).
@butitsnotme perhaps our crashes are related. @zarnovican are you using exec?
FYI The issue appeared in Telegraf 1.0.0 as well.
Full log https://gist.github.com/zarnovican/cfa366d167cf0f2c537b55d61c1c0bfb#file-crash-2016-09-29
@zarnovican it looks like the panic this time occurred in the influxdb input plugin. I took a look at the code where it panicked, and the only thing I can imagine happening here is memory corruption on your machine. The reason being that the panic is occurring during a JSON decode into an interface, but the interface is defined immediately before (there is no risk of a race condition), and the JSON itself is checked as being valid JSON as well. Do you have any other thoughts on this? Have you seen any other odd memory behavior on these machines? Are you running any sort of peculiar workload on these machines?
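The pattern described here (decoding JSON into an interface value that is declared immediately beforehand) looks roughly like this sketch. It is illustrative only, not the actual influxdb plugin source:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decode mirrors the pattern: the interface{} target is a fresh local
// variable, so no other goroutine can hold a reference to it, which is
// why a data race on the decode target is essentially ruled out.
func decode(body []byte) (interface{}, error) {
	var out interface{}
	err := json.Unmarshal(body, &out)
	return out, err
}

func main() {
	// Hypothetical payload, not real plugin data.
	body := []byte(`{"results":[{"series":[]}]}`)
	out, err := decode(body)
	if err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}
	fmt.Printf("decoded: %v\n", out)
}
```

A panic inside `json.Unmarshal` under these conditions would point away from application-level bugs, which is why memory corruption is suspected above.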
@sparrc thanks for your time looking into this.
As I have mentioned before, this has happened on two machines running on 0.13.1 at that time. It is possible that AWS has memory issues on both of them. Here is a full log from the other machine (we call it hoover) back from July: https://gist.github.com/zarnovican/cfa366d167cf0f2c537b55d61c1c0bfb#file-hoover-crash-2016-07-17 So far, it has happened only once on 1.0.0.
The host where it is happening more frequently is running Icinga2+Postgres, InfluxDB+Grafana, and Sentry+Postgres+Redis. All services are related to monitoring. It is running on an AWS m1.small. It has only one CPU core, so it is utilized quite heavily. We don't have any memory-related issues with any other process on the same host. What those two machines have in common is that they are both paravirtual (both m1.small). What I will do is update Telegraf on all my hosts to see if it appears somewhere else.
@zarnovican -- are you still experiencing this issue with either the 1.1 or the 1.2 editions of the stack? Please let us know.
@timhallinflux no, I haven't seen it for a couple of months, I believe. On the other hand, I'm restarting telegraf daily now (for unrelated reasons), which would work around the problem anyway. Closing.
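For anyone stuck on an affected version, a daily restart like the one mentioned above can be scheduled with a cron entry. This is a sketch of one possible mechanism (the schedule and service manager are assumptions, not how it was actually done here):

```
# /etc/cron.d/telegraf-restart (hypothetical) -- restart telegraf once a day
# at 04:00 as a workaround for the crash.
0 4 * * * root service telegraf restart
```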
Bug report
Telegraf crashed on me with
unexpected fault address 0x0
without any obvious outside factor.
Relevant telegraf.conf:
Output goes to InfluxDB (0.13). There are handful of standard telegraf plugins, plus one exec plugin in multiple copies.
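The actual telegraf.conf was not captured in this excerpt. A minimal sketch matching that description (all URLs, databases, and exec commands here are hypothetical) could look like:

```toml
# Hypothetical reconstruction -- not the reporter's actual config.
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]  # assumed endpoint
  database = "telegraf"

# A handful of standard input plugins...
[[inputs.cpu]]
[[inputs.mem]]
[[inputs.disk]]

# ...plus one exec plugin in multiple copies:
[[inputs.exec]]
  commands = ["/usr/local/bin/metric_a.sh"]  # hypothetical script
  data_format = "influx"

[[inputs.exec]]
  commands = ["/usr/local/bin/metric_b.sh"]  # hypothetical script
  data_format = "influx"
```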
System info:
Telegraf: 0.13.1
OS: Ubuntu Trusty 14.04.4
Steps to reproduce:
Unknown. :(
Additional info:
We are running a mix of 0.13.0, 0.13.1 versions on multiple AWS instances with very similar configuration. This is the first such case of Telegraf panic.