metricbeat: error calling MarshalJSON for type common.Float #3580
Looks like the issue was that some of the containers I had created were busted and were constantly restarting. This must have caused an error when metricbeat tried to read something. When I updated the containers to remove the busted ones I got the following in my logs:
But now things seem to be reporting naturally. I'm going to leave this open since the error reporting and inability to exit metricbeat probably characterizes this as a bug, but I'll wait for feedback.
Interesting that it can happen that a container doesn't have a name. @spalger Which docker version are you using? For the long shutdown: If you have lots of containers, it can take quite a bit of time to shut down :-( The problem is that for each container we have to do an HTTP request at the moment, so these can queue up :-( How many containers did you have?
Docker version 1.13.1, build 092cba3, and I had 11 containers at that time I think. I couldn't help but think that the metrics module was frozen or something. It wasn't using any resources and the log kept reporting
Looks like metricbeat has frozen up again, but this time just because a container crashed and was not instructed to restart on crash. Here is the series of events as I observed them:
console output: https://gist.github.com/spalger/a179d18dfc13b2a006a529634a6404e7
If anyone would like to take a look at this machine, just ping me offline with your public key
@spalger Thanks for all the details. It seems metricbeat is hanging in a request to the docker stats API (my assumption). I need to find a good way to crash a container to reproduce this.
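To illustrate the suspected hang, here is a minimal Go sketch of bounding a single stats-style HTTP request with a context deadline, so that one unresponsive endpoint cannot block forever. This is not Metricbeat's code: `fetchWithTimeout` is a hypothetical helper, and the slow server below is only a local stand-in for an endpoint that never answers.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchWithTimeout (hypothetical helper) issues a GET request and gives up
// once the timeout elapses instead of waiting indefinitely.
func fetchWithTimeout(url string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

func main() {
	// Local stand-in for an endpoint that does not respond in time.
	release := make(chan struct{})
	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-release:
		case <-r.Context().Done():
		}
	}))
	defer slow.Close()
	defer close(release) // runs before slow.Close, so the handler can return

	err := fetchWithTimeout(slow.URL, 200*time.Millisecond)
	fmt.Println("deadline exceeded:", errors.Is(err, context.DeadlineExceeded))
}
```

With a deadline like this, a request that never completes surfaces as an error the caller can log and skip, rather than stalling the whole fetch loop.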
Btw, you can get a stacktrace from Metricbeat if you send it SIGQUIT. That might be helpful if it happens again.
@ruflin looks like the container doesn't need to crash, a short lived container that exits with 0 status seems to have the same effect
Not sure if FreeBSD jails count as containers to metricbeat, but I see this error a lot on my FreeNAS box:
In fact, it's the only error in any of the logs.
@untergeek Does it still report any data? It seems the container returns an interesting "JSON" structure. It would be interesting to get hold of the JSON data that is received to have a closer look. I'm wondering whether this could be an encoding issue, or whether the server does not return JSON even though it should.
I'm also having this issue:
Versions: docker version: Metricbeat is set to pull over But It's pulling over Is there any way to get at the struct docker returns for inspection?
@jess-lawrence-axomic You should be able to access the endpoint through
Hey @ruflin , here it is:
It seems to have omitted memory stats altogether. This node runs on an EC2 instance (HVM) so I suspect that could be causing issues?
Thanks for sharing the output. Very interesting. I'm curious what it means when the memory stats are empty. Perhaps @exekias or @douaejeouit know more?
That probably means the container is not running at the moment, so no mem info is reported. I'm afraid we need some more tracing to detect what happened there, as I don't see any value starting with N (from
Thanks for sharing. According to the given logs, the problem occurs when the container stops while stats are being fetched. When this happens, there are no stats to report. I think we should handle this case instead of logging a warning. (docker.go : 130)
Concerning the omitted memory stats, it's probably not a bug because, logically, if that were the case, you wouldn't have had any output. The cmd would probably crash or report an error? But in this case, you're getting your container statistics except for the memory ones. That's weird! It looks more like the memory stats and the storage stats were deliberately omitted, or at least this is the impression the output gives.
Good morning @douaejeouit @exekias The container was running when I retrieved the stats from the docker API, here is another container on a new node/host:
And the stats:
However I'm thinking this is more a node config issue as the memory cgroup isn't present:
Thank you for the quick reply. Yes, I would suspect that! Unfortunately, I don't have a clear idea of how Docker instances are managed on EC2 :(
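For anyone who wants to check a host for the missing memory cgroup mentioned above, here is a rough Go sketch that reads `/proc/cgroups` and looks at the `enabled` column of the `memory` row. It assumes a Linux host that exposes `/proc/cgroups` with the usual four columns (`subsys_name hierarchy num_cgroups enabled`); `memoryCgroupEnabled` is a hypothetical helper, not something Metricbeat or Docker provides.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// memoryCgroupEnabled (hypothetical helper) scans text in /proc/cgroups
// format and reports whether the "memory" controller row has enabled == 1.
// A missing row is treated as "not enabled".
func memoryCgroupEnabled(procCgroups string) bool {
	for _, line := range strings.Split(procCgroups, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 4 && fields[0] == "memory" {
			return fields[3] == "1"
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/cgroups")
	if err != nil {
		fmt.Println("cannot read /proc/cgroups:", err)
		return
	}
	fmt.Println("memory cgroup enabled:", memoryCgroupEnabled(string(data)))
}
```

This only diagnoses the condition; it does not change any kernel or boot configuration.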
I think a first improvement step we could take on the docker metricsets is to check whether the JSON is empty before we start processing it, and return a more meaningful error instead of running into the JSON marshal error. Not sure how tricky that will be with the docker library we use.
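A minimal Go sketch of that idea, under stated assumptions: `Stats` and `MemoryStats` are simplified, hypothetical stand-ins for whatever the docker library returns, and `memoryUsagePct` is an invented helper. The point is only that an empty memory section yields a descriptive error before any value is computed or encoded.

```go
package main

import (
	"errors"
	"fmt"
)

// MemoryStats and Stats are simplified, hypothetical stand-ins for the
// structures a Docker client library returns; they are not the real types.
type MemoryStats struct {
	Usage uint64
	Limit uint64
}

type Stats struct {
	Memory MemoryStats
}

var errNoMemoryStats = errors.New("no memory stats reported for container")

// memoryUsagePct returns usage as a fraction of the limit. It returns an
// error when the limit is zero (which includes the all-zero, "nothing
// reported" case) instead of computing a meaningless value.
func memoryUsagePct(s Stats) (float64, error) {
	if s.Memory.Limit == 0 {
		return 0, errNoMemoryStats
	}
	return float64(s.Memory.Usage) / float64(s.Memory.Limit), nil
}

func main() {
	if _, err := memoryUsagePct(Stats{}); err != nil {
		fmt.Println("skipping event:", err)
	}
	pct, _ := memoryUsagePct(Stats{Memory: MemoryStats{Usage: 50, Limit: 200}})
	fmt.Println("usage fraction:", pct)
}
```

The caller can then decide whether to drop the event or report the error, rather than discovering the problem at encoding time.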
Just an FYI, it turns out Debian disables the memory cgroup by default. This solves it:
@jess-lawrence-axomic that sounds good! Thank you for sharing the information.
The trick is to make the fields pointers here, to take advantage of the "omitempty" option declared for basically all the fields. If the field's tag is "omitempty" and it has an empty value, the field should be omitted from the encoding. Empty values are:
I ran some tests where I declare an empty stats structure, and I'm getting this output, which, I think, represents the default values of the declared fields.
Output with pointer fields:
@ruflin what do you think about it?
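A small, self-contained Go sketch of the pointer trick described above. The types are hypothetical and much smaller than the real stats structures; the `memory_stats` field name is just illustrative. With `encoding/json`, "omitempty" has no effect on a struct-valued field, but a nil pointer field is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Memory is a hypothetical, simplified stats section.
type Memory struct {
	Usage uint64 `json:"usage"`
}

// byValue embeds the section as a struct value: omitempty cannot drop it.
type byValue struct {
	Memory Memory `json:"memory_stats,omitempty"`
}

// byPointer holds the section as a pointer: a nil pointer is omitted.
type byPointer struct {
	Memory *Memory `json:"memory_stats,omitempty"`
}

// encode marshals v and returns the JSON text (or the error text).
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(byValue{}))                     // {"memory_stats":{"usage":0}}
	fmt.Println(encode(byPointer{}))                   // {}
	fmt.Println(encode(byPointer{Memory: &Memory{}})) // {"memory_stats":{"usage":0}}
}
```

The last line shows the useful distinction: a nil pointer means "not reported", while a non-nil pointer to a zero value means "reported as 0".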
That looks very promising. Instead of reporting the default values I would suggest reporting an error. Something like
Update: Actually, the
happens simply because of the value stored here. An "empty" stats is basically zeroed allocated memory (it's a pointer to a newly allocated zero value of the structure), i.e., its fields are equal to 0 at creation. This is what the new function does!
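To make the zero-value explanation concrete, here is a hedged Go sketch of one way all-zero stats could end up as the 'N' error from this issue. Everything here is an assumption for illustration: `Float` is a hypothetical stand-in for a float wrapper with a custom `MarshalJSON`, and the usage/limit division is an assumed source of the bad value, not a confirmed trace of Metricbeat's code path. Dividing a zero float by a zero float gives NaN, which formats as `NaN`, and `encoding/json` rejects that as invalid JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Float is a hypothetical stand-in for a float type with custom JSON output.
type Float float64

// MarshalJSON formats the value with fixed precision. For NaN this yields
// the text "NaN", which is not valid JSON.
func (f Float) MarshalJSON() ([]byte, error) {
	return []byte(fmt.Sprintf("%.6f", float64(f))), nil
}

// usagePct divides usage by limit; with both at zero the result is NaN.
func usagePct(usage, limit uint64) Float {
	return Float(float64(usage) / float64(limit))
}

// encodePct marshals a small event containing the computed percentage.
func encodePct(usage, limit uint64) error {
	_, err := json.Marshal(map[string]interface{}{"pct": usagePct(usage, limit)})
	return err
}

func main() {
	fmt.Println("zeroed stats:", encodePct(0, 0))
	fmt.Println("normal stats:", encodePct(1, 2))
}
```

Running this should show an encoding error of the same shape as the one in the issue title for the zeroed case, and no error for the normal case.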
If a value is set (also 0) it will be reported. If we don't want to report it, we need to remove it from the event, as otherwise it will be hard to tell whether it is really 0 or accidentally 0 (as you pointed out). So if we don't get any JSON back, we should return an error and not any values, I would think?
Based on @douaejeouit's last comment I tried to create a quick fix to just send
Either I failed fixing the code (my first lines of Go) or the I also thought if
As the error here does not have a stack trace, it's tricky to figure out where it comes from. It could be interesting to see what is logged before and after it, especially if you have debug enabled.
This looks like the issue that was fixed by #11676 and backported to 6.7 and 7.0.
What I think happened:
system
module to docker
module ./metricbeat
but it wasn't responding to ctrl+c
kill -9
metricsets
config
ERR Failed to encode event: json: error calling MarshalJSON for type common.Float: invalid character 'N' looking for beginning of value
I've tried to fix this by wiping out the metricbeat data directory and restarting the box, but I am unable to get it to work anymore.