Monitoring API for Logstash Forwarder #996

c33s · 2016-02-19T03:27:10Z

migrated issue from logstash-forwarder elastic/logstash-forwarder#183

We will add an API for filebeat for monitoring

i wasn't able to find any docs about api/monitoring/status options. is there, or will there be, an api?
is there currently any kind of status/monitoring option?

kkirsche · 2016-02-20T01:01:03Z

When you say monitoring, do you mean something like nagios / monit but as a beat?

c33s · 2016-02-21T16:10:45Z

yes something like nagios/monit/whatever just something to verify the filebeat is shipping (as requested in the other issue).

ruflin · 2016-02-22T05:39:19Z

There could be several approaches to this:

We are working on monitoring of beats: Beats central monitoring via Elasticsearch - Phase 1 #463
Using Metricbeat (under development) to read out the httpprof stats: Metricbeat #619
Using the community nagioscheckbeat: https://github.com/PhaedrusTheGreek/nagioscheckbeat

tsg · 2016-02-24T17:31:36Z

@c33s You can also run with something like -httpprof :6060 flag and then curl http://localhost:6060/debug/vars

You'll get some metrics about the filebeat internals.
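For scripted checks, the same endpoint can be read without curl. The following is a minimal sketch, assuming filebeat was started with `-httpprof :6060` as described above. The helper names are hypothetical, and which keys appear in the response depends on the filebeat version.

```python
import json
from urllib.request import urlopen

# Assumption: filebeat was started with `-httpprof :6060`, as suggested above.
DEBUG_VARS_URL = "http://localhost:6060/debug/vars"


def parse_debug_vars(body):
    """Decode the JSON document served at /debug/vars into a dict."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at /debug/vars")
    return data


def fetch_debug_vars(url=DEBUG_VARS_URL, timeout=5.0):
    """Fetch and decode the metrics; raises OSError if filebeat is unreachable."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_debug_vars(resp.read().decode("utf-8"))


def numeric_metrics(data):
    """Keep only top-level numeric values (counters and gauges)."""
    return {
        key: value
        for key, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
```

Calling `numeric_metrics(fetch_debug_vars())` on the monitored host would give a flat dict of counters that a check script can compare against its own limits.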

tsg · 2016-02-24T17:32:12Z

Closing as question.

c33s · 2016-02-24T20:57:46Z

please reopen, it is not a question, it is an issue opened on the old repo where @jordansissel said to migrate it to the new repo and maybe a request for documentation enhancement.

as far as i understand @ruflin the first solution is to create a monitoring system with filebeat and elasticsearch. so fb & es ARE the monitoring tools

metricbeat also looks at other services in order to monitor them.

nagioscheckbeat also looks like it is for monitoring other services, not filebeat itself

all three solutions are not really what i am looking for, i am looking for a simple status command integration and a config value as threshold.

config in filebeat:

max_connection_interval: 5min

command

> filebeat status
ok

results can be "ok" or "error"

if the last ACK from filebeat's target server falls within the configured 5-minute interval, this command returns "ok"; if it is 5 minutes old or more, it returns "error"

so everybody can easily add the monitoring tool of his choice to filebeat to ensure filebeat is sending data to its target server.
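The rule proposed above can be sketched in a few lines. This is only an illustration of the request, not existing filebeat behaviour: `max_connection_interval` and the `status` function are hypothetical, and treating "no ACK received yet" as an error is an added assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical setting from the proposal above; not an existing filebeat option.
MAX_CONNECTION_INTERVAL = timedelta(minutes=5)


def status(last_ack, now, max_interval=MAX_CONNECTION_INTERVAL):
    """Return "ok" if the last ACK is newer than max_interval, else "error".

    last_ack is None when no ACK has been received yet; this sketch
    assumes that case should also be reported as "error".
    """
    if last_ack is None:
        return "error"
    return "ok" if now - last_ack < max_interval else "error"
```

A monitoring tool would call `status(last_ack, datetime.now(timezone.utc))` and map the returned string to its own alerting levels.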

@tsg i don't think it is a good idea to activate profiling from a performance perspective. should there be a simple filebeat status option?

like apache status or the nginx status module http://nginx.org/en/docs/http/ngx_http_status_module.html

at least a step-by-step guide on how to monitor filebeat with nagios should be in the docs.

jordansissel · 2016-02-24T21:42:37Z

@c33s My feeling is that #463 will solve your concerns. I'm not sure the filebeat status you propose is an effective way to communicate health. For example, a stalled transmission is not necessarily an "error" as I view it, and your proposal of having this report "error" feels a bit wonky. That said, #463 could provide what you're asking for (you'd query Elasticsearch for this result, but your command line tool could be run as filebeat status if you wish to make such a thing and your tool could output "error" based on whatever conditions your business determines to be an error).

tsg · 2016-02-24T21:56:10Z

@c33s -httpprof doesn't have a performance penalty by itself, it's more that someone with access to that API can use it to enable profiling. If you make sure you bind it to localhost then having a nagios check on the same machine using that API seems like a pretty good solution to me.

The API gives you metrics, but not an overall "OK" status for the reasons that @jordansissel mentioned.

c33s · 2016-05-16T23:08:28Z

-httpprof works for me but i think it should be directly implemented in filebeat.

so calling

filebeat --health

should return a similar result like curl localhost:6060/debug/vars but maybe reduced to the necessary keys.

i am currently monitoring the following values (but maybe other values are also good for health checking filebeat):

"libbeatEsPublishedButNotAckedEvents": 0,
"libbeatLogstashPublishedButNotAckedEvents": 0,
"libbeatMessagesDropped": 0,

i agree that it is not as simple as filebeat status -> OK, because i have to define the thresholds when it is an error but i think it is important, that i don't have to edit the init script of a package, adding a profiling flag just to see if my filebeat is delivering.

it should at least be possible to add the behavior of -httpprof to the config yaml file. maybe also define the thresholds for crit and warn there, so that filebeat status would be able to return OK, WARNING or CRITICAL
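The threshold idea can be sketched as a Nagios-style evaluation of the three keys listed above. The threshold values and the `evaluate` function are hypothetical and purely illustrative; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios plugin convention.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}

# Hypothetical thresholds per key: (warn_at, crit_at). Values are illustrative.
THRESHOLDS = {
    "libbeatEsPublishedButNotAckedEvents": (1, 100),
    "libbeatLogstashPublishedButNotAckedEvents": (1, 100),
    "libbeatMessagesDropped": (1, 10),
}


def evaluate(metrics, thresholds=THRESHOLDS):
    """Return (exit_code, message) for a dict of metric values."""
    worst = OK
    seen = 0
    problems = []
    for key, (warn_at, crit_at) in thresholds.items():
        value = metrics.get(key)
        if value is None:
            continue  # key not reported, e.g. that output is not in use
        seen += 1
        if value >= crit_at:
            level = CRITICAL
        elif value >= warn_at:
            level = WARNING
        else:
            level = OK
        if level != OK:
            problems.append(f"{key}={value}")
        worst = max(worst, level)
    if seen == 0:
        return UNKNOWN, "UNKNOWN: none of the monitored keys were reported"
    message = LABELS[worst]
    if problems:
        message += ": " + ", ".join(problems)
    return worst, message
```

A check script would print the message and exit with the returned code, so the thresholds live in the check rather than in filebeat itself.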

rclmenezes · 2016-12-21T19:52:41Z

+1 to having a health REST API. I hear Logstash just got one in v5.0.0 at localhost:9600.

Right now, we use Nagios NRPE to remote execute health checks on different boxes. To properly check that Filebeat is healthy and shipping, we have to start the Filebeat service with -httpprof. That's not an optimal solution because:

A) We've been told before that the variables in /debug/vars are not stable and may change from time to time.

B) It's a pain in the butt to start Filebeat with the -httpprof option! We use Filebeat as a service and -httpprof is not an option in filebeat.yml. So in Ubuntu 16 we have to:

Add DAEMON_ARGS="-c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat -httpprof localhost:6060" to /etc/default/filebeat.
Remove the existing service: $ sudo rm /lib/systemd/system/filebeat.service
Reload our daemon: $ sudo systemctl daemon-reload
Restart the filebeat service: $ sudo service filebeat restart

Woof. So a proper health REST API would make our lives a lot easier :)

Thanks!

blalor · 2017-03-02T19:35:50Z

This is mandatory, in my opinion. Right now you have no idea what filebeat is actually doing, or if it's doing anything. log-courier got this right a long time ago.

kkirsche · 2017-03-02T21:51:29Z

In what kind of way should this monitoring API work? Potentially we want to work on adding prometheus monitoring support or would that be too "heavy"?

ruflin · 2017-03-03T07:25:20Z

We are planning to expose the expvar metrics through a separate http endpoint so not the whole httpprof has to be run. The data structure will be in json format as that is what we also use internally.
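To illustrate the idea of a metrics-only endpoint, here is a minimal sketch of an HTTP handler that serves a JSON snapshot of counters and nothing else. It is not the Beats implementation: the `/stats` path, port 8080, and the counter names are all placeholders chosen for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real beat would read its own registry.
METRICS = {"events_published": 0, "events_acked": 0}


def render(path, metrics):
    """Map a request path to (status_code, body_bytes); only /stats is served."""
    if path != "/stats":
        return 404, b'{"error": "not found"}'
    return 200, json.dumps(metrics, sort_keys=True).encode("utf-8")


class StatsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code, body = render(self.path, METRICS)
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8080):
    """Serve metrics on localhost only, with no profiling handlers attached."""
    HTTPServer((host, port), StatsHandler).serve_forever()
```

Because only one path is routed, nothing resembling the profiling handlers is reachable, which addresses the concern about exposing them just to read metrics.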

c33s · 2017-03-07T15:17:19Z

@kkirsche prometheus sounds too specific for me. the main requirements for me are:

adding most of the command line parameters to the config file
separate config for api/status like status_enabled: true in the config file instead of -httpprof
basic rules to be configured in the config file to allow a simple call of filebeat status -> OK, WARNING, CRITICAL (it is cool to have a json file but basic status should be supplied out of the box) also see Monitoring API for Logstash Forwarder #996 (comment)
api with json result is cool but should not replace a simple status call

ruflin · 2017-03-11T16:18:31Z

I started a PR here for a more detailed discussion on this: #3693

@c33s To your points

I think so far we have added most of the cmd line params to the config. Anything specific missing?
See PR
Interesting idea to have something that can be checked on the command line. I'm wondering whether it should return a little more than just three different values, as I don't want people to have to configure thresholds etc. I'm more in favor of giving people the data and letting them decide on their own whether it is good or bad for their environment. But definitely worth digging deeper into this.
Would it be fine for you if your status call also returned json?

cawoodm · 2018-11-20T10:24:16Z

See https://www.elastic.co/guide/en/logstash/current/monitoring.html

botelastic · 2020-07-08T23:17:57Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

botelastic · 2020-07-08T23:18:48Z

This issue doesn't have a Team:<team> label.

tsg added the question label Feb 24, 2016

tsg closed this as completed Feb 24, 2016

tsg added the enhancement and discuss labels and removed the question label Feb 24, 2016

tsg reopened this Feb 24, 2016

c33s mentioned this issue May 17, 2016

Ping Logstash health? elastic/logstash#3782

Closed

botelastic bot added the Stalled and needs_team labels Jul 8, 2020

botelastic bot closed this as completed Aug 7, 2020
