Monitoring API for Logstash Forwarder #996

Closed
c33s opened this issue Feb 19, 2016 · 18 comments
Labels
discuss Issue needs further discussion. enhancement needs_team Indicates that the issue/PR needs a Team:* label Stalled

Comments

@c33s

c33s commented Feb 19, 2016

migrated issue from logstash-forwarder elastic/logstash-forwarder#183

@jordansissel

We will add an API for filebeat for monitoring

i wasn't able to find any docs about api/monitoring/status options. is there an api, or will there be one?
is there currently any kind of status/monitoring option?

@kkirsche
Contributor

When you say monitoring, do you mean something like nagios / monit but as a beat?

@c33s
Author

c33s commented Feb 21, 2016

yes, something like nagios/monit/whatever, just something to verify that filebeat is shipping (as requested in the other issue).

@ruflin
Member

ruflin commented Feb 22, 2016

There could be several approaches to this:

@tsg
Contributor

tsg commented Feb 24, 2016

@c33s You can also run Filebeat with a flag like -httpprof :6060 and then curl http://localhost:6060/debug/vars

You'll get some metrics about the filebeat internals.
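
For a scripted check, the curl call above can be reproduced in a few lines of Python. This is only a minimal sketch, assuming Filebeat was started with -httpprof :6060 as described; the function names parse_vars and fetch_vars are illustrative and not part of Filebeat.

```python
import json
from urllib.request import urlopen


def parse_vars(body: str) -> dict:
    """Decode the JSON document served at /debug/vars."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at /debug/vars")
    return data


def fetch_vars(url: str = "http://localhost:6060/debug/vars",
               timeout: float = 5.0) -> dict:
    """Fetch and decode the metrics.

    The default URL assumes Filebeat runs with -httpprof :6060.
    """
    with urlopen(url, timeout=timeout) as resp:
        return parse_vars(resp.read().decode("utf-8"))
```

Calling fetch_vars() on the host running Filebeat should return the same data as the curl command, as a dictionary.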

@tsg tsg added the question label Feb 24, 2016
@tsg
Contributor

tsg commented Feb 24, 2016

Closing as question.

@tsg tsg closed this as completed Feb 24, 2016
@c33s
Author

c33s commented Feb 24, 2016

please reopen. it is not a question: it is an issue opened on the old repo, which @jordansissel said to migrate to the new repo, and maybe also a request for a documentation enhancement.

as far as i understand @ruflin, the first solution is to build a monitoring system out of filebeat and elasticsearch, so fb & es ARE the monitoring tools.

metricbeat also looks at other services in order to monitor them.

nagioscheckbeat also looks like it is for monitoring other services, but not filebeat itself.

none of the three solutions is really what i am looking for. i am looking for a simple status command and a config value to use as a threshold.

config in filebeat:

max_connection_interval: 5min

command

> filebeat status
ok

results can be "ok" or "error"

if the last ACK from filebeat's target server falls within the configured 5-minute interval, the command returns "ok"; if the last ACK is 5 minutes old or older, it returns "error".

so everybody can easily add the monitoring tool of his choice to filebeat to ensure filebeat is sending data to its target server.
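
The proposed rule can be sketched as a small function. Note that max_connection_interval and filebeat status are only the proposal made in this comment, not existing Filebeat options, and how Filebeat would expose the timestamp of the last ACK is left open here; the function name connection_status is hypothetical.

```python
from datetime import datetime, timedelta


def connection_status(last_ack: datetime, now: datetime,
                      max_interval: timedelta) -> str:
    """Return "ok" if the last ACK is newer than max_interval, else "error".

    An ACK that is exactly max_interval old already counts as "error",
    matching the "5 minutes or more" wording of the proposal.
    """
    return "ok" if now - last_ack < max_interval else "error"
```

With max_interval set to timedelta(minutes=5), an ACK received four minutes ago yields "ok" and one received five minutes ago yields "error".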

@tsg i don't think activating profiling is a good idea from a performance perspective. shouldn't there be a simple filebeat status option?

like apache status or the nginx status module http://nginx.org/en/docs/http/ngx_http_status_module.html

at least a step-by-step guide on how to monitor filebeat with nagios should be in the docs.

@jordansissel
Contributor

@c33s My feeling is that #463 will solve your concerns. I'm not sure the filebeat status you propose is an effective way to communicate health. For example, a stalled transmission is not necessarily an "error" as I view it, and your proposal of having this report "error" feels a bit wonky. That said, #463 could provide what you're asking for: you'd query Elasticsearch for this result, but your command line tool could be run as filebeat status if you wish to make such a thing, and your tool could output "error" based on whatever conditions your business determines to be an error.

@tsg tsg added enhancement discuss Issue needs further discussion. and removed question labels Feb 24, 2016
@tsg tsg reopened this Feb 24, 2016
@tsg
Contributor

tsg commented Feb 24, 2016

@c33s -httpprof doesn't have a performance penalty by itself, it's more that someone with access to that API can use it to enable profiling. If you make sure you bind it to localhost then having a nagios check on the same machine using that API seems like a pretty good solution to me.

The API gives you metrics, but not an overall "OK" status for the reasons that @jordansissel mentioned.

@c33s
Author

c33s commented May 16, 2016

-httpprof works for me, but i think it should be implemented directly in filebeat.

so calling

filebeat --health

should return a result similar to curl localhost:6060/debug/vars, but maybe reduced to the necessary keys.

i am currently monitoring the following values (but maybe other values are also good for health checking filebeat):

"libbeatEsPublishedButNotAckedEvents": 0,
"libbeatLogstashPublishedButNotAckedEvents": 0,
"libbeatMessagesDropped": 0,
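
Reducing the full /debug/vars output to these three keys could look like the sketch below. The key names are taken from the list above; the helper name watched_counters is hypothetical, and reporting absent keys separately (instead of treating them as zero) is an assumption made here for illustration.

```python
WATCHED_KEYS = (
    "libbeatEsPublishedButNotAckedEvents",
    "libbeatLogstashPublishedButNotAckedEvents",
    "libbeatMessagesDropped",
)


def watched_counters(metrics: dict, keys=WATCHED_KEYS):
    """Split the watched keys into found values and missing names."""
    found = {k: metrics[k] for k in keys if k in metrics}
    missing = [k for k in keys if k not in metrics]
    return found, missing
```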

i agree that it is not as simple as filebeat status -> OK, because i have to define the thresholds when it is an error but i think it is important, that i don't have to edit the init script of a package, adding a profiling flag just to see if my filebeat is delivering.

it should at least be possible to configure the behavior of -httpprof in the config yaml file, and maybe also to define the thresholds for crit and warn there, so that filebeat status could return OK, WARNING or CRITICAL.
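
A threshold-based status of this kind might be sketched as follows. The thresholds mapping is hypothetical (no such Filebeat config exists in this thread), the numeric values 0, 1 and 2 are the conventional Nagios plugin exit codes, and treating a missing counter as CRITICAL is an assumption of this sketch.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(value: int, warn: int, crit: int) -> int:
    """Map one counter value to a state using warn/crit thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def overall_status(counters: dict, thresholds: dict) -> int:
    """Return the worst state; thresholds maps key -> (warn, crit)."""
    states = []
    for key, (warn, crit) in thresholds.items():
        if key not in counters:
            # Assumption: a counter that cannot be read counts as a failure.
            states.append(CRITICAL)
        else:
            states.append(classify(counters[key], warn, crit))
    return max(states, default=OK)
```

A wrapper script would print LABELS[state] and exit with state as its return code, which is what a Nagios or NRPE check expects.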

@rclmenezes

rclmenezes commented Dec 21, 2016

+1 to having a health REST API. I hear Logstash just got one in v5.0.0 at localhost:9600.

Right now, we use Nagios NRPE to remote execute health checks on different boxes. To properly check that Filebeat is healthy and shipping, we have to start the Filebeat service with -httpprof. That's not an optimal solution because:

A) We've been told before that the variables in /debug/vars are not stable and may change from time to time.

B) It's a pain in the butt to start Filebeat with the -httpprof option! We use Filebeat as a service and -httpprof is not an option in filebeat.yml, so on Ubuntu 16 we have to:

  • Add DAEMON_ARGS="-c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat -httpprof localhost:6060" to /etc/default/filebeat.

  • Remove the existing service: $ sudo rm /lib/systemd/system/filebeat.service

  • Reload our daemon: $ sudo systemctl daemon-reload

  • Restart the filebeat service: $ sudo service filebeat restart

Woof. So a proper health REST API would make our lives a lot easier :)

Thanks!

@blalor

blalor commented Mar 2, 2017

This is mandatory, in my opinion. Right now you have no idea what filebeat is actually doing, or if it's doing anything. log-courier got this right a long time ago.

@kkirsche
Contributor

kkirsche commented Mar 2, 2017

How should this monitoring API work? Would we want to work on adding Prometheus monitoring support, or would that be too "heavy"?

@ruflin
Member

ruflin commented Mar 3, 2017

We are planning to expose the expvar metrics through a separate HTTP endpoint, so that the whole httpprof does not have to be run. The data will be in JSON format, as that is what we also use internally.
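
To illustrate the general idea of a metrics-only endpoint (not Filebeat's actual implementation, which is written in Go and discussed in the PR referenced later in this thread), here is a minimal Python sketch. The /stats path, the METRICS placeholder values, and all names are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

METRICS = {"libbeatMessagesDropped": 0}  # placeholder values for illustration


def encode_metrics(metrics: dict) -> bytes:
    """Serialize the metrics as a UTF-8 encoded JSON document."""
    return json.dumps(metrics, sort_keys=True).encode("utf-8")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/stats":  # hypothetical path; nothing else is exposed
            self.send_error(404)
            return
        body = encode_metrics(METRICS)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), MetricsHandler)
```

Calling make_server().serve_forever() would serve the JSON document on the loopback interface only, so no profiling handlers are reachable through it.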

@c33s
Author

c33s commented Mar 7, 2017

@kkirsche prometheus sounds too specific for me. the main requirements for me are:

  • adding most of the command line parameters to the config file
  • separate config for api/status, like status_enabled: true in the config file instead of -httpprof
  • basic rules to be configured in the config file to allow a simple call of filebeat status -> OK, WARNING, CRITICAL (it is cool to have a json file, but a basic status should be supplied out of the box); also see Monitoring API for Logstash Forwarder #996 (comment)
  • an api with a json result is cool but should not replace a simple status call

@ruflin
Member

ruflin commented Mar 11, 2017

I started a PR for a more detailed discussion of this: #3693

@c33s To your points

  • I think so far we have added most of the cmd line params to the config. Anything specific missing?
  • See PR
  • Interesting idea to have something that can be checked on the command line. I'm wondering whether it should return a little bit more than just three different values, as I don't want people to have to configure thresholds etc. I'm more in favor of giving people the data and letting them decide on their own whether it is good or bad for their environment. But it is definitely worth digging deeper into this.
  • Would it be fine for you if the status call also returned JSON?

@cawoodm
Contributor

cawoodm commented Nov 20, 2018

See https://www.elastic.co/guide/en/logstash/current/monitoring.html

@botelastic

botelastic bot commented Jul 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added Stalled needs_team Indicates that the issue/PR needs a Team:* label labels Jul 8, 2020
@botelastic

botelastic bot commented Jul 8, 2020

This issue doesn't have a Team:<team> label.

@botelastic botelastic bot closed this as completed Aug 7, 2020

8 participants