Stats: Report config reload data #5680

jordansissel · 2016-07-22T18:07:47Z

Report config reload success count
Config reload failure count
Time of most recent config reload success
Time of most recent config reload failure

Suggested by @widhalmt in NETWAYS/check_logstash#6

jordansissel · 2016-08-01T17:09:13Z

For tracking reloads, we'll have to change the way the entire metrics tree is reset when a config is reloaded.

acchen97 · 2016-08-04T20:24:00Z

Thanks for tracking this. I can see this as a new resource type under /_node/stats/{x}.

We'll need to document this new behavior of metrics living across config reloads.

suyograo · 2016-08-16T18:44:46Z

@jordansissel assigning this to @jsvd

acchen97 · 2016-08-16T20:30:05Z

@jsvd FYI added this as the last item under "Node Stats" here #5732

jsvd · 2016-08-29T14:27:44Z

what about:

/_node/stats/pipeline/reloads/successful - number of successful pipeline reloads
/_node/stats/pipeline/reloads/failed - number of failed pipeline reloads (from failing to fetch the configuration, failing to create the new pipeline, or failing to start it)
/_node/stats/pipeline/reloads/last_success_time - date of the last successful reload
/_node/stats/pipeline/reloads/last_failure_time - date of the last failed reload

[edit]

/_node/stats/pipeline/reloads/last_failure_message - exception message + backtrace(?) of the last failure to reload
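As an illustration only (not code from Logstash), the five proposed values could be tracked by something like the sketch below. The field names come from the paths above; the `ReloadStats` class, its methods, and the message format are hypothetical and chosen just to show how the counters and timestamps would move together.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


def _iso(now: Optional[datetime]) -> str:
    """Format a timestamp as ISO 8601, defaulting to the current UTC time."""
    return (now or datetime.now(timezone.utc)).isoformat()


@dataclass
class ReloadStats:
    """Hypothetical holder for the reload metrics proposed in this thread."""
    successful: int = 0
    failed: int = 0
    last_success_time: Optional[str] = None
    last_failure_time: Optional[str] = None
    last_failure_message: Optional[str] = None

    def record_success(self, now: Optional[datetime] = None) -> None:
        # A successful reload bumps the counter and the success timestamp.
        self.successful += 1
        self.last_success_time = _iso(now)

    def record_failure(self, error: Exception, now: Optional[datetime] = None) -> None:
        # A failed reload also keeps the error text (assumed format: "Type: message").
        self.failed += 1
        self.last_failure_time = _iso(now)
        self.last_failure_message = f"{type(error).__name__}: {error}"


stats = ReloadStats()
stats.record_success()
stats.record_failure(ValueError("could not parse config"))

# Dictionary shaped like the proposed "reloads" resource.
reloads = asdict(stats)
```

Whether the failure message should also carry a backtrace is left open here, as it is in the proposal.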

ph · 2016-08-29T14:55:09Z

@jsvd Seems good to me. I see that you record the last_failure_time; I wonder if we should also keep track of what the last error was?

/_node/stats/pipeline/reloads/failed

This node could be more detailed; I see value in that once config management gets in.

jsvd · 2016-08-29T14:58:06Z

Yes, I thought about that as well. I updated the issue description to add this metric. Can we have a kind of gauge metric that holds a string?

jsvd · 2016-08-30T11:20:35Z

@ph the resource /_node/stats/pipeline/ is expected to list the pipeline ids next, correct? e.g. /_node/stats/pipeline/main/?

If so, we can't use /_node/stats/pipeline/reloads/ as that would look like the name of a pipeline.

Also, pipeline reload metrics could be done per pipeline id: a reload on the main pipeline would only increase the failed/successful counts for that pipeline id. Once we have 2 or more pipelines in the same instance, having a global failed count is somewhat useless.

Implementing this means that the metric reset on reload must be even finer grained by only clearing the event counts/plugin stats for that pipeline id and not the whole pipeline resource.
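A minimal sketch of that finer-grained reset, assuming a plain nested dictionary as the metrics tree (this is not how Logstash stores metrics). The pipeline ids other than `main`, the sample numbers, and the function names are hypothetical. It also assumes that a failed reload leaves the running pipeline's stats untouched.

```python
from copy import deepcopy

# Hypothetical in-memory metrics tree keyed by pipeline id.
metrics = {
    "main": {
        "events": {"in": 120, "filtered": 118, "out": 118},
        "plugins": {"filters": {"grok_1": {"matches": 100, "failures": 18}}},
        "reloads": {"successful": 2, "failed": 1},
    },
    "secondary": {
        "events": {"in": 7, "filtered": 7, "out": 7},
        "plugins": {},
        "reloads": {"successful": 0, "failed": 0},
    },
}

# Namespaces that must survive a reload of their own pipeline.
PRESERVED_ACROSS_RELOAD = {"reloads"}


def reset_pipeline_metrics(tree, pipeline_id):
    """Return a copy of the tree with only pipeline_id's per-run stats cleared."""
    result = deepcopy(tree)
    pipeline = result[pipeline_id]
    for namespace in list(pipeline):
        if namespace not in PRESERVED_ACROSS_RELOAD:
            pipeline[namespace] = {}
    return result


def record_reload(tree, pipeline_id, succeeded):
    """Count a reload against one pipeline, clearing its stats only on success."""
    result = reset_pipeline_metrics(tree, pipeline_id) if succeeded else deepcopy(tree)
    counter = "successful" if succeeded else "failed"
    result[pipeline_id]["reloads"][counter] += 1
    return result


after = record_reload(metrics, "main", succeeded=True)
```

Here `after["main"]` keeps its reload counters while its event and plugin stats start over, and `secondary` is unaffected.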

Thoughts?

ph · 2016-08-30T12:35:09Z

> Yes, I thought about that as well. I updated the issue description to add this metric. Can we have a kind of gauge metric that holds a string?

Perfectly possible; a gauge can hold any type.

> If so, we can't use /_node/stats/pipeline/reloads/ as that would look like the name of a pipeline.
>
> Also, pipeline reload metrics could be done per pipeline id: a reload on the main pipeline would only increase the failed/successful counts for that pipeline id. Once we have 2 or more pipelines in the same instance, having a global failed count is somewhat useless.
> Implementing this means that the metric reset on reload must be even finer grained by only clearing the event counts/plugin stats for that pipeline id and not the whole pipeline resource.

This is a good point. I am not sure of the best approach here, for two reasons:

In theory, multiple pipelines can coexist, but this has never been the case other than the internal metrics pipeline, which doesn't gather metrics.
When multiple pipelines land, some of the API may also have to change.

I wonder if we should have an /agent namespace that holds these kinds of values independently of the number of pipelines.

jsvd · 2016-08-30T13:30:56Z

Well, in this case we don't need the /agent thing; we just need to do a deeper reset and put the reload count under the correct pipeline_id.

ph · 2016-08-30T13:36:57Z

agree

acchen97 · 2016-08-31T07:35:46Z

As we only expose the main pipeline today, I think we can proceed with these stats under /_node/stats/pipeline/reloads, which is similar to the other events/plugin stats stuff. We can always correlate to more specific pipeline IDs later (beyond main).

@jsvd FYI, in case you haven't seen this, @jordansissel defined a metrics style guide that we should strive to adhere to. Also, for success/failure, I know that the grok/date stats, for instance, use the terms "matches" and "failures". Maybe "successes" and "failures" make sense here?

jsvd · 2016-08-31T11:20:10Z

@acchen97 I wasn't aware of that, great feedback thanks!

jsvd · 2016-09-06T13:19:26Z

done in #5848

suyograo added enhancement monitoring v5.0.0-beta1 labels Jul 26, 2016

suyograo assigned jordansissel Jul 31, 2016

acchen97 mentioned this issue Aug 4, 2016

Monitoring API enhancements/cleanups for 5.0 #5732

Closed

7 tasks

acchen97 mentioned this issue Aug 12, 2016

Monitoring API feature backlog #5759

Closed

9 tasks

suyograo assigned jsvd and unassigned jordansissel Aug 16, 2016

jsvd mentioned this issue Aug 24, 2016

Stats: change stats reset operation to only clear destroyed pipelined counts #5818

Closed

jsvd mentioned this issue Aug 31, 2016

add metrics regarding pipeline reloading #5848

Closed

jsvd closed this as completed Sep 6, 2016
