-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report metrics from custom-plugin-monitor #315
Conversation
Hi @xueweiz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @wangzhen127 |
/cc @andyxning |
/ok-to-test |
My bad. I'll fix the test. |
@Random-Liu Hi Lantao, sorry for the delay. I just fixed the tests. Could you help take a look? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one comment.
glog.Errorf("Failed to update problem gauge metrics for problem %q, reason %q: %v", | ||
result.Rule.Condition, result.Rule.Reason, err) | ||
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generate counter and gauge at the end of the function based on the final events and condition.
The logic is too complex, the code makes it even harder to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And let's do the same for system log monitor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this is a very good idea! Thanks for noticing that1
Done.
event.Reason, err) | ||
} | ||
} | ||
for _, condition := range metricChangingConditions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just go through condition list, and set problem gauge for all conditions correspondingly?
That seems more reliable, basically a periodic gauge sync. Will that cause any performance issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh just realized that in custom-plugin-monitor, the generateStatus() is only called periodically. Then I think it's fine. Changing the code.
(in system-log-monitor, the generateStatus() is called for each matched Rule, which is much more frequent. So I'd prefer to keep the changedConditions
logic in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. The logic in system-log-monitor is simple, so I'm also fine with adding smarter logic there. :)
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Random-Liu, xueweiz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Add a new field
metricsReporting
(true
by default) in thecustom-plugin-monitor
config file.By setting the
metricsReporting
field totrue
,custom-plugin-monitor
will report metrics when temporary/permanent problems happen.The behavior was specified in the NPD metrics design doc here.
For example, when I make the custom plugin script to report problem (by manually editing the script to force it return error), I will see:
This PR maintains backward compatibility on existing
custom-plugin-monitor
configs.This PR is part of #284:
Most of the heavy lifting has been done in #300 .
I tried to update the README for custom-plugin-monitor, but seems that's not necessary:
rules
,conditions
) were omitted.Since the
metricsReporting
flag is now a common config flag used by bothsystem-log-monitor
andcustom-plugin-monitor
, I don't think it's necessary to add it to the custom-plugin-monitor design doc.