Make HTTP request timeout configurable in prometheus checks #1790
Just a few things :)
@@ -622,3 +625,13 @@ def _submit_gauges_from_histogram(self, name, metric, send_histograms_buckets=Tr
 
     def _is_value_valid(self, val):
         return not (isnan(val) or isinf(val))
+
+    def set_prometheus_timeout(self, instance, default_value=10):
Do we need to validate the user's configuration? Requests will automatically trigger a Timeout
exception, alerting the user that they need to correct that setting.
It's also possible for the user to set a negative timeout; consider the case when default_instance
has a negative timeout and the timeout is not set in the supplied instance.
If requests already handles the negative timeout for us, then indeed there's no real need to check it.
Removed it 👍
👍
Just tested and I get a ValueError if timeout is less than 0.
Also, I think we can simplify it even further by doing something like this:
def set_prometheus_timeout(...):
    self.prometheus_timeout = instance.get('prometheus_timeout', default_value)
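For illustration, here is a minimal runnable sketch of that simplified setter. The class name TimeoutSketch is a hypothetical stand-in for the check class, and only the timeout logic is shown; the signature is taken from the diff above.

```python
class TimeoutSketch:
    """Hypothetical stand-in for the check class; only the timeout logic is shown."""

    def set_prometheus_timeout(self, instance, default_value=10):
        # No validation here: the configured value is stored as-is and any
        # invalid value is left for the HTTP client to reject at request time.
        self.prometheus_timeout = instance.get('prometheus_timeout', default_value)


check = TimeoutSketch()

# The instance sets the option explicitly
check.set_prometheus_timeout({'prometheus_timeout': 3})
explicit = check.prometheus_timeout

# The instance omits the option, so the default applies
check.set_prometheus_timeout({})
fallback = check.prometheus_timeout
```

With this shape, an integration can pass its own default_value while the user can still override it per instance.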
@@ -1601,3 +1601,30 @@ def test_health_service_check_failing():
         PrometheusCheck.CRITICAL,
         tags=["endpoint:http://fake.endpoint:10055/metrics"]
     )
+
+def test_set_prometheus_timeout():
We should also be testing for when default_instance has a negative timeout.
👍 Added some tests for that, both for PrometheusCheck and GenericPrometheusCheck
Ah, since we're removing the validation part, we don't need to check for negative values, just that the value we get back is the same as the one we set.
Sure, I was a bit confused by your two remarks, so I did both ^^
I like that we're not validating anymore which means we can clean up the tests 👍
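A cleaned-up test along those lines could look roughly like the sketch below. The helper resolve_timeout is hypothetical, and the lookup order (instance, then default_instance, then a hard default of 10) is an assumption about how the generic check resolves the option, not something this thread confirms.

```python
def resolve_timeout(instance, default_instance, default_value=10):
    # Hypothetical helper (assumed lookup order): the instance setting wins,
    # then default_instance, then default_value.
    return instance.get(
        'prometheus_timeout',
        default_instance.get('prometheus_timeout', default_value),
    )


def test_timeout_round_trip():
    # A value set on the instance comes back unchanged
    assert resolve_timeout({'prometheus_timeout': 5}, {'prometheus_timeout': 30}) == 5
    # The instance omits it, so the default_instance value is used
    assert resolve_timeout({}, {'prometheus_timeout': 30}) == 30
    # Neither sets it, so the hard default applies
    assert resolve_timeout({}, {}) == 10


test_timeout_round_trip()
```

Nothing here asserts on negative values, which matches the decision to leave validation to requests.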
    'default_namespace': {
        'prometheus_url': endpoint,
        'metrics': [{"test_rate": "test.rate"}],
        'prometheus_timeout': -1,
Let's make it a positive value just to make it cleaner 👍
Thanks for the feedback! 👍
@@ -159,12 +159,14 @@ def get_scraper(self, instance):
         scraper.extra_headers.update(instance.get("extra_headers", {}))
         # For simple values instance settings overrides optional defaults
         scraper.prometheus_metrics_prefix = instance.get("prometheus_metrics_prefix", default_instance.get("prometheus_metrics_prefix", ''))
-        scraper.label_to_hostname = instance.get("label_to_hostname", default_instance.get("prometheus_url", ""))
+        scraper.label_to_hostname = instance.get("label_to_hostname", default_instance.get("label_to_hostname", ""))
We should fix the tests so we actually catch these issues.
Good practice indeed! I added a test for it, but left the other config params untested. Let me know if you think I should cover them in the same PR :-)
LGTM :)
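As a rough sketch of the kind of check that catches this fallback bug, the two lookups from the get_scraper diff can be compared side by side. The function names and the "node" label value are hypothetical; the two expressions are copied from the removed and added lines.

```python
def label_to_hostname_old(instance, default_instance):
    # Removed line: falls back to the wrong key ("prometheus_url")
    return instance.get("label_to_hostname", default_instance.get("prometheus_url", ""))


def label_to_hostname_new(instance, default_instance):
    # Added line: falls back to the matching key
    return instance.get("label_to_hostname", default_instance.get("label_to_hostname", ""))


default_instance = {
    "prometheus_url": "http://fake.endpoint:10055/metrics",
    "label_to_hostname": "node",  # hypothetical label name
}

# The instance does not set label_to_hostname, so the default must be used
assert label_to_hostname_new({}, default_instance) == "node"
# The old lookup silently returns the endpoint URL instead
assert label_to_hostname_old({}, default_instance) == "http://fake.endpoint:10055/metrics"
# An explicit instance value still wins over the default
assert label_to_hostname_new({"label_to_hostname": "pod"}, default_instance) == "pod"
```

The bug only shows up when the instance omits the key and default_instance sets it, so that is the case a test needs to exercise.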
* Prometheus checks: Adding prometheus_timeout options
* Adding timeout to GenericPrometheusCheck integrations
* better code share for timeout definition
* Even better
* Adding tests
* Adding timeout option to some other PrometheusChecks
* Removed changes to gitlab integration
* removing gitlab_runner changes
* Removing validation for timeout values, leaving them to requests
* Adding some other tests for negative values + GenericPrometheusCheck
* Simplify logic
* Do not check for negative values, they would be set and handled by requests anyway
* Adding test for label_to_hostname default instance + changing default value
What does this PR do?
Adds a prometheus_timeout option to all prometheus checks, so that it can be set per integration and changed easily by the user.
Motivation
Some checks take too long to execute (e.g. kubernetes_state).
Review checklist
no-changelog label attached
Additional Notes
Leaving the kubelet, gitlab, gitlab_runner and istio integrations aside for now (at least).