Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make HTTP request timeout configurable in prometheus checks #1790

Merged

Conversation

antoinepouille
Copy link
Contributor

@antoinepouille antoinepouille commented Jun 22, 2018

What does this PR do?

Adding a prometheus_timeout option available in all prometheus checks, so that it can be set by integration and changed easily by the user.

Motivation

Issues with some checks taking too long to execute (e.g. kubernetes_state)

Review checklist

  • PR has a meaningful title or PR has the no-changelog label attached
  • Feature or bugfix has tests
  • Git history is clean
  • If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo

Additional Notes

Leaving kubelet, gitlab, gitlab_runner and istio integrations aside for now (at least)

@antoinepouille antoinepouille requested review from a team as code owners June 22, 2018 13:17
@antoinepouille antoinepouille force-pushed the antoine/make-prometheus-check-timeout-configurable branch 4 times, most recently from 2b1c18c to 0e74b60 Compare June 22, 2018 14:46
@antoinepouille antoinepouille added this to the 6.3.1 milestone Jun 22, 2018
@masci masci changed the title [prometheus check] make request timeout configurable Make HTTP request timeout configurable Jun 24, 2018
@JulienBalestra JulienBalestra modified the milestones: 6.3.1, 6.4 Jun 25, 2018
@antoinepouille antoinepouille changed the title Make HTTP request timeout configurable Make HTTP request timeout configurable in prometheus checks Jun 25, 2018
@masci masci removed this from the 6.4 milestone Jun 25, 2018
@antoinepouille antoinepouille force-pushed the antoine/make-prometheus-check-timeout-configurable branch from 7f4c3e9 to 0edfa1c Compare June 26, 2018 15:49
@rishabh rishabh self-assigned this Jun 29, 2018
Copy link

@rishabh rishabh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few things :)

@@ -622,3 +625,13 @@ def _submit_gauges_from_histogram(self, name, metric, send_histograms_buckets=Tr

def _is_value_valid(self, val):
return not (isnan(val) or isinf(val))

def set_prometheus_timeout(self, instance, default_value=10):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to validate the user's configuration? Requests will automatically trigger a Timeout exception, alerting the user that they need to correct that setting.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also possible for the user to set a negative timeout; consider the case when default_instance has a negative timeout and the timeout is not set in the supplied instance.

Copy link
Contributor Author

@antoinepouille antoinepouille Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If requests is handling the negative timeout for us, indeed, no big need to check it.
Removed it 👍

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Just tested and I get a ValueError if timeout is less than 0.

Also, I think we can simplify it even further by doing something like this:

def set_prometheus_time(...):
    self.prometheus_timeout = instance.get('prometheus_timeout', default_value)

@@ -1601,3 +1601,30 @@ def test_health_service_check_failing():
PrometheusCheck.CRITICAL,
tags=["endpoint:http://fake.endpoint:10055/metrics"]
)

def test_set_prometheus_timeout():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also be testing for when default_instance has a negative timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Added some tests relative to that, both for PrometheusCheck and GenericPrometheusCheck

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, since we're removing the validation part, we don't need to check for negative values, just that the same value that we set is the same as what we get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I was a bit confused by your two remarks, so I did both ^^

Copy link

@rishabh rishabh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that we're not validating anymore which means we can clean up the tests 👍

@@ -622,3 +625,13 @@ def _submit_gauges_from_histogram(self, name, metric, send_histograms_buckets=Tr

def _is_value_valid(self, val):
return not (isnan(val) or isinf(val))

def set_prometheus_timeout(self, instance, default_value=10):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Just tested and I get a ValueError if timeout is less than 0.

Also, I think we can simplify it even further by doing something like this:

def set_prometheus_time(...):
    self.prometheus_timeout = instance.get('prometheus_timeout', default_value)

@@ -1601,3 +1601,30 @@ def test_health_service_check_failing():
PrometheusCheck.CRITICAL,
tags=["endpoint:http://fake.endpoint:10055/metrics"]
)

def test_set_prometheus_timeout():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, since we're removing the validation part, we don't need to check for negative values, just that the same value that we set is the same as what we get.

'default_namespace': {
'prometheus_url': endpoint,
'metrics': [{"test_rate": "test.rate"}],
'prometheus_timeout': -1,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it a positive value just to make it cleaner 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! 👍

@@ -159,12 +159,14 @@ def get_scraper(self, instance):
scraper.extra_headers.update(instance.get("extra_headers", {}))
# For simple values instance settings overrides optional defaults
scraper.prometheus_metrics_prefix = instance.get("prometheus_metrics_prefix", default_instance.get("prometheus_metrics_prefix", ''))
scraper.label_to_hostname = instance.get("label_to_hostname", default_instance.get("prometheus_url", ""))
scraper.label_to_hostname = instance.get("label_to_hostname", default_instance.get("label_to_hostname", ""))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fix the tests so we actually catch these issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good practice indeed! I added it, but leaving the other config params untested. Let me know if you think I should add them in the same act :-)

Copy link

@rishabh rishabh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

@antoinepouille antoinepouille merged commit b026234 into master Jul 3, 2018
@antoinepouille antoinepouille deleted the antoine/make-prometheus-check-timeout-configurable branch July 3, 2018 15:32
@antoinepouille antoinepouille restored the antoine/make-prometheus-check-timeout-configurable branch July 3, 2018 15:36
antoinepouille added a commit that referenced this pull request Jul 5, 2018
* Prometheus checks: Adding prometheus_timeout options

* Adding timeout to GenericPrometheusCheck integrations

* better code share for timeout definition

* Even better

* Adding tests

* Adding timeout option to some other PrometheusChecks

* Removed changes to gitlab integration

* removing gitlab_runner changes

* Removing validation for timeout values, leaving them to requests

* Adding some other tests for negative values + GenericPrometheusCheck

* Simplify logic

* Do not check for negative values, they would be set and handled by requests anyway

* Adding test for label_to_hostname default instance + changing default value
@antoinepouille antoinepouille deleted the antoine/make-prometheus-check-timeout-configurable branch July 5, 2018 15:58
@antoinepouille antoinepouille added this to the 6.4.0 milestone Jul 16, 2018
antoinepouille added a commit that referenced this pull request Jul 16, 2018
* Prometheus checks: Adding prometheus_timeout options

* Adding timeout to GenericPrometheusCheck integrations

* better code share for timeout definition

* Even better

* Adding tests

* Adding timeout option to some other PrometheusChecks

* Removed changes to gitlab integration

* removing gitlab_runner changes

* Removing validation for timeout values, leaving them to requests

* Adding some other tests for negative values + GenericPrometheusCheck

* Simplify logic

* Do not check for negative values, they would be set and handled by requests anyway

* Adding test for label_to_hostname default instance + changing default value
ian28223 added a commit that referenced this pull request Sep 10, 2018
hkaj pushed a commit that referenced this pull request Nov 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants