
Fix warnings usage related to RequestsWrapper, Openmetrics and Prometheus #5080

Merged
merged 11 commits into master from alex/http_check_fix_lock on Nov 27, 2019

Conversation

Member

@AlexandreYang AlexandreYang commented Nov 25, 2019

What does this PR do?

  1. Remove warning_lock for RequestsWrapper. The warning lock is a bottleneck for requests calls made via RequestsWrapper.

  2. Replace disable_warnings with disable_warnings_ctx for the Openmetrics and Prometheus mixins.

The effect of disable_warnings was never restored/reverted.

  3. Use a custom disable_warnings_ctx instead of warnings.catch_warnings (see the sketch after this list).

warnings.catch_warnings is not thread-safe and might leave the warnings global variables in an unpredictable state.

  4. Fix the warnings.filters leak by using a custom version called _simplefilter_py2.

Each call to disable_warnings(InsecureRequestWarning) or warnings.simplefilter('ignore', InsecureRequestWarning) adds an item to warnings.filters on Python 2, which causes a memory leak.

Python 3 does not have that issue, because it removes the matching item before inserting it again: https://github.com/python/cpython/blob/1f864016950079a618a916140f3abe6480a8e22d/Lib/warnings.py#L186

Python 2 does not: https://github.com/python/cpython/blob/e6499033032d5b647e43a3b49da0c1c64b151743/Lib/warnings.py#L112
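
For reference, here is a minimal sketch of the approach (illustrative only; the exact helpers and signatures in warnings_util.py may differ):

```python
import warnings
from contextlib import contextmanager


def _simplefilter_py2(action, category=Warning, lineno=0, append=0):
    """warnings.simplefilter variant that removes duplicates before inserting.

    Python 3's simplefilter already deduplicates; Python 2's does not, so
    repeated calls keep growing warnings.filters (the leak described above).
    """
    item = (action, None, category, "", lineno)
    warnings.filters[:] = [f for f in warnings.filters if f != item]
    if append:
        warnings.filters.append(item)
    else:
        warnings.filters.insert(0, item)
    if hasattr(warnings, '_filters_mutated'):  # Python 3 cache invalidation
        warnings._filters_mutated()


@contextmanager
def disable_warnings_ctx(category):
    """Temporarily ignore `category` warnings, restoring the filters on exit.

    Unlike a bare disable_warnings() call, the effect is reverted when the
    block ends, so no global 'ignore' filter is leaked.
    """
    saved_filters = warnings.filters[:]
    try:
        _simplefilter_py2('ignore', category)
        yield
    finally:
        warnings.filters[:] = saved_filters
        if hasattr(warnings, '_filters_mutated'):
            warnings._filters_mutated()
```

The Openmetrics and Prometheus mixins can then wrap the scrape call in with disable_warnings_ctx(InsecureRequestWarning): instead of calling disable_warnings once and never reverting it.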

Motivation

The warning lock is a bottleneck for requests calls made via RequestsWrapper: simple requests that should take ~300ms end up taking 20-40s.

CAVEAT: once disable_warnings(InsecureRequestWarning) (i.e. warnings.simplefilter('ignore', InsecureRequestWarning)) is called, it has a global effect on all checks.
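
For illustration only (names are hypothetical, not the actual RequestsWrapper code), this is roughly the pattern being removed; the lock is held for the full duration of the HTTP call, so concurrent checks serialize on it:

```python
import threading
import warnings

import requests
from urllib3.exceptions import InsecureRequestWarning

warning_lock = threading.Lock()  # hypothetical module-level lock


def get_with_lock(url, ignore_tls_warning=False, **options):
    # Every threaded check waits here, even when nothing needs suppressing,
    # because the lock wraps the whole request, not just the filter change.
    with warning_lock:
        with warnings.catch_warnings():
            if ignore_tls_warning:
                warnings.simplefilter('ignore', InsecureRequestWarning)
            return requests.get(url, **options)
```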

Additional Notes

First 2/3 of the chart: before the change
Last 1/3 of the chart: after the change
[chart screenshot]

codecov bot commented Nov 25, 2019

Codecov Report

Merging #5080 into master will increase coverage by 0.46%.
The diff coverage is 100%.

Impacted Files Coverage Δ
...se/datadog_checks/base/checks/prometheus/mixins.py 80% <100%> (+0.96%) ⬆️
datadog_checks_base/tests/test_warnings_util.py 100% <100%> (ø)
datadog_checks_base/tests/test_prometheus.py 100% <100%> (ø) ⬆️
...adog_checks_base/datadog_checks/base/utils/http.py 100% <100%> (ø) ⬆️
...e/datadog_checks/base/checks/openmetrics/mixins.py 79.06% <100%> (+0.56%) ⬆️
datadog_checks_base/tests/test_openmetrics.py 99.02% <100%> (+0.01%) ⬆️
...ks_base/datadog_checks/base/utils/warnings_util.py 100% <100%> (ø)
tls/datadog_checks/tls/__init__.py 100% <0%> (ø) ⬆️
...atadog_checks/active_directory/active_directory.py 100% <0%> (ø) ⬆️
aspdotnet/datadog_checks/aspdotnet/__init__.py 100% <0%> (ø) ⬆️
... and 837 more

@AlexandreYang AlexandreYang force-pushed the alex/http_check_fix_lock branch 2 times, most recently from da009bf to e88f027 Compare November 25, 2019 23:02
Contributor

@ofek ofek left a comment

We cannot do something that would affect every check, nor can we remove the option. Both are breaking changes. Let's discuss this 🙂

Contributor

therve commented Nov 26, 2019

A few thoughts:

  • We need to understand why locks are breaking like that. It's not the only place where we use locks, so we ought to get a better clue.
  • That said, I don't think losing a warning is a big problem, whereas performance issues are one.

My suggestion is just to drop the lock. We may lose some warnings (I don't think we can get more?) but that sounds very low risk. It would only happen when people set tls_verify to false somewhere anyway.

Oh, and start by checking the conditional so we don't enter catch_warnings uselessly.

@AlexandreYang
Member Author

@therve Thanks for the feedback.

> We need to understand why locks are breaking like that. It's not the only place where we use locks, so we ought to get a better clue.

My hypothesis is that, since the lock surrounds the http calls, the lock is just waiting for the http calls to finish, making them effectively sequential.

> My suggestion is just to drop the lock. We may lose some warnings.

You mean dropping the lock, but keeping the with warnings.catch_warnings?

Since warnings.catch_warnings is not thread-safe, it can lead to unpredictable behaviour. So I'm not sure it's worth keeping.
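
To illustrate the kind of interleaving I mean (simplified, with the second thread described in comments):

```python
import warnings

# Thread A
with warnings.catch_warnings():        # A snapshots the global warnings.filters
    warnings.simplefilter('ignore')    # A mutates the shared, module-level list
    # If thread B enters catch_warnings() here, B snapshots A's modified filters.
    # A exits first and restores its clean snapshot, but when B exits later it
    # restores A's *modified* filters, leaving a global 'ignore' nobody intended.
    pass
```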

> Oh, and start by checking the conditional so we don't enter catch_warnings uselessly.

I'm not sure what you mean :)

Contributor

therve commented Nov 26, 2019

> @therve Thanks for the feedback.
>
> > We need to understand why locks are breaking like that. It's not the only place where we use locks, so we ought to get a better clue.
>
> My hypothesis is that, since the lock surrounds the http calls, the lock is just waiting for the http calls to finish, making them effectively sequential.
>
> > My suggestion is just to drop the lock. We may lose some warnings.
>
> You mean dropping the lock, but keeping the with warnings.catch_warnings?
>
> Since warnings.catch_warnings is not thread-safe, it can lead to unpredictable behaviour. So I'm not sure it's worth keeping.

I mean, the unpredictability is only about warnings, no? What else could happen?

> > Oh, and start by checking the conditional so we don't enter catch_warnings uselessly.
>
> I'm not sure what you mean :)

Move up the if self.ignore_tls_warning. Right now we lock all the time AFAIU.
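
Roughly something like this (a sketch only; the attribute names on the wrapper are assumed, not the actual code):

```python
import threading
import warnings

import requests
from urllib3.exceptions import InsecureRequestWarning


class RequestsWrapperSketch(object):
    """Illustrative only; the real RequestsWrapper differs."""

    def __init__(self, ignore_tls_warning=False):
        self.ignore_tls_warning = ignore_tls_warning
        self.session = requests.Session()
        self.warning_lock = threading.Lock()

    def get(self, url, **options):
        if not self.ignore_tls_warning:
            # Common case: nothing to suppress, so skip the lock entirely.
            return self.session.get(url, **options)

        with self.warning_lock:
            with warnings.catch_warnings():
                warnings.simplefilter('ignore', InsecureRequestWarning)
                return self.session.get(url, **options)
```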

ofek previously approved these changes Nov 26, 2019
@AlexandreYang AlexandreYang changed the title Remove RequestsWrapper warning lock Fix warnings usage related to RequestsWrapper, Openmetrics and Prometheus Nov 26, 2019
Contributor

@therve therve left a comment

Code looks good, but a few issues in tests.

datadog_checks_base/tests/test_openmetrics.py (review thread: outdated, resolved)
datadog_checks_base/tests/test_prometheus.py (review thread: outdated, resolved)
datadog_checks_base/tests/test_warnings_util.py (review thread: outdated, resolved)
datadog_checks_base/tests/test_warnings_util.py (review thread: outdated, resolved)
therve previously approved these changes Nov 27, 2019
Contributor

@ofek ofek left a comment

Great job!

@AlexandreYang AlexandreYang merged commit c2387ff into master Nov 27, 2019
@AlexandreYang AlexandreYang deleted the alex/http_check_fix_lock branch November 27, 2019 14:56
@coignetp coignetp mentioned this pull request Jun 24, 2020