-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(gevent): run after-in-child hooks after reinit #4070
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we able to reproduce this behavior in a test?
I think yes, but not easily. I'll give it a try. |
Based on some of the CI failures, this is likely to require #4049. |
Threads created too early in an application that uses gevent end up not running after the gevent hub reinit is executed after fork in the child process. This change ensures that we trigger the after-in-child fork hooks after the call to gevent.hub.reinit to ensure that threads created at any time will run as expected after fork.
2f41f4c
to
98cb106
Compare
46e8995
to
8e347e1
Compare
It seems that |
3f2e0d6
to
87c1355
Compare
87c1355
to
e98a0ca
Compare
@brettlangdon I've added a test case. |
#4083 is a pre-requisite for this PR. |
e986c08
to
3e95dd7
Compare
@P403n1x87 do you have an idea of the side effect this has so far in the real world? |
…atchdog (#4083) ## Description This change improves support for Python legacy versions, like 2.7 and 3.5 by using a common loader wrapper that makes more attributes available upon request, e.g. is_package. Furthermore, this change also provides support for dict copy via the dict constructor, which is required by some pytest fixtures, like testdir. ### More technical details In Python<=3.5, calling the `dict` constructor on a dictionary makes a copy of it by first checking if it has a `keys` attribute, and then performing a `PyDict_Check`. If these checks succeed, the dictionary is copied using the C API. Since `ModuleWatchdog` is a subclass of `dict`, this means that the wrapping logic is bypassed, and `dict` ends up copying the dictionary backing `ModuleWatchdog` rather than the wrapped `sys.modules`. We exploit the `keys` attribute access to then copy the state of `sys.modules` over to the backing `dict`, so that calling `dict` on a `ModuleWatchdog` instance actually creates a copy of the wrapped dictionary instead of the backing one. ### Testing This change is a pre-requisite for #4070 to make the test suite pass. ## Checklist - [x] Title must conform to [conventional commit](https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional). - [x] Add additional sections for `feat` and `fix` pull requests. - [x] Ensure tests are passing for affected code. - [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) and/or [Datadog's documentation site](https://github.com/DataDog/documentation/) is updated. Link to doc PR in description. ## Reviewer Checklist - [ ] Title is accurate. - [ ] Description motivates each change. - [ ] No unnecessary changes were introduced in this PR. - [ ] PR cannot be broken up into smaller PRs. - [ ] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary. - [ ] Tests provided or description of manual testing performed is included in the code or PR. - [x] Release note has been added for fixes and features, or else `changelog/no-changelog` label added. - [ ] All relevant GitHub issues are correctly linked. - [ ] Backports are identified and tagged with Mergifyio. - [ ] Add to milestone.
@P403n1x87 this pull request is now in conflict 😩 |
The side-effect should be that we now properly restart threads in e.g. gunicorn worker processes. I have done some manual testing with a Flask application and a modified tracer that starts the writer thread immediately. I was able to see the writer thread running correctly in each process with this fix, whereas without it the thread would be running only in the master process. |
@P403n1x87 this pull request is now in conflict 😩 |
Codecov Report
@@ Coverage Diff @@
## 1.x #4070 +/- ##
==========================================
- Coverage 78.86% 78.83% -0.03%
==========================================
Files 720 720
Lines 57386 57440 +54
==========================================
+ Hits 45255 45285 +30
- Misses 12131 12155 +24
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Re-initialize the gevent hub after-in-child fork hooks when DD_TRACE_GEVENT_HUB_PATCHED is set to true. This mitigates a performance regression introduced by: #4070.
…y configs (#4962) ## Description This pull request tries to answer two questions: * Is the [public documentation about gunicorn](https://ddtrace.readthedocs.io/en/stable/advanced_usage.html#gunicorn) accurate and complete? * What problematic ddtracepy behavior was [this commit](e1ab88c) fixing? It does so by implementing a series of integration tests against a real gunicorn server. Each test configures the server and ddtracepy in a different way. These tests check each configuration against known failure modes. The set of configurations that avoid all known failure modes is the one that the updated documentation recommends. There were a lot more tests exercising various buggy configurations, but I removed them to make CI run faster. Let me know if you think some of those other configurations need to be included in automated testing. By "known failure modes", I mean degraded performance, duplicated work, or crashes that I found through experimentation based on following logical threads from the relevant issues below. The big comment in the diff explains it in detail. ## Checklist - [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) - [ ] update these https://docs.datadoghq.com/ - [ ] update these https://app.datadoghq.com/apm/service-setup?architecture=host-based&language=python (I'll do the other documentation locations once we're in agreement about the documentation decisions made in this PR) ## Relevant issue(s) #2894 #4810 #4070 #1799 ## Testing strategy After the experimentation process that informed the integration test writing, the testing strategy is the tests themselves. ## Reviewer Checklist - [ ] Title is accurate. - [ ] Description motivates each change. - [ ] No unnecessary changes were introduced in this PR. - [ ] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary. - [ ] Tests provided or description of manual testing performed is included in the code or PR. - [ ] Release note has been added and follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines), or else `changelog/no-changelog` label added. - [ ] All relevant GitHub issues are correctly linked. - [ ] Change contains telemetry where appropriate (logs, metrics, etc.). - [ ] Telemetry is meaningful, actionable and does not have the potential to leak sensitive data. --------- Co-authored-by: Munir Abdinur <[email protected]> Co-authored-by: Brett Langdon <[email protected]>
…y configs (#4962) ## Description This pull request tries to answer two questions: * Is the [public documentation about gunicorn](https://ddtrace.readthedocs.io/en/stable/advanced_usage.html#gunicorn) accurate and complete? * What problematic ddtracepy behavior was [this commit](e1ab88c) fixing? It does so by implementing a series of integration tests against a real gunicorn server. Each test configures the server and ddtracepy in a different way. These tests check each configuration against known failure modes. The set of configurations that avoid all known failure modes is the one that the updated documentation recommends. There were a lot more tests exercising various buggy configurations, but I removed them to make CI run faster. Let me know if you think some of those other configurations need to be included in automated testing. By "known failure modes", I mean degraded performance, duplicated work, or crashes that I found through experimentation based on following logical threads from the relevant issues below. The big comment in the diff explains it in detail. ## Checklist - [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) - [ ] update these https://docs.datadoghq.com/ - [ ] update these https://app.datadoghq.com/apm/service-setup?architecture=host-based&language=python (I'll do the other documentation locations once we're in agreement about the documentation decisions made in this PR) ## Relevant issue(s) #2894 #4810 #4070 #1799 ## Testing strategy After the experimentation process that informed the integration test writing, the testing strategy is the tests themselves. ## Reviewer Checklist - [ ] Title is accurate. - [ ] Description motivates each change. - [ ] No unnecessary changes were introduced in this PR. - [ ] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary. - [ ] Tests provided or description of manual testing performed is included in the code or PR. - [ ] Release note has been added and follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines), or else `changelog/no-changelog` label added. - [ ] All relevant GitHub issues are correctly linked. - [ ] Change contains telemetry where appropriate (logs, metrics, etc.). - [ ] Telemetry is meaningful, actionable and does not have the potential to leak sensitive data. --------- Co-authored-by: Munir Abdinur <[email protected]> Co-authored-by: Brett Langdon <[email protected]>
Description
Threads created too early in an application that uses gevent end up not running when the gevent hub reinit is executed after fork in the child process. This change ensures that we trigger the after-in-child fork hooks after the call to
gevent.hub.reinit
to ensure that threads created at any time will run as expected after fork.More details
In its implementation,
gevent
wraps aroundos.fork
and callsgevent.hub.reinit
to re-initialise the state of the child process after fork. Theforksafe
hooks that are registered by the library end up running after the call to the OSfork
syscall, but beforegevent.hub.reinit
, which causes the threads to not work as expected in the child process. Other frameworks, likegunicorn
make their own call toos.fork
and then callgevent.hub.reinit
in the child process to obtain the same effect. This poses the same problem as in plaingevent
. With this change, we register an import hook ongevent.hub
to detect the possibility thatgevent.hub.reinit
might be called, and we patch this function to re-run the after-in-child fork hook to ensure that they run after the gevent hub has been reinitialised fully in the child process after fork. In particular, this ensures that threads are recreated at the right time and can then work as expected in child processes.Checklist
feat
andfix
pull requests.Reviewer Checklist
changelog/no-changelog
label added.