chore(gevent): run after-in-child hooks after reinit #4070

P403n1x87 · 2022-08-10T10:42:49Z

Description

Threads created too early in an application that uses gevent end up not running once the gevent hub reinit is executed after fork in the child process. This change triggers the after-in-child fork hooks after the call to gevent.hub.reinit, so that threads created at any time run as expected after fork.

More details

In its implementation, gevent wraps around os.fork and calls gevent.hub.reinit to re-initialise the state of the child process after fork. The forksafe hooks that are registered by the library end up running after the call to the OS fork syscall, but before gevent.hub.reinit, which causes the threads to not work as expected in the child process. Other frameworks, like gunicorn, make their own call to os.fork and then call gevent.hub.reinit in the child process to obtain the same effect. This poses the same problem as in plain gevent. With this change, we register an import hook on gevent.hub to detect the possibility that gevent.hub.reinit might be called, and we patch this function to re-run the after-in-child fork hooks, so that they run after the gevent hub has been fully reinitialised in the child process after fork. In particular, this ensures that threads are recreated at the right time and can then work as expected in child processes.
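The ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ddtrace implementation: `register`, `run_after_in_child`, `patch_reinit` and `fake_hub` are hypothetical names, and a `types.SimpleNamespace` stands in for the `gevent.hub` module so the example runs without gevent installed. The import hook that detects `gevent.hub` being loaded is not shown.

```python
import functools
import types

# Hypothetical registry of after-in-child fork hooks.
_after_in_child_hooks = []


def register(hook):
    """Register a callable to run in the child process after fork."""
    _after_in_child_hooks.append(hook)
    return hook


def run_after_in_child():
    """Run every registered hook, in registration order."""
    for hook in list(_after_in_child_hooks):
        hook()


def patch_reinit(hub_module):
    """Wrap hub_module.reinit so the hooks run once it has returned."""
    original = hub_module.reinit

    @functools.wraps(original)
    def patched_reinit(*args, **kwargs):
        result = original(*args, **kwargs)
        # The hub is reinitialised at this point, so it is safe to
        # (re)start threads from the hooks.
        run_after_in_child()
        return result

    hub_module.reinit = patched_reinit
    return original


# Stand-in for the gevent.hub module (assumption made for this sketch only).
events = []
fake_hub = types.SimpleNamespace(reinit=lambda: events.append("reinit"))

register(lambda: events.append("hook"))
patch_reinit(fake_hub)
fake_hub.reinit()
```

After the call, `events` records the reinit first and the hook second, which is the ordering the change is after.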

Checklist

Title must conform to conventional commit.
Add additional sections for feat and fix pull requests.
Ensure tests are passing for affected code.
Library documentation and/or Datadog's documentation site is updated. Link to doc PR in description.

Reviewer Checklist

brettlangdon

are we able to reproduce this behavior in a test?

P403n1x87 · 2022-08-10T12:30:32Z

> are we able to reproduce this behavior in a test?

I think yes, but not easily. I'll give it a try.

P403n1x87 · 2022-08-10T13:55:57Z

Based on some of the CI failures, this is likely to require #4049.

ddtrace/internal/module.py

P403n1x87 · 2022-08-11T11:20:29Z

It seems that ModuleWatchdog conflicts with the testdir fixture on Python 2.

tests/tracer/test_forksafe.py

P403n1x87 · 2022-08-12T13:26:29Z

@brettlangdon I've added a test case.

ddtrace/internal/module.py

pyproject.toml

P403n1x87 · 2022-08-13T09:59:16Z

#4083 is a pre-requisite for this PR.

ddtrace/__init__.py

tests/internal/test_module.py

tests/profiling/collector/test_threading.py

tests/profiling/collector/test_traceback.py

tests/tracer/test_forksafe.py

jd · 2022-08-16T15:47:03Z

@P403n1x87 do you have an idea of the side effect this has so far in the real world?
I'm curious as we're detecting this issue a bit "late".

…atchdog (#4083)

## Description

This change improves support for Python legacy versions, like 2.7 and 3.5 by using a common loader wrapper that makes more attributes available upon request, e.g. is_package. Furthermore, this change also provides support for dict copy via the dict constructor, which is required by some pytest fixtures, like testdir.

### More technical details

In Python<=3.5, calling the `dict` constructor on a dictionary makes a copy of it by first checking if it has a `keys` attribute, and then performing a `PyDict_Check`. If these checks succeed, the dictionary is copied using the C API. Since `ModuleWatchdog` is a subclass of `dict`, this means that the wrapping logic is bypassed, and `dict` ends up copying the dictionary backing `ModuleWatchdog` rather than the wrapped `sys.modules`. We exploit the `keys` attribute access to then copy the state of `sys.modules` over to the backing `dict`, so that calling `dict` on a `ModuleWatchdog` instance actually creates a copy of the wrapped dictionary instead of the backing one.

### Testing

This change is a pre-requisite for #4070 to make the test suite pass.

## Checklist

- [x] Title must conform to [conventional commit](https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional).
- [x] Add additional sections for `feat` and `fix` pull requests.
- [x] Ensure tests are passing for affected code.
- [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) and/or [Datadog's documentation site](https://github.com/DataDog/documentation/) is updated. Link to doc PR in description.

## Reviewer Checklist

- [ ] Title is accurate.
- [ ] Description motivates each change.
- [ ] No unnecessary changes were introduced in this PR.
- [ ] PR cannot be broken up into smaller PRs.
- [ ] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [ ] Tests provided or description of manual testing performed is included in the code or PR.
- [x] Release note has been added for fixes and features, or else `changelog/no-changelog` label added.
- [ ] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.
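The dict-copy workaround described in the #4083 commit message can be illustrated with a small sketch. It is not the `ModuleWatchdog` code: `WrappingDict` is a hypothetical class, and the only behaviour assumed is the one the commit message states, namely that `dict(...)` probes for a `keys` attribute before copying a `dict` subclass. The sketch refreshes the backing storage when `keys` is looked up, so the copy reflects the wrapped mapping whichever copy path the interpreter takes.

```python
class WrappingDict(dict):
    """Hypothetical dict subclass that proxies item access to another dict."""

    def __init__(self, wrapped):
        super().__init__()
        self._wrapped = wrapped

    def __getitem__(self, key):
        return self._wrapped[key]

    def __setitem__(self, key, value):
        self._wrapped[key] = value

    def __getattribute__(self, name):
        if name == "keys":
            # dict(...) looks up "keys" before copying. Use that lookup to
            # sync the backing storage with the wrapped dict, so a C-level
            # copy of the backing storage sees the current state.
            wrapped = super().__getattribute__("_wrapped")
            dict.clear(self)
            dict.update(self, wrapped)
        return super().__getattribute__(name)


wrapped = {"a": 1}
proxy = WrappingDict(wrapped)
wrapped["b"] = 2  # change made behind the proxy's back

copied = dict(proxy)
```

Here `copied` holds both entries of `wrapped`, even though nothing was ever written to the proxy's own backing storage directly.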

mergify · 2022-08-16T16:53:10Z

@P403n1x87 this pull request is now in conflict 😩

P403n1x87 · 2022-08-17T09:25:29Z

> @P403n1x87 do you have an idea of the side effect this has so far in the real world? I'm curious as we're detecting this issue a bit "late".

The side-effect should be that we now properly restart threads in e.g. gunicorn worker processes. I have done some manual testing with a Flask application and a modified tracer that starts the writer thread immediately. I was able to see the writer thread running correctly in each process with this fix, whereas without it the thread would be running only in the master process.
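The observation that the thread runs only in the master process can be reproduced with plain `threading` and `os.fork`, without gevent or ddtrace. The following is a sketch under those assumptions (POSIX only); `Worker` and `worker_runs_in_child` are hypothetical names, and the explicit restart in the child stands in for what an after-in-child hook would do.

```python
import os
import threading


class Worker:
    """Hypothetical stand-in for a background thread such as a writer."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        # Use fresh Event and Thread objects so that no state inherited
        # across fork is reused.
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._stop.wait, daemon=True)
        self._thread.start()

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()

    def stop(self):
        self._stop.set()
        self._thread.join()


def worker_runs_in_child(restart_after_fork):
    """Fork and report whether the worker thread is alive in the child."""
    worker = Worker()
    worker.start()
    pid = os.fork()
    if pid == 0:
        # Child process: only the thread that called fork() survives.
        code = 2
        try:
            if restart_after_fork:
                worker.start()
            code = 0 if worker.is_running() else 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    worker.stop()
    return os.WEXITSTATUS(status) == 0
```

Calling `worker_runs_in_child(False)` reports that the thread is gone in the child, while `worker_runs_in_child(True)` reports it running again after the restart.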

mergify · 2022-08-18T14:58:48Z

@P403n1x87 this pull request is now in conflict 😩

codecov-commenter · 2022-08-18T16:09:39Z

Codecov Report

Merging #4070 (8721108) into 1.x (a940ef1) will decrease coverage by 0.02%.
The diff coverage is 27.27%.

@@            Coverage Diff             @@
##              1.x    #4070      +/-   ##
==========================================
- Coverage   78.86%   78.83%   -0.03%     
==========================================
  Files         720      720              
  Lines       57386    57440      +54     
==========================================
+ Hits        45255    45285      +30     
- Misses      12131    12155      +24

| Impacted Files | Coverage Δ |
|---|---|
| tests/internal/test_module.py | `0.00% <0.00%> (ø)` |
| tests/profiling/collector/test_threading.py | `0.00% <0.00%> (ø)` |
| tests/profiling/collector/test_traceback.py | `0.00% <0.00%> (ø)` |
| tests/tracer/test_forksafe.py | `69.76% <9.09%> (-11.01%)` ⬇️ |
| ddtrace/internal/forksafe.py | `83.58% <66.66%> (-2.63%)` ⬇️ |
| ddtrace/__init__.py | `100.00% <100.00%> (ø)` |
| ddtrace/internal/nogevent.py | `65.21% <100.00%> (+0.77%)` ⬆️ |
| ddtrace/sampler.py | `96.15% <100.00%> (ø)` |
| tests/contrib/httplib/test_httplib.py | `97.86% <100.00%> (+<0.01%)` ⬆️ |
| ddtrace/internal/module.py | `86.71% <0.00%> (+5.94%)` ⬆️ |

This reverts commit 9b05c3b.

Re-initialize the gevent hub after-in-child fork hooks when DD_TRACE_GEVENT_HUB_PATCHED is set to true. This mitigates a performance regression introduced by: #4070.

Re-initialize the gevent hub after-in-child fork hooks when DD_TRACE_GEVENT_HUB_PATCHED is set to true. This mitigates a performance regression introduced by: #4070. (cherry picked from commit 57ea1f8)
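Gating the behaviour behind `DD_TRACE_GEVENT_HUB_PATCHED` could look roughly like the sketch below. Only the environment variable name comes from the commit message; the helper names, the accepted truthy values and the default of "disabled when unset" are assumptions made for illustration.

```python
import os


def asbool(value):
    """Interpret a string as a boolean flag (accepted values are assumed)."""
    return value.strip().lower() in ("1", "true")


def gevent_hub_patching_enabled(environ=os.environ):
    """Return True only when the variable is explicitly set to a truthy value."""
    # Assumption: the feature is opt-in, so an unset variable means disabled.
    return asbool(environ.get("DD_TRACE_GEVENT_HUB_PATCHED", "false"))
```

A caller would check `gevent_hub_patching_enabled()` once at start-up and only install the `reinit` wrapper when it returns `True`.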

…y configs (#4962)

## Description

This pull request tries to answer two questions:

* Is the [public documentation about gunicorn](https://ddtrace.readthedocs.io/en/stable/advanced_usage.html#gunicorn) accurate and complete?
* What problematic ddtracepy behavior was [this commit](e1ab88c) fixing?

It does so by implementing a series of integration tests against a real gunicorn server. Each test configures the server and ddtracepy in a different way. These tests check each configuration against known failure modes. The set of configurations that avoid all known failure modes is the one that the updated documentation recommends.

There were a lot more tests exercising various buggy configurations, but I removed them to make CI run faster. Let me know if you think some of those other configurations need to be included in automated testing.

By "known failure modes", I mean degraded performance, duplicated work, or crashes that I found through experimentation based on following logical threads from the relevant issues below. The big comment in the diff explains it in detail.

## Checklist

- [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs)
- [ ] update these https://docs.datadoghq.com/
- [ ] update these https://app.datadoghq.com/apm/service-setup?architecture=host-based&language=python

(I'll do the other documentation locations once we're in agreement about the documentation decisions made in this PR)

## Relevant issue(s)

#2894 #4810 #4070 #1799

## Testing strategy

After the experimentation process that informed the integration test writing, the testing strategy is the tests themselves.

## Reviewer Checklist

- [ ] Title is accurate.
- [ ] Description motivates each change.
- [ ] No unnecessary changes were introduced in this PR.
- [ ] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [ ] Tests provided or description of manual testing performed is included in the code or PR.
- [ ] Release note has been added and follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines), or else `changelog/no-changelog` label added.
- [ ] All relevant GitHub issues are correctly linked.
- [ ] Change contains telemetry where appropriate (logs, metrics, etc.).
- [ ] Telemetry is meaningful, actionable and does not have the potential to leak sensitive data.

---------

Co-authored-by: Munir Abdinur <[email protected]>
Co-authored-by: Brett Langdon <[email protected]>

P403n1x87 added the changelog/no-changelog A changelog entry is not required for this PR. label Aug 10, 2022

P403n1x87 requested a review from a team as a code owner August 10, 2022 10:42

brettlangdon reviewed Aug 10, 2022

P403n1x87 force-pushed the chore/gevent-on-fork branch from 2f41f4c to 98cb106 Compare August 10, 2022 15:12

P403n1x87 and others added 4 commits August 10, 2022 19:52

Merge branch '1.x' into chore/gevent-on-fork

68eee58

Merge branch '1.x' into chore/gevent-on-fork

1b882a5

improve module watchdog on Python 2

3c5cd14

Merge branch 'chore/gevent-on-fork' of https://github.com/p403n1x87/d…

6581ada

…d-trace-py into chore/gevent-on-fork

P403n1x87 force-pushed the chore/gevent-on-fork branch 2 times, most recently from 46e8995 to 8e347e1 Compare August 11, 2022 10:57

P403n1x87 commented Aug 11, 2022

ddtrace/internal/module.py Outdated

P403n1x87 force-pushed the chore/gevent-on-fork branch 3 times, most recently from 3f2e0d6 to 87c1355 Compare August 11, 2022 15:34

add test case

e98a0ca

P403n1x87 force-pushed the chore/gevent-on-fork branch from 87c1355 to e98a0ca Compare August 11, 2022 16:10

P403n1x87 and others added 3 commits August 11, 2022 17:22

Merge branch '1.x' into chore/gevent-on-fork

bec7ec2

setuptools pin

3c51d30

dict copy fix for Python 3.5

90a321c

P403n1x87 commented Aug 12, 2022

tests/tracer/test_forksafe.py Outdated

P403n1x87 requested review from brettlangdon and a team August 12, 2022 13:26

P403n1x87 commented Aug 12, 2022

ddtrace/internal/module.py Outdated

P403n1x87 commented Aug 12, 2022

pyproject.toml Outdated

use sys.stdout.write intead of print

3579c83

fix tests

3e95dd7

P403n1x87 force-pushed the chore/gevent-on-fork branch from e986c08 to 3e95dd7 Compare August 13, 2022 10:13

Merge branch '1.x' into chore/gevent-on-fork

3fa57e9

brettlangdon reviewed Aug 16, 2022

mergify bot added the conflict label Aug 16, 2022

Merge remote-tracking branch 'upstream/1.x' into chore/gevent-on-fork

fba851c

mergify bot removed the conflict label Aug 17, 2022

mergify bot added the conflict label Aug 18, 2022

Merge branch '1.x' into chore/gevent-on-fork

8721108

mergify bot removed the conflict label Aug 18, 2022

jd approved these changes Aug 19, 2022

Merge branch '1.x' into chore/gevent-on-fork

bea9828

brettlangdon approved these changes Aug 19, 2022

mergify bot merged commit 9b05c3b into DataDog:1.x Aug 19, 2022

P403n1x87 deleted the chore/gevent-on-fork branch August 19, 2022 14:11

mabdinur added a commit that referenced this pull request Oct 3, 2022

Revert "chore(gevent): run after-in-child hooks after reinit (#4070)"

a003e4e

This reverts commit 9b05c3b.

mabdinur mentioned this pull request Oct 3, 2022

chore(gevent): disable gevent hub reinit #4252

Merged

14 tasks

emmettbutler mentioned this pull request Jan 25, 2023

chore(gunicorn): test PeriodicService under various gunicorn+ddtracepy configs #4962

Merged

11 tasks
