Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(gevent): run after-in-child hooks after reinit #4070

Merged
merged 15 commits into from
Aug 19, 2022

Conversation

P403n1x87
Copy link
Contributor

@P403n1x87 P403n1x87 commented Aug 10, 2022

Description

Threads created too early in an application that uses gevent end up not running when the gevent hub reinit is executed after fork in the child process. This change ensures that we trigger the after-in-child fork hooks after the call to gevent.hub.reinit to ensure that threads created at any time will run as expected after fork.

More details

In its implementation, gevent wraps around os.fork and calls gevent.hub.reinit to re-initialise the state of the child process after fork. The forksafe hooks that are registered by the library end up running after the call to the OS fork syscall, but before gevent.hub.reinit, which causes the threads to not work as expected in the child process. Other frameworks, like gunicorn make their own call to os.fork and then call gevent.hub.reinit in the child process to obtain the same effect. This poses the same problem as in plain gevent. With this change, we register an import hook on gevent.hub to detect the possibility that gevent.hub.reinit might be called, and we patch this function to re-run the after-in-child fork hook to ensure that they run after the gevent hub has been reinitialised fully in the child process after fork. In particular, this ensures that threads are recreated at the right time and can then work as expected in child processes.

Checklist

Reviewer Checklist

  • Title is accurate.
  • Description motivates each change.
  • No unnecessary changes were introduced in this PR.
  • PR cannot be broken up into smaller PRs.
  • Avoid breaking API changes unless absolutely necessary.
  • Tests provided or description of manual testing performed is included in the code or PR.
  • Release note has been added for fixes and features, or else changelog/no-changelog label added.
  • All relevant GitHub issues are correctly linked.
  • Backports are identified and tagged with Mergifyio.
  • Add to milestone.

@P403n1x87 P403n1x87 added the changelog/no-changelog A changelog entry is not required for this PR. label Aug 10, 2022
@P403n1x87 P403n1x87 requested a review from a team as a code owner August 10, 2022 10:42
Copy link
Member

@brettlangdon brettlangdon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we able to reproduce this behavior in a test?

@P403n1x87
Copy link
Contributor Author

are we able to reproduce this behavior in a test?

I think yes, but not easily. I'll give it a try.

@P403n1x87
Copy link
Contributor Author

Based on some of the CI failures, this is likely to require #4049.

Threads created too early in an application that uses gevent end up not
running after the gevent hub reinit is executed after fork in the child
process. This change ensures that we trigger the after-in-child fork
hooks after the call to gevent.hub.reinit to ensure that threads created
at any time will run as expected after fork.
@P403n1x87 P403n1x87 force-pushed the chore/gevent-on-fork branch from 2f41f4c to 98cb106 Compare August 10, 2022 15:12
@P403n1x87 P403n1x87 force-pushed the chore/gevent-on-fork branch 2 times, most recently from 46e8995 to 8e347e1 Compare August 11, 2022 10:57
ddtrace/internal/module.py Outdated Show resolved Hide resolved
@P403n1x87
Copy link
Contributor Author

It seems that ModuleWatchdog conflicts with the testdir fixture on Python 2.

@P403n1x87 P403n1x87 force-pushed the chore/gevent-on-fork branch 3 times, most recently from 3f2e0d6 to 87c1355 Compare August 11, 2022 15:34
@P403n1x87 P403n1x87 force-pushed the chore/gevent-on-fork branch from 87c1355 to e98a0ca Compare August 11, 2022 16:10
@P403n1x87
Copy link
Contributor Author

@brettlangdon I've added a test case.

@P403n1x87 P403n1x87 requested review from brettlangdon and a team August 12, 2022 13:26
ddtrace/internal/module.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@P403n1x87
Copy link
Contributor Author

#4083 is a pre-requisite for this PR.

@P403n1x87 P403n1x87 force-pushed the chore/gevent-on-fork branch from e986c08 to 3e95dd7 Compare August 13, 2022 10:13
ddtrace/__init__.py Outdated Show resolved Hide resolved
tests/internal/test_module.py Outdated Show resolved Hide resolved
tests/internal/test_module.py Outdated Show resolved Hide resolved
tests/profiling/collector/test_threading.py Outdated Show resolved Hide resolved
tests/profiling/collector/test_traceback.py Outdated Show resolved Hide resolved
tests/tracer/test_forksafe.py Show resolved Hide resolved
@jd
Copy link
Contributor

jd commented Aug 16, 2022

@P403n1x87 do you have an idea of the side effect this has so far in the real world?
I'm curious as we're detecting this issue a bit "late".

mergify bot pushed a commit that referenced this pull request Aug 16, 2022
…atchdog (#4083)

## Description

This change improves support for Python legacy versions, like 2.7 and 3.5 by using a common loader wrapper that makes more attributes available upon request, e.g. is_package. Furthermore, this change also provides support for dict copy via the dict constructor, which is required by some pytest fixtures, like testdir.

### More technical details

In Python<=3.5, calling the `dict` constructor on a dictionary makes a copy of it by first checking if it has a `keys` attribute, and then performing a `PyDict_Check`. If these checks succeed, the dictionary is copied using the C API. Since `ModuleWatchdog` is a subclass of `dict`, this means that the wrapping logic is bypassed, and `dict` ends up copying the dictionary backing `ModuleWatchdog` rather than the wrapped `sys.modules`. We exploit the `keys` attribute access to then copy the state of `sys.modules` over to the backing `dict`, so that calling `dict` on a `ModuleWatchdog` instance actually creates a copy of the wrapped dictionary instead of the backing one.

### Testing

This change is a pre-requisite for #4070 to make the test suite pass.

## Checklist
- [x] Title must conform to [conventional commit](https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional).
- [x] Add additional sections for `feat` and `fix` pull requests.
- [x] Ensure tests are passing for affected code.
- [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) and/or [Datadog's documentation site](https://github.com/DataDog/documentation/) is updated. Link to doc PR in description.


## Reviewer Checklist
- [ ] Title is accurate.
- [ ] Description motivates each change.
- [ ] No unnecessary changes were introduced in this PR.
- [ ] PR cannot be broken up into smaller PRs.
- [ ] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [ ] Tests provided or description of manual testing performed is included in the code or PR.
- [x] Release note has been added for fixes and features, or else `changelog/no-changelog` label added.
- [ ] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.
@mergify
Copy link
Contributor

mergify bot commented Aug 16, 2022

@P403n1x87 this pull request is now in conflict 😩

@mergify mergify bot added the conflict label Aug 16, 2022
@P403n1x87
Copy link
Contributor Author

@P403n1x87 do you have an idea of the side effect this has so far in the real world? I'm curious as we're detecting this issue a bit "late".

The side-effect should be that we now properly restart threads in e.g. gunicorn worker processes. I have done some manual testing with a Flask application and a modified tracer that starts the writer thread immediately. I was able to see the writer thread running correctly in each process with this fix, whereas without it the thread would be running only in the master process.

@mergify mergify bot removed the conflict label Aug 17, 2022
@mergify
Copy link
Contributor

mergify bot commented Aug 18, 2022

@P403n1x87 this pull request is now in conflict 😩

@mergify mergify bot added the conflict label Aug 18, 2022
@mergify mergify bot removed the conflict label Aug 18, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #4070 (8721108) into 1.x (a940ef1) will decrease coverage by 0.02%.
The diff coverage is 27.27%.

@@            Coverage Diff             @@
##              1.x    #4070      +/-   ##
==========================================
- Coverage   78.86%   78.83%   -0.03%     
==========================================
  Files         720      720              
  Lines       57386    57440      +54     
==========================================
+ Hits        45255    45285      +30     
- Misses      12131    12155      +24     
Impacted Files Coverage Δ
tests/internal/test_module.py 0.00% <0.00%> (ø)
tests/profiling/collector/test_threading.py 0.00% <0.00%> (ø)
tests/profiling/collector/test_traceback.py 0.00% <0.00%> (ø)
tests/tracer/test_forksafe.py 69.76% <9.09%> (-11.01%) ⬇️
ddtrace/internal/forksafe.py 83.58% <66.66%> (-2.63%) ⬇️
ddtrace/__init__.py 100.00% <100.00%> (ø)
ddtrace/internal/nogevent.py 65.21% <100.00%> (+0.77%) ⬆️
ddtrace/sampler.py 96.15% <100.00%> (ø)
tests/contrib/httplib/test_httplib.py 97.86% <100.00%> (+<0.01%) ⬆️
ddtrace/internal/module.py 86.71% <0.00%> (+5.94%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mergify mergify bot merged commit 9b05c3b into DataDog:1.x Aug 19, 2022
@P403n1x87 P403n1x87 deleted the chore/gevent-on-fork branch August 19, 2022 14:11
mabdinur added a commit that referenced this pull request Oct 3, 2022
Kyle-Verhoog pushed a commit that referenced this pull request Oct 3, 2022
Re-initialize the gevent hub after-in-child fork hooks when DD_TRACE_GEVENT_HUB_PATCHED is set to true. This mitigates a performance regression introduced by: #4070.
mergify bot pushed a commit that referenced this pull request Oct 3, 2022
Re-initialize the gevent hub after-in-child fork hooks when DD_TRACE_GEVENT_HUB_PATCHED is set to true. This mitigates a performance regression introduced by: #4070.

(cherry picked from commit 57ea1f8)
emmettbutler added a commit that referenced this pull request Jan 31, 2023
…y configs (#4962)

## Description

This pull request tries to answer two questions: 

* Is the [public documentation about
gunicorn](https://ddtrace.readthedocs.io/en/stable/advanced_usage.html#gunicorn)
accurate and complete?
* What problematic ddtracepy behavior was [this
commit](e1ab88c)
fixing?

It does so by implementing a series of integration tests against a real
gunicorn server. Each test configures the server and ddtracepy in a
different way. These tests check each configuration against known
failure modes. The set of configurations that avoid all known failure
modes is the one that the updated documentation recommends.

There were a lot more tests exercising various buggy configurations, but
I removed them to make CI run faster. Let me know if you think some of
those other configurations need to be included in automated testing.

By "known failure modes", I mean degraded performance, duplicated work,
or crashes that I found through experimentation based on following
logical threads from the relevant issues below. The big comment in the
diff explains it in detail.

## Checklist
- [x] [Library
documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs)
- [ ] update these https://docs.datadoghq.com/
- [ ] update these
https://app.datadoghq.com/apm/service-setup?architecture=host-based&language=python

(I'll do the other documentation locations once we're in agreement about
the documentation decisions made in this PR)

## Relevant issue(s)

#2894
#4810
#4070
#1799

## Testing strategy

After the experimentation process that informed the integration test
writing, the testing strategy is the tests themselves.

## Reviewer Checklist
- [ ] Title is accurate.
- [ ] Description motivates each change.
- [ ] No unnecessary changes were introduced in this PR.
- [ ] Avoid breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [ ] Tests provided or description of manual testing performed is
included in the code or PR.
- [ ] Release note has been added and follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines),
or else `changelog/no-changelog` label added.
- [ ] All relevant GitHub issues are correctly linked.
- [ ] Change contains telemetry where appropriate (logs, metrics, etc.).
- [ ] Telemetry is meaningful, actionable and does not have the
potential to leak sensitive data.

---------

Co-authored-by: Munir Abdinur <[email protected]>
Co-authored-by: Brett Langdon <[email protected]>
jbertran pushed a commit that referenced this pull request Feb 1, 2023
…y configs (#4962)

## Description

This pull request tries to answer two questions: 

* Is the [public documentation about
gunicorn](https://ddtrace.readthedocs.io/en/stable/advanced_usage.html#gunicorn)
accurate and complete?
* What problematic ddtracepy behavior was [this
commit](e1ab88c)
fixing?

It does so by implementing a series of integration tests against a real
gunicorn server. Each test configures the server and ddtracepy in a
different way. These tests check each configuration against known
failure modes. The set of configurations that avoid all known failure
modes is the one that the updated documentation recommends.

There were a lot more tests exercising various buggy configurations, but
I removed them to make CI run faster. Let me know if you think some of
those other configurations need to be included in automated testing.

By "known failure modes", I mean degraded performance, duplicated work,
or crashes that I found through experimentation based on following
logical threads from the relevant issues below. The big comment in the
diff explains it in detail.

## Checklist
- [x] [Library
documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs)
- [ ] update these https://docs.datadoghq.com/
- [ ] update these
https://app.datadoghq.com/apm/service-setup?architecture=host-based&language=python

(I'll do the other documentation locations once we're in agreement about
the documentation decisions made in this PR)

## Relevant issue(s)

#2894
#4810
#4070
#1799

## Testing strategy

After the experimentation process that informed the integration test
writing, the testing strategy is the tests themselves.

## Reviewer Checklist
- [ ] Title is accurate.
- [ ] Description motivates each change.
- [ ] No unnecessary changes were introduced in this PR.
- [ ] Avoid breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [ ] Tests provided or description of manual testing performed is
included in the code or PR.
- [ ] Release note has been added and follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines),
or else `changelog/no-changelog` label added.
- [ ] All relevant GitHub issues are correctly linked.
- [ ] Change contains telemetry where appropriate (logs, metrics, etc.).
- [ ] Telemetry is meaningful, actionable and does not have the
potential to leak sensitive data.

---------

Co-authored-by: Munir Abdinur <[email protected]>
Co-authored-by: Brett Langdon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog A changelog entry is not required for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants