Skip to content

Commit

Permalink
chore(gunicorn): test PeriodicService under various gunicorn+ddtracep…
Browse files Browse the repository at this point in the history
…y configs (#4962)

## Description

This pull request tries to answer two questions: 

* Is the [public documentation about
gunicorn](https://ddtrace.readthedocs.io/en/stable/advanced_usage.html#gunicorn)
accurate and complete?
* What problematic ddtracepy behavior was [this
commit](e1ab88c)
fixing?

It does so by implementing a series of integration tests against a real
gunicorn server. Each test configures the server and ddtracepy in a
different way. These tests check each configuration against known
failure modes. The set of configurations that avoid all known failure
modes is the one that the updated documentation recommends.

There were a lot more tests exercising various buggy configurations, but
I removed them to make CI run faster. Let me know if you think some of
those other configurations need to be included in automated testing.

By "known failure modes", I mean degraded performance, duplicated work,
or crashes that I found through experimentation based on following
logical threads from the relevant issues below. The big comment in the
diff explains it in detail.

## Checklist
- [x] [Library
documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs)
- [ ] update these https://docs.datadoghq.com/
- [ ] update these
https://app.datadoghq.com/apm/service-setup?architecture=host-based&language=python

(I'll do the other documentation locations once we're in agreement about
the documentation decisions made in this PR)

## Relevant issue(s)

#2894
#4810
#4070
#1799

## Testing strategy

After the experimentation process that informed the integration test
writing, the testing strategy is the tests themselves.

## Reviewer Checklist
- [ ] Title is accurate.
- [ ] Description motivates each change.
- [ ] No unnecessary changes were introduced in this PR.
- [ ] Avoid breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [ ] Tests provided or description of manual testing performed is
included in the code or PR.
- [ ] Release note has been added and follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines),
or else `changelog/no-changelog` label added.
- [ ] All relevant GitHub issues are correctly linked.
- [ ] Change contains telemetry where appropriate (logs, metrics, etc.).
- [ ] Telemetry is meaningful, actionable and does not have the
potential to leak sensitive data.

---------

Co-authored-by: Munir Abdinur <[email protected]>
Co-authored-by: Brett Langdon <[email protected]>
  • Loading branch information
3 people authored Jan 31, 2023
1 parent 501a686 commit 8a9fec4
Show file tree
Hide file tree
Showing 10 changed files with 247 additions and 75 deletions.
12 changes: 9 additions & 3 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -762,9 +762,15 @@ jobs:
gunicorn:
<<: *machine_executor
steps:
- run_test:
pattern: "gunicorn"
snapshot: true
- attach_workspace:
at: .
- checkout
- start_docker_services:
services: ddagent
- run:
command: |
mv .riot .ddriot
./scripts/ddtest riot -v run --pass-env -s 'gunicorn'
httplib:
<<: *machine_executor
Expand Down
32 changes: 24 additions & 8 deletions ddtrace/contrib/gunicorn/__init__.py
Original file line number Diff line number Diff line change
@@ -1,17 +1,33 @@
"""
``ddtrace`` supports `Gunicorn <https://gunicorn.org>`__.
**Note:** ``ddtrace-run`` and Python 2 are both not supported with `Gunicorn <https://gunicorn.org>`__.
If the application is using the ``gevent`` worker class, ``gevent`` monkey patching must be performed before loading the
``ddtrace`` library.
``ddtrace`` only supports Gunicorn's ``gevent`` worker type when configured as follows:
There are different options to ensure this happens:
- The application is running under Python 3
- `ddtrace-run` is not used
- The `DD_GEVENT_PATCH_ALL=1` environment variable is set
- Gunicorn's ```post_fork`` <https://docs.gunicorn.org/en/stable/settings.html#post-fork>`__ hook does not import from
``ddtrace``
- ``import ddtrace.bootstrap.sitecustomize`` is called either in the application's main process or in the
```post_worker_init`` <https://docs.gunicorn.org/en/stable/settings.html#post-worker-init>`__ hook.
- If using ``ddtrace-run``, set the environment variable ``DD_GEVENT_PATCH_ALL=1``.
.. code-block:: python
- Replace ``ddtrace-run`` by using ``import ddtrace.bootstrap.sitecustomize`` as the first import of the application.
# gunicorn.conf.py
def post_fork(server, worker):
# don't touch ddtrace here
pass
- Use a `post_worker_init <https://docs.gunicorn.org/en/stable/settings.html#post-worker-init>`_
hook to import ``ddtrace.bootstrap.sitecustomize``.
def post_worker_init(worker):
import ddtrace.bootstrap.sitecustomize
workers = 4
worker_class = "gevent"
bind = "8080"
.. code-block:: bash
DD_GEVENT_PATCH_ALL=1 gunicorn --config gunicorn.conf.py path.to.my:app
"""


Expand Down
49 changes: 40 additions & 9 deletions ddtrace/internal/periodic.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,15 +54,46 @@ def stop(self):
self.quit.set()

def _is_proper_class(self):
# DEV: Some frameworks, like e.g. gevent, seem to resuscitate some
# of the threads that were running prior to the fork of the worker
# processes. These threads are normally created via the native API
# and are exposed to the child process as _DummyThreads. We check
# whether the current thread is no longer an instance of the
# original thread class to prevent it from running in the child
# process while the state copied over from the parent is being
# cleaned up. The restarting of the thread is responsibility to the
# registered forksafe hooks.
"""
Picture this: you're running a gunicorn server under ddtrace-run (`ddtrace-run gunicorn...`).
Profiler._profiler() (a PeriodicThread) is running in the main process.
Gunicorn forks the process and sets up the resulting child process as a "gevent worker".
Because of the Profiler's _restart_on_fork hook, inside the child process, the Profiler is stopped, copied, and
started for the purpose of profiling the child process.
In _restart_on_fork, the Profiler being stopped is the one that was running before the fork, ie the one in the
main process. Copying that Profiler takes place in the child process. Inside the child process, we've now got
*one* process-local Profiler running in a thread, and that's the only Profiler running.
...or is it?
As it turns out, gevent has some of its own post-fork logic that complicates things. All of the above is
accurate, except for the bit about the child process' Profiler being the only one alive. In fact, gunicorn's
"gevent worker" notices that there was a Profiler (PeriodicThread) running before the fork, and attempts to
bring it back to life after the fork.
Outside of ddtrace-run, the only apparent problem this causes is duplicated work and decreased performance.
Under ddtrace-run, because it triggers an additional fork to which gevent's post-fork logic responds, the
thread ends up being restarted twice in the child process. This means that there are a bunch of instances of
the thread running simultaneously:
the "correct" one started by ddtrace's _restart_on_fork
the copy of the pre-fork one restarted by gevent after the fork done by gunicorn
the copy of the pre-fork one restarted by gevent after the fork done by ddtrace-run
This causes even more problems for PeriodicThread uses like the Profiler that rely on running as singletons per
process.
In these situations where there are many copies of the restarted thread, the copies are conveniently marked as
such by being instances of gevent.threading._DummyThread.
This function _is_proper_class exists as a thread-local release valve that lets these _DummyThreads stop
themselves, because there's no other sane way to join all _DummyThreads from outside of those threads
themselves. Not doing this causes the _DummyThreads to be orphaned and hang, which can degrade performance
and cause crashes in threads that assume they're singletons.
That's why it's hard to write a test for - doing so requires waiting for the threads to die on their own, which
can make tests take a really long time.
"""
return isinstance(threading.current_thread(), self.__class__)

def run(self):
Expand Down
2 changes: 1 addition & 1 deletion docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -101,7 +101,7 @@ contacting support.
+--------------------------------------------------+---------------+----------------+
| :ref:`graphql-core <graphql>` | >= 2.0.0 | Yes |
+--------------------------------------------------+---------------+----------------+
| :ref:`gunicorn <gunicorn>` | >= 19.10.0 | No |
| :ref:`gunicorn <gunicorn>` | >= 20.0 | No |
+--------------------------------------------------+---------------+----------------+
| :ref:`httplib` | \* | Yes |
+--------------------------------------------------+---------------+----------------+
Expand Down
4 changes: 4 additions & 0 deletions releasenotes/notes/gunicorn-issue-09308901fa00e76c.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
issues:
- |
gunicorn: ddtrace-run does not work with gunicorn. To instrument a gunicorn application, follow the instructions `here <https://ddtrace.readthedocs.io/en/latest/integrations.html#gunicorn>`_.
7 changes: 1 addition & 6 deletions riotfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -2595,13 +2595,8 @@ def select_pys(min_version=MIN_PYTHON_VERSION, max_version=MAX_PYTHON_VERSION):
Venv(
name="gunicorn",
command="pytest {cmdargs} tests/contrib/gunicorn",
pkgs={"requests": latest},
pkgs={"requests": latest, "gevent": latest},
venvs=[
Venv(
pys="2.7",
# Gunicorn ended Python 2 support after 19.10.0
pkgs={"gunicorn": "==19.10.0"},
),
Venv(
pys=select_pys(min_version="3.5"),
pkgs={"gunicorn": ["==19.10.0", "==20.0.4", latest]},
Expand Down
21 changes: 21 additions & 0 deletions tests/contrib/gunicorn/post_fork.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
import sys

import gevent.monkey

from ddtrace.debugging import DynamicInstrumentation
from ddtrace.internal.remoteconfig import RemoteConfig
from ddtrace.profiling.profiler import Profiler


# take some notes about the relative ordering of thread creation and
# monkeypatching
monkeypatch_happened = gevent.monkey.is_module_patched("threading")

# enabling DI here allows test cases to exercise the code paths that handle
# gevent monkeypatching of running threads
# post_fork is called before gevent.monkey.patch_all()
if sys.version_info < (3, 11):
DynamicInstrumentation.enable()

RemoteConfig._was_enabled_after_gevent_monkeypatch = monkeypatch_happened
Profiler._was_enabled_after_gevent_monkeypatch = monkeypatch_happened
4 changes: 0 additions & 4 deletions tests/contrib/gunicorn/simple_app.py

This file was deleted.

Loading

0 comments on commit 8a9fec4

Please sign in to comment.