Skip to content

Commit

Permalink
fix(profiling): ensure correct thread link [backport #6798 to 1.17] (#…
Browse files Browse the repository at this point in the history
…6816)

Backport of #6798 to 1.17

We use the C API to retrieve the thread ID of the current thread to
create a link between threads and objects. This way we protect ourselves
from frameworks such as gevent that might turn thread IDs into task IDs.

## Testing strategy

Tested with a sample application that the endpoint information is
attached to profiles with this change.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included
in the PR.
- [x] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.
  • Loading branch information
P403n1x87 authored Sep 1, 2023
1 parent 57fc217 commit 2904bbb
Show file tree
Hide file tree
Showing 3 changed files with 29 additions and 16 deletions.
37 changes: 21 additions & 16 deletions ddtrace/profiling/_threading.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,20 @@ from six.moves import _thread
from ddtrace import _threading as ddtrace_threading


IF UNAME_SYSNAME == "Linux":
from cpython cimport PyLong_FromLong
from cpython cimport PyLong_FromLong


cdef extern from "<Python.h>":
# This one is provided as an opaque struct from Cython's
# cpython/pystate.pxd, but we need to access some of its fields so we
# redefine it here.
ctypedef struct PyThreadState:
unsigned long thread_id

PyThreadState* PyThreadState_Get()


IF UNAME_SYSNAME == "Linux":
from ddtrace.internal.module import ModuleWatchdog
from ddtrace.internal.wrapping import wrap

Expand Down Expand Up @@ -108,28 +119,22 @@ class _ThreadLink(_thread_link_base):
):
# type: (...) -> None
"""Link an object to the current running thread."""
self._thread_id_to_object[_thread.get_ident()] = weakref.ref(obj)
# Because threads might become tasks with some frameworks (e.g. gevent),
# we retrieve the thread ID using the C API instead of the Python API.
self._thread_id_to_object[<object>PyLong_FromLong(PyThreadState_Get().thread_id)] = weakref.ref(obj)

def clear_threads(self,
existing_thread_ids, # type: typing.Set[int]
):
"""Clear the stored list of threads based on the list of existing thread ids.
"""Clean up the thread linking map.
If any thread that is part of this list was stored, its data will be deleted.
We remove all threads that are not in the existing thread IDs.
:param existing_thread_ids: A set of thread ids to keep.
"""
# This code clears the thread/object mapping by clearing a copy and swapping it in an atomic operation This is
# needed to be able to have this whole class lock-free and avoid concurrency issues.
# The fact that it is lock free means we might lose some accuracy, but it's worth the trade-off for speed and simplicity.
new_thread_id_to_object_mapping = self._thread_id_to_object.copy()
# Iterate over a copy of the list of keys since it's mutated during our iteration.
for thread_id in list(new_thread_id_to_object_mapping):
if thread_id not in existing_thread_ids:
del new_thread_id_to_object_mapping[thread_id]

# Swap with the new list
self._thread_id_to_object = new_thread_id_to_object_mapping
self._thread_id_to_object = {
k: v for k, v in self._thread_id_to_object.items() if k in existing_thread_ids
}

def get_object(
self,
Expand Down
2 changes: 2 additions & 0 deletions docs/spelling_wordlist.txt
Original file line number Diff line number Diff line change
Expand Up @@ -250,3 +250,5 @@ programmatically
DES
Blowfish
Gitlab
hotspot
CMake
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
fixes:
- |
profiling: fixed a bug that prevented profiles from being correctly
correlated to traces in gevent-based applications, thus causing code hotspot
and end point data to be missing from the UI.

0 comments on commit 2904bbb

Please sign in to comment.