-
Notifications
You must be signed in to change notification settings - Fork 620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add urllib3 instrumentation #299
Add urllib3 instrumentation #299
Conversation
* use HTTPConnectionPool.urlopen as instrumentation point as it allows to instrument requests made with urllib3's PoolManager, HTTPConnectionPool and HTTPSConnectionPool * use common suppression context variable 'suppress-httpinstrumentation' to avoid double tracing with requests instrumentation * add missing install command for urllib instrumentation when linting with tox
I can confirm this works. Would be awesome if it could be added to the next release. |
|
||
Usage | ||
----- | ||
.. code-block:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we are trying to take the SDK implementations out of the Usage
strings like here and only show the instrumentation. For the below code, consider adding an examples folder and putting it there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mariojonke Updates on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the boilerplate TracerProvider setup with the last commit. So it is just the plain usage of the instrumentation now.
@@ -56,7 +56,7 @@ | |||
|
|||
# A key to a context variable to avoid creating duplicate spans when instrumenting | |||
# both, Session.request and Session.send, since Session.request calls into Session.send | |||
_SUPPRESS_REQUESTS_INSTRUMENTATION_KEY = "suppress_requests_instrumentation" | |||
_SUPPRESS_HTTP_INSTRUMENTATION_KEY = "suppress_http_instrumentation" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious as to why we are merging these into one context variable now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requests
uses urllib3
internally. If the instrumentations for both libraries are installed two spans equal spans would be created. So the common suppression key is to avoid this double tracing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applies to urllib
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urllib
is actually separate but all three libraries internally use the http.client
package. So in case there is a http.client
instrumentation in the future i thought it would make sense to have a common suppression key for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean if today, someone does not want to track requests
but wants to track urllib
, they wouldn't be able to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the supression variables were only meant to avoid double tracing and not for users to supress spans for one specific library. Like for the requests
library where a call to Session.request
would create another span because it calls into Session.send
which is also instrumented.
To generally avoid tracing for example the requests
library one would just not install the instrumentation or simply call RequestsInsrumentor().uninstrument()
. I doubt anyone would use the Context
for that since the supression variable is then required to be in every context instance to work reliably.
To supress spans for a single call/request using the Context
probably makes sense but for the requests
, urllib3
example (not using urllib
here since it does not share a code path with requests
) it wouldn't make much of a difference since in the end both libraries are producing the same kind of span (with even the same attributes).
If we really want to support this i guess we would need 3 supression variables: one to just supress the library (e.g. requests), one to supress the traced aspect (e.g. http) and one to supress tracing alltogether. One could then just set the supression variables for the libraries they do not want to track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the current changes since the supression variables were meant to only be used internally.
@@ -154,7 +154,7 @@ def _instrumented_open_call( | |||
inject(type(headers).__setitem__, headers) | |||
|
|||
token = context.attach( | |||
context.set_value(_SUPPRESS_URLLIB_INSTRUMENTATION_KEY, True) | |||
context.set_value(_SUPPRESS_HTTP_INSTRUMENTATION_KEY, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean if urllib instrumentation is used then http.client instrumentation (if present) will be suppressed? I don't think any instrumentation should dictate if another instrumentation should be silenced or not. What if a user actually wants both? May be there are issues with http.client or with urllib and users would want to detect exactly which or may be http.client has adds lower level information like DNS, connection, SSL-handshake, etc related info. Should urllib really automatically suppress it?
I think it should be left to the user whether to any specific instrumentation should be enabled or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.., i guess it depends on what one actually expects to see, the span produced by the instrumentation or the outgoing HTTP call.
The span produced by an instrumentations like e.g. for HTTP clients is usually the last in process span before the trace is continued in some other process (on probably some other host). Hence, the inject
call to propagate the context to the next process. Without supressing the lower level HTTP instrumentations, each would separatly inject their span into the HTTP headers. From a functionality point of view this is probably not a problem since traceparent and tracecontext are going to ge overwritten (as far is a saw) but will introduce additional performance overhead. Not sure if a hierarchy of subsequent spans of kind CLIENT
would cause problems though.
Looking at it from a pure span point of view it raises the question i guess why some instrumentations suppress library internal calls in the first place, e.g. the call from Session.request
to Session.send
in the requests
library (and i think urllib instrumentor is doing something similar). The produced spans are more or less the same but would still provide additional info through timing and span hierarchy. For a user of the library this is probably of less to no interest but i guess the same can be said about spans produced by the libraries which are internally used by the instrumented library. E.g. requests
and urllib3
produce the same span with just the InstrumentationInfo
being different.
If a user is only interested in the span for the outgoing HTTP (without the spans produced by the internal libraries) i think it is not always possible to achieve this by deciding if an instrumentation is enabled or not. E.g. in an application which uses a mix of requests
and urllib3
for communication. With an adapter requests
can be configured to use a protocol other than HTTP for certain sessions. If one now wants to track the HTTP calls made with requests
, urllib3
plus the calls with requests
using other protocol, HTTP calls made with requests
would always generate two outgoing HTTP spans (requests + urllib3) without the possibility to deactivate this behavior.
Imo from a users point of view the outgoing HTTP call might of more interest than the internal workings of the repquests
, urllib
, ... libraries but i can change the PR to not suppress other instrumentations if you want .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internal library spans are a completely difference use case. If they are not silenced then every span exported/generated will generate more spans which will in-turn generate more spans. So even generating a single span will start a loop of spans being generated. So we have to silence the internal library spans.
However, I'm convinced by the argument that not silencing will result in a parent > child pair of spans that will both be of kind CLIENT. I still don't think the solution should be for one instrumentation to silence other instrumentations but I don't see a simpler solution to solve this without adding some advanced features.
So it's fine if urllib3 or requests silences http instrumentation in this PR. I'll create an issue to discuss this further and possibly improve it in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK we can probably solve in a simple enough way.
What if urllib3 and requests never injected tracing context headers and always set span kind to INTERNAL
and then added http instrumentation as a dependency.
So whenever either urllib3 or requests is installed, http.client instrumentation is installed as well. If requests instrumentation is smart enough to never set kind to CLIENT and never inject tracing context when HTTP transport is used, this should cleanly solve the entire problem.
In this case, we'd replace the current suppression calls with a comment that says if we ever build a http.client instrumentation, we should update urllib3 and requests to not generate client spans.
Only issue I see with this is that these libs will hard depend on our http.client instrumentation so it'll be a bit weird (install other package, set env var to disable our http.client instrument) for users to swap out the http.client instrumentation with their own but I think this would be a very rare edge-case with a very viable solution.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened an issue to discuss this further: #369
I'm OK with this PR to suppress http.client instrumentation in the meanwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@owais
You have a changes requested for this PR.
I'll resolve the conflicts after open-telemetry/opentelemetry-python#1690 is merged as otherwise there is the potential of breaking the build due to the instrumentation still using the old API. |
Resolved conflicts and updated dependencies to release |
Description
Adds instrumentation support for the urllib3 library.
Type of change
How Has This Been Tested?
Additional unit tests were added.
Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.