-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Http/sURLConnection auto instrumentation #133
Http/sURLConnection auto instrumentation #133
Conversation
...n/java/io/opentelemetry/instrumentation/agent/httpurlconnection/HttpUrlConnectionPlugin.java
Outdated
Show resolved
Hide resolved
...ain/java/io/opentelemetry/instrumentation/library/httpurlconnection/HttpUrlReplacements.java
Outdated
Show resolved
Hide resolved
Thanks @surbhiia I think it's a good start, although I noticed that there are some comments and todos in the wrapper class that seem like we should discuss further, probably in a call if needed. I've also created this issue to prevent API level restrictions on this module. I think that if we can verify that no runtime issues would happen when the wrapper class is loaded in envs where some methods aren't available, then it should be fine to ignore API restrictions for these cases. |
Thanks @LikeTheSalad! I added those 2 APIs in the wrapper class and instrumented the sample app with that and ran the instrumented app on an API level 21 image and it worked perfectly! No issues at runtime! |
Oops, mistakenly closed, re-opening again! |
Thanks @surbhiia for the contribution!
|
Thanks @breedx-splk! I've added a readme now! Do let me know if anything looks amiss. Thanks for creating the issue. Do assign it to me. I'll add the tests in next PR. :) |
…ed, Add README, remove scope
15504e8
to
fa389ed
Compare
@surbhiia @LikeTheSalad We chatted about some edge cases, specifically related to use cases where the client just pumps data (like a POST) and doesn't bother to read the response. I have put together a test gist that does exactly this: https://gist.github.com/breedx-splk/246962f7d1c180f924a530630a966c9f In my testing with that, I run the collector locally and then I used netcat (
This indicates to me that the upstream instrumentation is behaving correctly -- it doesn't create a span because the request/response has not taken place. I think the If you add With that in mind, I definitely don't think it's something we should worry about covering this early in the instrumentation development. |
😎 @surbhiia You have to comment on the issue before I can assign. Just a simple "assign me please" will do. 👍🏻 |
@breedx-splk Thanks for testing this! I created this issue about it in that repo yesterday and wanted to test this. Other than getInputStream/getErrorStream, getResponseCode and getResponseMessage are other two methods generally always called. Java instrumentation is closing spans for the latter two as well, as those methods inherently call getInputStream. We're instrumenting the call sites itself and we've not ended spans for the latter two right now. We can perhaps do that but that isn't a good idea as then we will close early in many cases and miss out on any exceptions occurring after that as usually first getResponseCode is checked and based on that either getInputStream/getErrorStream is called and we want to capture any IOExceptions there. Sort of a trade-off between waiting to close in order to capture any possible exceptions vs closing early in order to have a span ended for more possible scenarios. I still think we should go with the former as we can have the correct telemetry reported. Right now in the readme we've mentioned scheduling of the harvester as a required configuration. We can perhaps make it optional and call out how it affects the behavior. And in future we can figure out how to schedule that too automatically if possible :) |
is this any worse than the Java agent instrumentation? (I'm guessing we're probably closing the span early in these cases too?) |
@trask No, we're not closing spans early, instead waiting till the end and we're scheduling a thread to periodically look for idle connections and close spans. |
...io/opentelemetry/instrumentation/library/httpurlconnection/HttpUrlInstrumentationConfig.java
Show resolved
Hide resolved
static { | ||
activeURLConnections = new ConcurrentHashMap<>(); | ||
logger = Logger.getLogger("HttpUrlReplacements"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just move these to the corresponding declaration lines.
...ain/java/io/opentelemetry/instrumentation/library/httpurlconnection/HttpUrlReplacements.java
Outdated
Show resolved
Hide resolved
...ain/java/io/opentelemetry/instrumentation/library/httpurlconnection/HttpUrlReplacements.java
Outdated
Show resolved
Hide resolved
...elemetry/instrumentation/library/httpurlconnection/internal/HttpUrlConnectionSingletons.java
Outdated
Show resolved
Hide resolved
...n/java/io/opentelemetry/instrumentation/agent/httpurlconnection/HttpUrlConnectionPlugin.java
Show resolved
Hide resolved
Thanks again @surbhiia for circling back on this. I think it's looking like a great start. I had a few comments, but I think it's close. |
…ntelemetry/instrumentation/library/httpurlconnection/internal/HttpUrlConnectionSingletons.java Update instrumentation name Co-authored-by: jason plumb <[email protected]>
Thanks @breedx-splk for such a detailed review! I've addressed your comments now. Do take a look and let me know if anything seems missing. :) |
@open-telemetry/android-approvers Please have a look! |
...io/opentelemetry/instrumentation/library/httpurlconnection/HttpUrlInstrumentationConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests forthcoming?
|
||
> The minimum supported Android SDK version is 21, though it will also instrument APIs added in the Android SDK version 24 when running on devices with API level 24 and above. | ||
|
||
> If your project's minSdk is lower than 26, then you must enable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Desugaring is necessary for minSdk 24 now because of the dependency on the semconv module. You may just want to point to the reference in main README instead instead of putting this and the AGP requirements here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did think of that, but I was/am more inclined to keep this here as this module can be used separately to the main module. Customers do not need to navigate to and read the other readme.
If not, maybe we can pull out the similar text mentioned in main readme to a separate section in main readme itself, something like "Critical Notes"(there is a slight more addition to that we can make - for AGP < 8.3.0, the property to be used for correct dexing is different). And link to that section from this readme. But then, all the incremental callouts that would go in that section might not apply to this module always.
Which one sounds better to you? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a similar duplication in the Java sem conv repo earlier but was advised to put the details in one place so if we need to update, we didn't have to do it multiple places. But I think both ways have advantages, so it's up to you which one you prefer. I think merging as is and figuring it out later is perfectly reasonable.
|
||
// Time (ms) to wait before assuming that an idle connection is no longer | ||
// in use and should be reported. | ||
private static final long CONNECTION_INACTIVITY_TIMEOUT = 10000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to include the unit in the constant name to make it clear, especially since OTel is mostly in nanoseconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, 10s is actually not a very long period for mobile apps - requests routinely exceed this. If this value is looking at the time between the connection was first attempted to when it's checked, this will timeout legit requests that are still going.
Do we want to up this value substantially, or perhaps do this timeout in a different way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, 10s is the time for which connection has been found to be idle after becoming harvestable (last time it successfully read from the server). Every time any successful read API (getContent etc or even the inputStream individual read APIs) is triggered the last seen time is updated. So, the idea was - only those connections that get stuck in between - not reading anything from server for 10s, no throwable, not disconnected - would be picked up after 10s of being idle.
But I see your point, there could be cases when there could be longer than 10s delay between two read APIs (specially the consecutive inputStream.read calls ) due to various reasons.
I did plan on discussing about making it configurable, so consumers can set it closer to their connection_timeout setting or based on their application expected behavior. Will bring it up in Android sig meeting, we can perhaps add that in next PRs along with making the span name extractor function configurable.
We can still keep 10s as the default or increase it if you think it's very less. The downside of going with default for the consumer right now, would be that they will miss capturing any throwable that would have occurred from consecutive reads from the connection, as the connection span will get reported as successful by the harvester after 10s of being idle. There's always option for consumer to not schedule this thread if they have good implementation where they finally call disconnect on the connection. That will always make sure the span gets reported. The only span that will not be closed for them then would be for cases when the connection is legit stuck. If they're handling such scenarios too in their application code nicely, by disconnecting, they don't have to worry about the harvester. For scenarios, when they're not able to disconnect a stuck connection, they will need to schedule harvester with a timeout of their choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per offline discussion, aligning this with the default would be nice, and making it configurable would be even nicer. That said, it's not blocking, so feel free to take care of this as you please.
Hi @surbhiia - sorry for taking so long for adding my comments. My main question is regarding the timeout value - 10 seconds doesn't seem sufficient if it's for the duration of the entire request-response cycle for an Android app. What would you think about a higher value, or maybe make it configurable? |
Yes! I have the test module ready already in my local in the same branch. As soon as this PR is merged, will put up another PR for that. |
No problem at all! I'm glad it's getting such good reviews. :) I've answered your question about timeout in previous comment. We can continue the discussion further in our Android Sig this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Comments added for potential future work, but nothing blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
What's covered in this PR
As the Android SDK classes can't be instrumented with ByteBuddy Android Gradle Plugin, this PR instruments all calls to relevant URLConnection/HttpURLConnection/HttpsURLConnection APIs in android applications source code.
APIs Instrumented:
APIs that connect but do not read bytes:
APIs that connect and read bytes:
Approach Taken
Harvestable meaning here - if harvestable and if not already reported and if left idle for more than CONNECTION_KEEP_ALIVE_INTERVAL ms, connection is reported by a connection killer thread. So, once connection is made and any data is read from the connection, we mark it as harvestable because we don't anticipate connection to be sitting idle for longer after this.
As noted above, We keep track of all ongoing URLConnections and three other properties for it - last seen time, harvestable (boolean), reported (boolean)
When are spans closed and reported:
Sample App to test this PR
Added an android sample app with many test scenarios for this PR here.
If you would like to test this PR using the above sample app:
In sample app: