Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urlconnection plugin: HttpURLConnection exceptions are captured also handled by the application. #2976

Closed
markush81 opened this issue Jan 18, 2023 · 11 comments · Fixed by #3024
Labels
8.9-candidate agent-java community Issues and PRs created by the community

Comments

@markush81
Copy link
Contributor

markush81 commented Jan 18, 2023

Describe the bug

Also the application handles all exceptions, especially expected ones the agent is reporting them to APM.

Steps to reproduce

Simple Spring Boot Web application

@RequestMapping(method = RequestMethod.GET, path = "/ping")
public ResponseEntity<Object> ping() throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL("https://google.de/404").openConnection();
    conn.setInstanceFollowRedirects(true);
    conn.setRequestMethod("GET");
    int code = conn.getResponseCode();
    System.out.println("Response Code: " + code);
    System.out.println("I reached the end without an exception!");
}

(Using Servlet with RequestMapping is just used to trigger the code inside a Spring Boot application).

gives following console output

Response Code: 404
I reached the end without an exception!

But the agent runs into https://github.com/elastic/apm-agent-java/blob/main/apm-agent-plugins/apm-urlconnection-plugin/src/main/java/co/elastic/apm/agent/urlconnection/HttpUrlConnectionInstrumentation.java#L132 and reports the exception to APM

Screenshot 2023-01-18 at 20 22 00


Different example using an url which response with 401, results into an java.io.IOException to be reported to APM, also

Response Code: 401
I reached the end without an exception!

Screenshot 2023-01-18 at 20 37 26

Expected behavior

Since everything runs as expected and exceptions are handled, there shouldn't be any exception reported towards APM (which as follow up results in an alert, if someone is creating alerts based on occurring exceptions)

Debug logs

Log for the 404 example.

reproducer.log

Other remarks

We have observed same behaviour with AWS SDK S3 instrumentation, also there exceptions are captured also handled later on. Which means we have detected errors in APM, also the code handled and just works fine.

@github-actions github-actions bot added agent-java community Issues and PRs created by the community triage labels Jan 18, 2023
@markush81
Copy link
Contributor Author

markush81 commented Jan 19, 2023

I tried same example with OTEL agent, there it shows the span itself as failure, but does not capture it as error, meaning http://localhost:5601/app/apm/services/App/errors being empty.

Screenshot 2023-01-19 at 07 33 14

So this describes expected behaviour best i guess.

And i guess the "magic" lies in here https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/http-url-connection/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/httpurlconnection/HttpUrlConnectionInstrumentation.java#L126

     if (throwable != null) {
          if (responseCode >= 400) {
            // HttpURLConnection unnecessarily throws exception on error response.
            // None of the other http clients do this, so not recording the exception on the span
            // to be consistent with the telemetry for other http clients.
            instrumenter().end(httpUrlState.context, connection, responseCode, null);
          }

Sidenote: due to elastic/apm-data#12 we can't really switch to OTEL.

@jackshirazi
Copy link
Contributor

This is an enhancement request to get similar behaviour to otel agent. We'd need to add an option for this to maintain backward compatibility and not break existing behaviour, but it seems not a lot of work. I've raised it for consideration to include in upcoming work

@markush81
Copy link
Contributor Author

markush81 commented Jan 19, 2023

@jackshirazi is it really enhancement only? Cause the situation, that an exception which is handled being still captured as an error to APM sounds not right, or does it? Maybe i am missing some point here.

And i fear there could be more such situations in other plugins (as mentioned we phase one in aws-sdk as well) if the underlying implementation is throwing at a lower level and handling at an upper one and the instrumentation is in-between.

But thanks for considering!

@jackshirazi
Copy link
Contributor

The way I see it, if an implementation is throwing an exception, that's an error worth capturing. If there should be no error, there should be no exception. The dev can choose a different implementation - or register a bug with the 3rd party if the exception is incorrectly thrown. To have the APM decide one exception is an error but another isn't, is not generally possible.

As for exceptions that are handled, the agent can only really instrument the implementation and not the code using that implementation. Going up the stack above the implementation to see what happens to the exception would be a huge overhead, and even harder, where do we decide to stop? All exceptions are handled at some point (or the application terminates). The OTel instrumentation you pointed at doesn't look to see if you've handled the exception, it's just an opinionated view that these exceptions are not worthy of being reported because other clients don't throw exceptions in that state. I can guarantee more than one application is doing the opposite and explicitly throwing an exception for 4xx/5xx codes where the http client doesn't.

Finally, it's generally not recommended to use exceptions as a normal code path instead of an exceptional code path. So it's better to see when these exceptions are being generated. This is particularly the case for "shift-left" moves when you want to see what exceptions - including the ones that the application is handling - are generated so that you can decide how to make the application more resilient not just by handling exceptions but more importantly by changing things so that fewer are generated.

I'm not stuck on this being an enhancement, but I don't see anything convincing me it's a bug

@markush81
Copy link
Contributor Author

markush81 commented Jan 19, 2023

If there should be no error, there should be no exception.

Going up the stack above the implementation to see what happens to the exception would be a huge overhead, and even harder, where do we decide to stop?

Yep, i totally agree to these.

The dev can choose a different implementation

In that case it is hard, because so many libraries rely on java.net.HttpURLConnection (our actual case we use a salesforce client lib, which uses java.net.HttpURLConnection).

With the HttpURLConnection the problem is: It is part of JDK, so not really sth. we can solve easily.

The catch is already done inside JDK, that's why the called getResponseCode works just fine.

  1. Exception is generated here https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1909 which is basically the call to getInputStream
  2. catched here https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/net/HttpURLConnection.java#L529 the called method getResponseCode
  3. and only in case of no statusLine rethrown https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/net/HttpURLConnection.java#L537
  4. otherwise the method just returns the responseCode as expected https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/net/HttpURLConnection.java#L555
  5. and if i get it right the instrumentation observes getInputStream which has thrown

I guess/think that, that is the reason why OTel choose the way of just ignoring it.

Anyway, thanks for putting it as a candidate, appreciate that.

@jackshirazi
Copy link
Contributor

If you really want a quick fix, you can instrument HttpURLConnection yourself, our community plugin makes that very easy to plugin to the agent (examples)

@markush81
Copy link
Contributor Author

Thanks for the hint, i'll have a look at that.

@JonasKunz
Copy link
Contributor

The catch is already done inside JDK, that's why the called getResponseCode works just fine.

To me it would feel most natural to only collect exceptions as error-documents if the exception makes it to the user and is not catched within the library we instrument. (HttpUrlConnection in this case).

So because in this case the exception is an implementation detail of how getResponseCode is implemented, I would agree with @markush81 that these exceptions should not be collected as error documents.

If users want statistics about outoing request success / failure, they should rather look at the outcome field instead of the error-documents anyway, which is consistent across the HTTP clients and not affected by this change.

@eyalkoren
Copy link
Contributor

Extending on what @JonasKunz says above, we decided that the best approach would be to leave the current instrumentation for getInputStream() as is, but add an additional instrumentation for getResponseCode().

We think this may be a better approach because HttpUrlConnection users can invoke getInputStream() directly, in which case the exception will get thrown to the user, in which case we want to reflect that as an error event.
On the other hand, if we instrument getResponseCode(), then the internal call to getInputStream() would be ignored, as the agent will see that it is nested within an active HTTP client span, which means that if the user is calling getResponseCode() and not getting an exception, we will not send an error event.

Does this sound reasonable?

@markush81
Copy link
Contributor Author

markush81 commented Feb 20, 2023

Extending on what @JonasKunz says above, we decided that the best approach would be to leave the current instrumentation for getInputStream() as is, but add an additional instrumentation for getResponseCode().

We think this may be a better approach because HttpUrlConnection users can invoke getInputStream() directly, in which case the exception will get thrown to the user, in which case we want to reflect that as an error event. On the other hand, if we instrument getResponseCode(), then the internal call to getInputStream() would be ignored, as the agent will see that it is nested within an active HTTP client span, which means that if the user is calling getResponseCode() and not getting an exception, we will not send an error event.

Does this sound reasonable?

@markush81 thanks for taking a shot on this! 🙏
Please see my last comment on the original issue. It explains how we think it would be best to approach this.
If you want to implement this yourself, we will be happy to assist and review. Otherwise, if you don't mind and agree that this approach is preferable, we can implement that.
Please let us know.

@eyalkoren actually when i did this PR, i was really thinking similar thing "what if someone calls getInputStream directly, doesn't he want to see the exception?", but then was looking again at OTel implementation, it seems no one complained so far.

But i totally get the point. I haven't checked so far, if getResponseCode is the only case, maybe there are at the end more public methods to cover.

I can checkout during next weekend if i can provide an implementation based on what you described.
If i can't make it (time-wise or code-wise), i'll let you know asap.

@markush81
Copy link
Contributor Author

markush81 commented Feb 23, 2023

@eyalkoren actually when i did this PR, i was really thinking similar thing "what if someone calls getInputStream directly, doesn't he want to see the exception?", but then was looking again at OTel implementation, it seems no one complained so far.

But i totally get the point. I haven't checked so far, if getResponseCode is the only case, maybe there are at the end more public methods to cover.

I can checkout during next weekend if i can provide an implementation based on what you described. If i can't make it (time-wise or code-wise), i'll let you know asap.

@eyalkoren So it seems that getResponseCode is only affected case within HttpURLConnection, i updated the PR. Let me know in case sth. should be different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.9-candidate agent-java community Issues and PRs created by the community
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants