Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection of HTTP request metrics shouldn't be on by default #4930

Closed
twz123 opened this issue Dec 17, 2021 · 7 comments
Closed

Collection of HTTP request metrics shouldn't be on by default #4930

twz123 opened this issue Dec 17, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@twz123
Copy link

twz123 commented Dec 17, 2021

Describe the bug
After upgrading from 1.3 to 1.9 of the otel java instrumentation, the metric cardinality exploded for us, bringing down our whole metrics ingestion pipeline. Reason for this was excessive HTTP metrics for each and every path that the application ever encountered, including buckets and everything.

Steps to reproduce
Start a server that accepts arbitrary HTTP requests, fire up a client that issues HTTP requests to arbitrary paths, and see what happens to the number of metrics being collected.

What did you expect to see?
No uncontrolled explosion of metrics.

What did you see instead?
Metric explosion.

What version are you using?
1.9.1

Additional context
This might even be a way to DDoS a service. The metrics being produced rely on user input, which is probably kind of a problem.

@twz123 twz123 added the bug Something isn't working label Dec 17, 2021
@breedx-splk
Copy link
Contributor

Hi @twz123. A couple of clarifying questions:

  • Were you using library instrumentation or auto instrumentation?
  • What HTTP client were you using?

Thanks!

@anuraaga
Copy link
Contributor

anuraaga commented Jan 7, 2022

Hi @twz123 - to clarify, the SDK is not quite an uncontrolled explosion, we currently cap the exported metrics to 2000. There are two problems with this though

  1. 2000 may still be too high
  2. Even if the SDK caps, backends will general still take some time (~days?) to garbage collect on their end, so 2000 different metrics each cycle can easily kill a prometheus

I don't think cardinality limits in the SDK are actually a solution to cardinality issues for users as even if it protects the app it can kill the metrics backend itself easily still. For 1.10 we should go ahead and remove HTTP_TARGET from all metrics and live with it being gone until providing view configuration to javaagent users.

@anuraaga
Copy link
Contributor

anuraaga commented Jan 7, 2022

Also if you are using the collector, you can add metrics transformation there

https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/metricstransformprocessor#description

If you're not using the collector, you may consider it - metrics SDKs are still quite alpha and it will still take time to work out specifications, for example how to allow users to configure attribute collection via non-programmatic means. The collector in the meantime provides essential functionality for using OTel metrics somewhat more stably for production environments.

@twz123
Copy link
Author

twz123 commented Jan 7, 2022

Answering @breedx-splk questions:

Were you using library instrumentation or auto instrumentation?

Just added the otel java agent to the JVM startup parameters. The application itself doesn't have any otel libs in the class path.

What HTTP client were you using?

I guess you mean HTTP server? In this case it's Tomcat. Incoming HTTP requests were from the internet. Mostly web browsers.

@anuraaga Thanks for the clarification about the metrics cap and the status about the metrics SDK. We disabled otel metrics for the time being.

@breedx-splk
Copy link
Contributor

@twz123 Ok thanks, so it's auto-instrumentation with tomcat server instrumentation.

@trask
Copy link
Member

trask commented Jan 7, 2022

I created an issue #5047 to decide on removing http.target before the 1.10.0 release.

@mateuszrzeszutek
Copy link
Member

The http.route attribute in metrics was implemented in #5266 - you should not experience the cardinality explosion anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants