Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There should be a way to disable OpenCensus stats and tracing #5510

Closed
njhill opened this issue Mar 28, 2019 · 13 comments
Closed

There should be a way to disable OpenCensus stats and tracing #5510

njhill opened this issue Mar 28, 2019 · 13 comments
Assignees
Milestone

Comments

@njhill
Copy link
Contributor

njhill commented Mar 28, 2019

Currently, out-of-the-box client and server config includes a very expensive "no-op" in the form of the enabled-by-default opencensus tracing and stats interceptors. This is the case even if no implementation library is on the classpath, and disabling them is difficult/discouraged/impossible (without reflection) depending on the transport.

Benchmark                         (direct)  (statsAndTracing)  (transport)  Mode  Cnt      Score      Error  Units
TransportBenchmark.unaryCall1024      true               true    INPROCESS  avgt   10   3313.595 ±  107.360  ns/op
TransportBenchmark.unaryCall1024      true              false    INPROCESS  avgt   10   2142.610 ±   99.678  ns/op
TransportBenchmark.unaryCall1024      true               true  NETTY_LOCAL  avgt   10  64780.414 ± 1564.422  ns/op
TransportBenchmark.unaryCall1024      true              false  NETTY_LOCAL  avgt   10  60969.867 ± 1447.113  ns/op
TransportBenchmark.unaryCall1024      true               true        NETTY  avgt   10  85561.145 ± 3765.635  ns/op
TransportBenchmark.unaryCall1024      true              false        NETTY  avgt   10  81556.086 ± 2136.174  ns/op
TransportBenchmark.unaryCall1024      true               true  NETTY_EPOLL  avgt   10  79758.331 ± 3005.046  ns/op
TransportBenchmark.unaryCall1024      true              false  NETTY_EPOLL  avgt   10  75667.486 ± 2734.508  ns/op
TransportBenchmark.unaryCall1024      true               true       OKHTTP  avgt   10  74714.350 ± 2848.833  ns/op
TransportBenchmark.unaryCall1024      true              false       OKHTTP  avgt   10  70527.535 ± 2945.292  ns/op

There's also significantly more garbage produced when enabled.

It would be great to enable these only when the impl library is detected, however:

  • There are circumstances when it might be there but not used
  • It may be the case that only one of stats or tracing is being used
  • Even if it is being used, instrumentation overhead of this magnitude may be considered undesirable (and these numbers are just the api plumbing without any additional cost from the impl itself)

Could these be changed to be disabled by default? I'm unsure why such settings are considered internal when they pertain to external functionality. There's not much danger of anyone "accidentally" leaving it disabled since it would be immediately apparent that the stats aren't working but the converse is of course true and I would assume most common.

@dapengzhang0
Copy link
Member

There is no public API to disable it because there was no strong demand from users. However, there is a hack: enabling retry will disable census as a side effect for the current version of grpc library (Note: This behavior is subject to change in future releases).

@njhill
Copy link
Contributor Author

njhill commented Mar 28, 2019

Thanks @dapengzhang0

There is no public API to disable it because there was no strong demand from users.

Strong demand for the associated functionality or strong demand to disable these flags? If the former wouldn't that be all the more reason for them to be off by default? and if the latter I expect that's because most aren't aware of them or the fact a free performance boost is available by switching them off :)

there is a hack: enabling retry will disable census as a side effect for the current version of grpc library

This only applies to client-side, there's no way to do the equivalent on the server side. Interestingly if you also don't want retry then channelBuilder.enableRetry().disableRetry() will work too :) again only for client side though. If you are lucky enough to be using netty there are the InternalNetty(Channel|Server)Builder classes, but these also come with similar warning/discouragement.

@zhangkun83 zhangkun83 changed the title Stats and tracing There should be a way to disable OpenCensus stats and tracing Mar 28, 2019
@zhangkun83
Copy link
Contributor

As we revisit the decision to include the OpenCensus into grpc-core, we regret it. That's why we don't expose public API to disable it. We are considering moving that of grpc-core and make its own artifact grpc-census. There may be two ways to approach it:

  1. Let the core find classes of grpc-census and load then if available. The user controls whether to have stats/tracing by having or not having grpc-census in their class path.
  2. Ditch dynamic loading, and have a utility class in grpc-census to install interceptors/tracers on the channel/server builders.

Option 1 allows current stats/tracing-dependent users to migrate without any code change. Option 2 is cleaner as it gives the user the maximum control, while it requires code changes to migrate. I personally prefer Option 2 but I am not sure whether the churn brought to users is acceptable. @ejona86 may want to chime in.

@njhill
Copy link
Contributor Author

njhill commented Mar 29, 2019

Thanks @zhangkun83. My vote would be for both! i.e. have the default settings determined by presence of the grpc-census classes, but also provide a means of explicit/finer-grained control. Wouldn't that give the best of both worlds? i.e. existing census users could upgrade without requiring any code/config changes while those who don't use it would benefit from it being turned off automatically.

Maybe a third option to consider at some point (also not mutually exclusive) is investigating the reason for the overhead; maybe there's some low-hanging optimizations which could cut that down.

@dapengzhang0
Copy link
Member

@zhangkun83 I think Option 1 might require current stats/tracing-dependent users to migrate with dependency change.

@sanjaypujare
Copy link
Contributor

I agree with @dapengzhang0 - option 1 is a breaking change for users using stats/tracing. So option 2 is better - except have the utility class in grpc-core instead of grpc-census (if possible).

carl-mastrangelo pushed a commit that referenced this issue Mar 29, 2019
The `ThreadlessExecutor` currently used for blocking calls uses `LinkedBlockingQueue` which is relatively heavy both in terms of allocations and synchronization overhead (e.g. when compared to `ConcurrentLinkedQueue`). It accounts for ~10% of allocations and ~5% of allocated bytes per-call in the `TransportBenchmark` when using in-process transport with [stats and tracing disabled](#5510).

Changing to use a `ConcurrentLinkedQueue` results in a ~5% speedup of that benchmark.

Before:
```
Benchmark                         (direct)  (transport)  Mode  Cnt      Score     Error  Units
TransportBenchmark.unaryCall1024      true    INPROCESS  avgt   60   1877.339 ±  46.309  ns/op
TransportBenchmark.unaryCall1024     false    INPROCESS  avgt   60  12680.525 ± 208.684  ns/op
```

After:
```
Benchmark                         (direct)  (transport)  Mode  Cnt      Score     Error  Units
TransportBenchmark.unaryCall1024      true    INPROCESS  avgt   60   1779.188 ±  36.769  ns/op
TransportBenchmark.unaryCall1024     false    INPROCESS  avgt   60  12532.470 ± 238.271  ns/op
```
@njhill
Copy link
Contributor Author

njhill commented Mar 29, 2019

@dapengzhang0 @sanjaypujare I don't follow why 1 can't be done without breaking existing users?

  static boolean classFound(String name) {
    try {
      Class.forName(name, /* init = */ false, StatsComponent.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException e) { }
    return false;
  }
  
  boolean enableStats = classFound("io.opencensus.impl.stats.StatsComponentImpl")
      || classFound("io.opencensus.impllite.stats.StatsComponentImplLite");
  
  boolean enableTracing = classFound("io.opencensus.impl.trace.TraceComponentImpl")
      || classFound("io.opencensus.impllite.trace.TraceComponentImplLite");

@dapengzhang0
Copy link
Member

@njhill Option 1 will be depending on a new artifact grpc-census for existing census users. I agree that your solution is not breaking existing users, but it seems different from Option 1.

@sanjaypujare
Copy link
Contributor

@njhill So people who do want to use stats/tracing will need to add the new dependency grpc-census in their build to maintain existing functionality. Without that change it breaks, isn't that right?

@njhill
Copy link
Contributor Author

njhill commented Mar 29, 2019

@dapengzhang0 @sanjaypujare ok sorry I think I didn't properly digest @zhangkun83's comment which is specifically talking about how to proceed with splitting the opencensus-specific logic into a separate module.

I was actually more interested in something which could hopefully be a simpler and non-breaking first step - to control whether the interceptors are enabled based on presence of the opencensus impl classes themselves.

njhill added a commit to njhill/grpc-java that referenced this issue Mar 29, 2019
The interceptors do nothing when the OpenCensus implementation classes aren't loadable, yet still have significant overhead. So this change should not break any existing users of opencensus. See grpc#5510 for example of the performance benefit to other users.

Contributes to grpc#5510.
@njhill
Copy link
Contributor Author

njhill commented Mar 29, 2019

I've opened #5520 for consideration.

@zhangkun83 zhangkun83 added this to the 1.21 milestone Apr 2, 2019
@ejona86 ejona86 modified the milestones: 1.21, v1.22 May 7, 2019
@creamsoup creamsoup modified the milestones: v1.22, v1.23 Jun 18, 2019
@dapengzhang0 dapengzhang0 modified the milestones: 1.23, 1.24 Jul 30, 2019
@voidzcy voidzcy modified the milestones: 1.24, 1.25 Sep 24, 2019
@zhangkun83 zhangkun83 removed this from the 1.25 milestone Oct 23, 2019
@zhangkun83 zhangkun83 added this to the 1.26 milestone Oct 23, 2019
@ejona86 ejona86 assigned dapengzhang0 and unassigned zhangkun83 Dec 17, 2019
@ejona86 ejona86 modified the milestones: 1.26, Next Dec 17, 2019
@njhill
Copy link
Contributor Author

njhill commented Jan 16, 2020

Has this now been addressed by #5510 #6577?

@ejona86
Copy link
Member

ejona86 commented Jan 16, 2020

Yes, I think #6577 (not #5510 😄) addressed this. I don't think all the discussions are done, but the major issue should be addressed now.

Closing. @njhill, if things still don't look right we can reopen.

@ejona86 ejona86 closed this as completed Jan 16, 2020
@ejona86 ejona86 modified the milestones: Next, 1.27 Jan 16, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Apr 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants