-
Notifications
You must be signed in to change notification settings - Fork 850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autoconfigure experimental OTLP retry #3791
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3791 +/- ##
============================================
+ Coverage 89.28% 89.30% +0.01%
- Complexity 3956 4017 +61
============================================
Files 473 476 +3
Lines 12307 12458 +151
Branches 1207 1217 +10
============================================
+ Hits 10988 11125 +137
- Misses 908 920 +12
- Partials 411 413 +2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach of avoiding public API with reflection seems quite reasonable to me
...ommon/src/test/java/io/opentelemetry/exporter/otlp/internal/grpc/ManagedChannelUtilTest.java
Outdated
Show resolved
Hide resolved
@@ -19,8 +19,7 @@ testSets { | |||
|
|||
dependencies { | |||
api(project(":sdk:logs")) | |||
|
|||
implementation(project(":exporters:otlp:common")) | |||
api(project(":exporters:otlp:common")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change needed? common only has internal code, so we'd expect an implementation dependency to always suffice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is.
Code in :sdk-extensions:autoconfigure
needs to access internal code of :exporters:otlp:common
(i.e. the RetryPolicy
class). The :sdk-extensions:autoconfigure
module has only a compileOnly dependency on :exporters:otlp:all
(trace) and :exporters:otlp:metrics
, since it requires users to add the runtime dependency. But with a runtime dependency on :exporters:otlp:all
and :exporters:otlp:metrics
, they also now need a runtime dependency on :exporters:otlp:common
in order to have access to RetryPolicy
. Making :exporters:otlp:common
an api
dependency makes it work without the additional dependency.
If this eventually matures, I think :exporters:otlp:common
would be the natural home for RetryPolicy
, so at that point a api
gradle dependency on :exporters:otlp:common
would make sense.
...ommon/src/test/java/io/opentelemetry/exporter/otlp/internal/grpc/ManagedChannelUtilTest.java
Outdated
Show resolved
Hide resolved
@jkwatson Want to review this? Otherwise we can probably go ahead and merge |
Should be fine from my perspective. I can give it a thorough review tomorrow if you think it needs it |
* @throws IllegalArgumentException if the instance does not contain a field called "delegate" of | ||
* type {@link DefaultGrpcExporterBuilder} | ||
*/ | ||
public static <T> DefaultGrpcExporterBuilder<?> getDelegateBuilder(Class<T> type, T instance) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we scope T
to at least have some bounds? Do we have a common interface we could enforce here to make it a bit clearer what this is for/about? This is a weird API to have be public at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we scope T to at least have some bounds? Do we have a common interface we could enforce here to make it a bit clearer what this is for/about?
Unfortunately not.
This is a weird API to have be public at the moment.
Yeah its not ideal, but at least its an internal package. There's tons of public stuff in the internal packages that is questionable if you view it as truly public.
The alternative to this was to have this functionality package private in the autoconfigure module. I like this approach though because it allows someone to programmatically configure a retry policy that is different from the default one that is used when you enable it via env vars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I missed this was internal
. Carry on!
@jack-berg is this still "experimental"? Should we consider making it official at this point? |
I'm taking another try at making progress on OTLP retry! 😬
The PR #3636 draft PR demonstrated some strategies for implementing the OTLP retry required as a "MUST" in the spec. There were some good conversations and the resolution was to go to try to improve the wording of the spec so that the java implementation could be confident in its alignment.
I've attempted to improve the spec language, but that has stalled due to what I would describe as a catch 22. The spec says SDKs "MUST" implement retry, but lacks specificity needed by SDKs to implement with confidence that they'll be aligned with future language. At the same time the spec is reluctant to add more specific language without supporting data.
This PR proposes a strategy for adding experimental support for OTLP retry configuration. I hope it can help progress the issue by providing additional data to influence spec language. Here's what I propose:
DefaultGrpcExporterBuilder#addRetryPolicy(RetryPolicy)
. This method is on a class in an internal package. Neither the gRPC over okhttp or http/protobuf exporters have support for retry in this PR.DefaultGrpcExporterBuilder
delegate of the exporter builders. I.e. you can access theOtlpGrpcSpanExporterBuilder#delegate
field reflectively, and calladdRetryPolicy()
on it.otel.experimental.exporter.otlp.retry.enabled
. When enabled, this calls sets up the retry policy via the internal accessors.The net affect is the ability to add a retry policy without any additional public API surface area. The compromise needed to accomplish this was an accessor that uses reflection. Maybe there's another way, but I couldn't think of one.
Why should we care about retry?
This isn't a theoretical problem - I run otel for applications and I regularly see failed exports. Some applications may be able to tolerate intermittent failures in exports, but some will not. Lack of tolerance for network faults is a gap for production readiness.