Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add specification for otlp retry configuration parameters and defaults #1974

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion specification/protocol/exporter.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,10 @@ The `OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_EXPORTER_OTLP_TRACES_HEADERS`, `OTEL_EXP

Transient errors MUST be handled with a retry strategy. This retry strategy MUST implement an exponential back-off with jitter to avoid overwhelming the destination until the network is restored or the destination has recovered.

For OTLP/HTTP, the errors `408 (Request Timeout)` and `5xx (Server Errors)` are defined as transient, detailed information about erros can be found in the [HTTP failures section](otlp.md#failures). For the OTLP/gRPC, the full list of the gRPC retryable status codes can be found in the [gRPC response section](otlp.md#otlpgrpc-response).
For OTLP/HTTP, the errors `408 (Request Timeout)` and `5xx (Server Errors)` are defined as transient, detailed information about errors can be found in the [HTTP failures section](otlp.md#failures). For the OTLP/gRPC, the full list of the gRPC retryable status codes can be found in the [gRPC response section](otlp.md#otlpgrpc-response).

SDKs MAY use the built-in [gRPC Retry](https://github.com/grpc/proposal/blob/master/A6-client-retries.md) mechanism to facilitate exponential back-off.

SDKs SHOULD have retry configuration and mechanics that are consistent across OTLP protocols. For example, if the built-in gRPC Retry mechanism is used for the `grpc` protocol, the `http/protobuf` and `http/json` protocols should expose the same configuration options and compute the backoff duration in the same manner.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I may not have articulated my point correctly - I don't think we should have this language without specifying at least what knobs are present. Different SDKs having different knobs is not the intent of the OTel spec I think. If we can't come up with any standard for at least the retry configuration, than I think the spec can only suggest "no knobs". SDKs could still add knobs in a way that is obvious they can be removed at any time, perhaps with an experimental prefix to the env variables or in an experimental package, but any official knob would be spec-incompliant, which is good to ensure that languages don't diverge in their knobs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tigrannajaryan Is there a mechanism to add this paragraph in an experimental fashion to the spec itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have this language as well as corresponding knobs in the spec. Removing the knobs was a compromise to try to get some movement.

I accepted the compromise of removing the knobs because by explicitly allowing the built in gRPC Retry mechanism, and saying that SDKS SHOULD have retry configuration ..., the knobs defined in gRPC are the implied means of configuration.


[otlphttp-req]: otlp.md#otlphttp-request