Clarification on exporter timeout config #2346

Open
svetlanabrennan opened this issue Feb 15, 2022 · 5 comments
Labels
spec:protocol (Related to the specification/protocol directory), triage:accepted:ready (Ready to be implemented. Small enough or uncontroversial enough to be implemented without sponsor)

Comments

@svetlanabrennan

What are you trying to achieve?
I need clarification on how the timeout should be applied to the export process.

Additional context.

The spec says that timeout is the maximum time the OTLP exporter will wait for each batch export, but does this mean:

A. timeout for the entire export process (including retry requests)
B. timeout for each individual request
C. timeout for a certain phase in the request (socket, req, res).
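To make the difference between (A) and (B) concrete, here is a rough sketch of the worst-case wall-clock time each reading allows for a single batch export. The retry policy (5 attempts, 1/2/4/8 s backoff) and the RetryPolicy/worst_case_* names are hypothetical; the spec defines neither a retry count nor a backoff schedule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RetryPolicy:
    """Hypothetical retry settings; the spec does not standardize these."""
    max_attempts: int = 5
    # Sleep before attempt 2, 3, ... (exponential backoff).
    backoff_s: List[float] = field(default_factory=lambda: [1.0, 2.0, 4.0, 8.0])


def worst_case_a(timeout_s: float) -> float:
    """(A): one budget covers every attempt and every backoff sleep."""
    return timeout_s


def worst_case_b(timeout_s: float, policy: RetryPolicy) -> float:
    """(B): each attempt may use the full timeout; backoff sleeps add on top."""
    sleeps = sum(policy.backoff_s[: policy.max_attempts - 1])
    return policy.max_attempts * timeout_s + sleeps


policy = RetryPolicy()
print(worst_case_a(10.0))          # 10.0
print(worst_case_b(10.0, policy))  # 65.0
```

Under (B), the worst case grows with the number of retries and is not bounded by the configured timeout at all, which is why the choice matters for anything that waits on the exporter.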

@cbandy

cbandy commented Dec 7, 2022

I also wonder how these OTLP timeouts relate to similar-sounding timeouts defined for batch processors: "Maximum allowed time (in milliseconds) to export data."

@mtwo
Member

mtwo commented Jul 9, 2024

Triage comments: we assume that the answer is B. Do you know of any SDKs that are struggling with this definition? We've marked this as 'ready' so that someone can create a clarification PR for it.

@mtwo added the triage:accepted:ready (Ready to be implemented. Small enough or uncontroversial enough to be implemented without sponsor) label on Jul 9, 2024
@MrAlias
Contributor

MrAlias commented Jul 17, 2024

The Go SIG has interpreted this as (A): the timeout applies to the entire export process, including any retries.

@jack-berg
Member

> The Go SIG has interpreted this as (A): the timeout applies to the entire export process, including any retries.

Same with Java.

@jack-berg
Member

I think it actually has to be A: timeout for the entire export process.

As discussed in #4138, there is no standard definition of the OTLP retry exponential backoff algorithm. If the timeout limited individual export requests, there would be no standard way to ensure that an exporter's export resolves within the limit of its associated BatchSpanProcessor, BatchLogRecordProcessor, or PeriodicMetricReader. A counterargument might be that if the total time for all export requests is just 10s (the default), that doesn't leave much room for individual attempts to fail and retry. I think it does leave room: we just expect OTLP receivers to return retryable status codes quickly enough to allow for retries within a 10s limit for all requests. If 10s is not enough, the user can raise the timeout (and, if the new value exceeds 30s, also extend the BatchSpanProcessor, BatchLogRecordProcessor, and PeriodicMetricReader limits, which default to 30s).
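A minimal sketch of interpretation (A): one deadline shared by every attempt and every backoff sleep, with each attempt handed only the remaining budget. The backoff parameters (1 s initial, doubling, capped at 5 s) and the names RetryableError, export_with_deadline, and send are hypothetical, since, as noted in the comment above, the spec does not define the OTLP backoff algorithm. A fake clock keeps the example deterministic.

```python
import time
from typing import Callable


class RetryableError(Exception):
    """Hypothetical stand-in for a retryable OTLP response (e.g. HTTP 503)."""


def export_with_deadline(
    send: Callable[[float], None],
    timeout_s: float = 10.0,
    initial_backoff_s: float = 1.0,
    max_backoff_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Interpretation (A): timeout_s bounds the whole export, retries included.

    `send` gets the remaining budget (seconds) to use as its per-attempt
    timeout and raises RetryableError on a retryable failure. Returns True
    on success, False once the budget is exhausted. Other exceptions
    (non-retryable failures) propagate to the caller.
    """
    deadline = clock() + timeout_s
    backoff = initial_backoff_s
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        try:
            send(remaining)
            return True
        except RetryableError:
            pass
        # Don't start a sleep that would end at or past the deadline.
        if clock() + backoff >= deadline:
            return False
        sleep(backoff)
        backoff = min(backoff * 2, max_backoff_s)


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
budgets = []


def flaky_send(budget_s: float) -> None:
    """Each attempt takes 0.5 s; the first two are rejected as retryable."""
    budgets.append(budget_s)
    clock.now += 0.5
    if len(budgets) < 3:
        raise RetryableError()


print(export_with_deadline(flaky_send, clock=clock, sleep=clock.sleep))  # True
print(budgets)  # [10.0, 8.5, 6.0]
```

Because the loop never sleeps past the deadline and each attempt receives only what is left, the export returns within timeout_s as long as send honors its budget, so a 10s exporter timeout fits inside a 30s processor limit by construction.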

However, as noted in the issue, the OTLP SDK exporter spec also underspecifies connection timeouts. I've seen HTTP client libraries whose default connect timeout is 10s, and others where it is unset. In both cases, an OTLP exporter can spend its entire 10s export budget waiting for the first attempt to connect. We need a way to configure OTLP exporters with a specific connection timeout so that there is some room to retry when connection issues occur.

Users need a way to say "set the connect timeout to 5s and the overall export timeout to 15s". This would allow the OTLP exporter to make up to 3 connection attempts when connect timeouts occur, potentially reaching a different instance each time. In my experience, this greatly increases the chance of export success.
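The arithmetic behind this proposal can be sketched as a small simulation. The function name is hypothetical, and the model is deliberately pessimistic: it assumes every connection attempt hangs until its connect timeout fires, and it ignores backoff sleeps (which would consume part of the budget in a real exporter).

```python
from typing import Optional


def connect_attempts_within_budget(
    connect_timeout_s: Optional[float], export_timeout_s: float
) -> int:
    """Count connection attempts that fit in one export when every connect hangs.

    Each attempt may wait min(connect_timeout_s, remaining budget); None models
    a client with no connect timeout. Backoff sleeps are ignored here.
    """
    if connect_timeout_s is not None and connect_timeout_s <= 0:
        raise ValueError("connect timeout must be positive or None")
    elapsed, attempts = 0.0, 0
    while elapsed < export_timeout_s:
        remaining = export_timeout_s - elapsed
        if connect_timeout_s is None:
            wait = remaining
        else:
            wait = min(connect_timeout_s, remaining)
        attempts += 1
        elapsed += wait  # the hung connect burns its whole allowance
    return attempts


print(connect_attempts_within_budget(None, 10.0))  # 1 (no connect timeout)
print(connect_attempts_within_budget(10.0, 10.0))  # 1 (10s connect default)
print(connect_attempts_within_budget(5.0, 15.0))   # 3
```

With either library default, the first hung connect consumes the whole budget; a separate 5s connect limit inside a 15s export budget leaves room for two further attempts.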

Projects
Status: Spec - Accepted
Development

No branches or pull requests

6 participants