Retryable HTTP Statuses should be configurable in OTLP clients #3876

Open
haus opened this issue Feb 14, 2024 · 2 comments
Labels
spec:protocol Related to the specification/protocol directory triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor

Comments

@haus

haus commented Feb 14, 2024

The OTLP spec lists 502, 503 and 504 as the only retryable 50x status codes. However, some servers (and some CDNs) return a 500 as a generic "something went wrong serving that request", even though it isn't the most appropriate status code.
When a remote server is known to return a 500 for retryable conditions, it would be useful if the set of retryable HTTP statuses could be extended or configured to include it. That would help prevent data loss under these conditions.
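
A minimal sketch of what this request could look like on the client side. The names `DEFAULT_RETRYABLE_STATUSES`, `build_retryable_statuses` and `is_retryable` are hypothetical and do not come from the OTLP spec or any SDK. The default set assumes the codes the OTLP/HTTP spec treats as retryable (429 plus the three 50x codes above).

```python
from typing import FrozenSet, Iterable

# Assumed default: the retryable statuses listed by the OTLP/HTTP spec.
DEFAULT_RETRYABLE_STATUSES: FrozenSet[int] = frozenset({429, 502, 503, 504})


def build_retryable_statuses(extra: Iterable[int] = ()) -> FrozenSet[int]:
    """Return the default set extended with caller-supplied codes (hypothetical helper)."""
    extra_codes = frozenset(extra)
    for code in extra_codes:
        # Only 4xx/5xx responses make sense as retryable failures.
        if not 400 <= code <= 599:
            raise ValueError(f"not an HTTP error status: {code}")
    return DEFAULT_RETRYABLE_STATUSES | extra_codes


def is_retryable(
    status: int,
    retryable: FrozenSet[int] = DEFAULT_RETRYABLE_STATUSES,
) -> bool:
    """Check a response status against the configured retryable set."""
    return status in retryable
```

With the defaults a 500 is dropped, while `is_retryable(500, build_retryable_statuses([500]))` would let the exporter retry it.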

@jack-berg jack-berg added [label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR spec:protocol Related to the specification/protocol directory labels Feb 14, 2024
@jack-berg
Member

This seems like a good thing to do at first glance, since an OTLP receiver doesn't always have control of the HTTP status codes returned. However, in practice, solving this issue will be tricky because, as of today, there is no specification for common retry configuration options. It appears that solving this issue would need to be part of a larger effort to normalize OTLP retry configuration across SDKs. Marking this as "triage accepted", but whoever takes this on should be conscious of the bigger picture.

Some related issues: #3314, #3639, #1742, #1528

@austinlparker austinlparker added triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor and removed [label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR labels Apr 30, 2024
@austinlparker austinlparker moved this to Spec - Accepted in 🔭 Main Backlog Jul 16, 2024
@jack-berg
Member

We discussed issues with the OTLP retry spec broadly in the 8/7/24 and 8/14/24 TC meetings. I wrote a document summarizing a number of somewhat overlapping OTLP retry issues and sketching out some proposals on how to fix them.

For this specific issue, there was apparent consensus that retryable status codes should be configurable. In my previous comment I mentioned that this would be tricky because there is no specification for common retry configuration options, and that's true, but we should try to separate the issues. The lack of specification around the retry exponential backoff algorithm has led to diverging stable implementations, as outlined in #4138, but all implementations should be roughly aligned on the set of status codes which are retryable. It seems plausible, and maybe even straightforward, to introduce an option in the OTLP SDK exporter specification which makes the retryable status codes configurable.
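
To illustrate the separation described above, here is a rough sketch in which the retryable status set and the backoff schedule are independent inputs. The function `export_with_retry` and its parameters are hypothetical, not part of any SDK or of the exporter specification; `send` stands in for one HTTP export attempt that returns a status code.

```python
import time
from typing import Callable, Collection, Iterable


def export_with_retry(
    send: Callable[[], int],
    retryable_statuses: Collection[int],
    delays: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or delays run out.

    The backoff schedule (delays) is supplied by the caller, so the set of
    retryable statuses can be configured without touching the backoff logic.
    Returns the last status observed.
    """
    status = send()
    for delay in delays:
        if status not in retryable_statuses:
            break
        sleep(delay)
        status = send()
    return status
```

Because the delays come from the caller, SDKs with different backoff algorithms could still share one definition of which statuses are retryable.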

Projects
Status: Spec - Accepted