Retryable HTTP Statuses should be configurable in OTLP clients #3876
Comments
This seems like a good thing to do at first glance, since an OTLP receiver doesn't always have control of the HTTP status codes returned. In practice, however, solving this issue will be tricky because, as of today, there is no specification for common retry configuration options. It appears that solving this issue would need to be part of a larger effort to normalize OTLP retry configuration across SDKs. Marking this as "triage accepted", but whoever takes this on should be conscious of the bigger picture.
We discussed issues with the OTLP retry spec broadly in the 8/7/24 and 8/14/24 TC meetings. I wrote a document summarizing a number of somewhat overlapping OTLP retry issues and sketching out some proposals on how to fix them. For this specific issue, there was apparent consensus that retryable status codes should be configurable. In my previous comment I mentioned that this would be tricky because there is no specification for common retry configuration options, and that's true, but we should try to separate the issues. The lack of specification around the retry exponential backoff algorithm has led to diverging stable implementations, as outlined in #4138, but all implementations should be roughly aligned on the set of status codes which are retryable. It seems plausible, and maybe even straightforward, to add an option to the OTLP SDK exporter specification which makes the retryable status codes configurable.
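To make the idea of such an option concrete, here is a minimal sketch of how an SDK might parse it. Everything in it is an assumption for illustration: the function name `parse_retryable_statuses`, the comma-separated format, and the choice that a configured value replaces the default set rather than extending it. The default set is the one the OTLP spec lists for HTTP (429, 502, 503, 504). None of this is specified anywhere today.

```python
from typing import Optional

# Default retryable HTTP statuses, as listed in the OTLP spec.
DEFAULT_RETRYABLE_STATUSES = frozenset({429, 502, 503, 504})


def parse_retryable_statuses(raw: Optional[str]) -> frozenset:
    """Parse a comma-separated list of status codes (hypothetical option format).

    An unset or blank value falls back to the spec defaults. A configured
    value replaces the defaults; that is an assumption of this sketch.
    """
    if raw is None or not raw.strip():
        return DEFAULT_RETRYABLE_STATUSES
    codes = set()
    for token in raw.split(","):
        code = int(token.strip())  # raises ValueError on non-numeric input
        if not 100 <= code <= 599:
            raise ValueError(f"not an HTTP status code: {code}")
        codes.add(code)
    return frozenset(codes)
```

An exporter could feed this from whatever configuration source the spec ends up defining, for example `parse_retryable_statuses("429,500,502,503,504")` to add 500 to the defaults.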
The OTLP spec lists 502, 503 and 504 as the only retryable 50x status codes. However, some servers (and some CDNs) return a 500 as a generic "something went wrong serving that request" response, even though it isn't the most appropriate status code.
For cases where the remote server is known to return a 500 for retryable conditions, it would be useful if the set of retryable HTTP statuses could be extended or configured to include it. That would help prevent data from being lost under these conditions.
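As a rough illustration of the requested behavior, the sketch below extends the spec's retryable set with user-supplied codes and uses it to decide whether to resend. The names `build_retry_predicate`, `export_with_retry`, and the `send` callable are hypothetical and not taken from any SDK. Backoff between attempts is left out to keep the example short; a real exporter would wait between retries.

```python
from typing import Callable, Iterable, Optional

# Retryable HTTP statuses per the OTLP spec: 429 plus the three 50x codes.
DEFAULT_RETRYABLE_STATUSES = frozenset({429, 502, 503, 504})


def build_retry_predicate(extra_statuses: Iterable[int] = ()) -> Callable[[int], bool]:
    """Return a predicate reporting whether a status code should be retried."""
    retryable = DEFAULT_RETRYABLE_STATUSES | frozenset(extra_statuses)
    return lambda status: status in retryable


def export_with_retry(
    send: Callable[[], int],
    is_retryable: Callable[[int], bool],
    max_attempts: int = 3,
) -> Optional[int]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that performs one export request and
    returns the HTTP status code. The last status seen is returned.
    """
    status = None
    for _ in range(max_attempts):
        status = send()
        if not is_retryable(status):
            break
    return status
```

With the defaults, a 500 ends the export immediately and the batch is dropped. With `build_retry_predicate(extra_statuses=[500])`, the same response triggers another attempt.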