-
Notifications
You must be signed in to change notification settings - Fork 759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry policy for OTLP/gRPC exporter #4587
Conversation
{ | ||
StatusCode.Cancelled, | ||
StatusCode.DeadlineExceeded, | ||
StatusCode.ResourceExhausted, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: add a test validating that the Grpc.Net.Client library honors the retry_delay
sent from the server when StatusCode.ResourceExhausted
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server could be sending a retry_delay
even with StatusCode.Unavailable
. Client should be respecting that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JamesNK I'm curious if you have a moment to share guidance here...
This PR implements retry for our gRPC exporter using a RetryPolicy
. There are a number of status codes which can be retried, however, we have a requirement that cannot be expressed using a RetryPolicy
alone when status is Unavailable
or Resource_Exhausted
:
The server MUST indicate retryable errors using code Unavailable and MAY supply additional details via status using RetryInfo containing 0 value of RetryDelay.
My hope was to avoid rolling our own retry mechanism, but from what I can tell Grpc.Net.Client does not support handling RetryInfo.
Options I can think of
- Roll our own retry mechanism for all status codes conforming to our requirements.
- Use a RetryPolicy for retryable codes which do not require handling RetryInfo and only roll our own logic for Unavailable and Resource_Exhaused.
Are there benefits to one option over the other? Are there better options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grpc.Net.Client follows the retry spec: https://github.com/grpc/proposal/blob/master/A6-client-retries.md. It doesn't have anything related to RetryInfo.
Do other gRPC client libraries support RetryInfo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do other gRPC client libraries support RetryInfo?
No, not from what I've seen. This RetryInfo seems to be something specific to some Google APIs. It's unfortunate that the OTel spec adopted this over the standard gRPC retry spec.
OTLP supports both gRPC and HTTP and since HTTP has a Retry-After
response header I suspect the decision to adopt this RetryInfo thing for gRPC was inspired by the desire to keep parity with the retry behavior between both HTTP and gRPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gRPC retry supports the grpc-retry-pushback-ms
response header which seems similar.
Grpc.Net.Client retries doesn't support RetryInfo. I suggest you look at other implementations to see what they do. If they have a clever technique then maybe that is something that could also be done in .NET. If they are all doing retries manually then you'll have to do it manually.
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #4587 +/- ##
==========================================
+ Coverage 85.13% 85.18% +0.05%
==========================================
Files 320 320
Lines 12679 12707 +28
==========================================
+ Hits 10794 10825 +31
+ Misses 1885 1882 -3
|
@@ -10,6 +10,14 @@ | |||
are now included in this package. | |||
([#4556](https://github.com/open-telemetry/opentelemetry-dotnet/pull/4556)) | |||
|
|||
* Added support for the OTLP gRPC exporter to retry failed requests due to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't cover PartialSuccess
, right? We wouldn't be retrying any of the rejected items from a partially successful response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the spec does not support retrying rejected items from a partially successful response. However, you bring up a good point, the spec does say
The client MUST NOT retry the request when it receives a partial success response where the partial_success is populated.
I need to fix this.
return GrpcChannel.ForAddress(options.Endpoint); | ||
// The specification does not dictate the parameters of the default retry policy. | ||
// These defaults are borrowed from opentelemetry-java. | ||
var retryPolicy = new RetryPolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this impact the default deadline we set?
Is it possible that while retrying the deadline is exceeded and the retry does not complete or not performed(if retryAfter is more than default deadline)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
It seems that using the built in retry mechanism of Grpc.Net.Client may not work for our purposes since the OTel spec has some unique non-standard requirements. I've got another solution in the works that is spec compliant. Closing this PR for now and will open another soon. |
Implements the retry behavior described by the specification for the OTLP/gRPC exporter.