Describe the bug
In a gateway configuration of the collector, an OTLP receiver and an OTLP exporter are both likely to be configured. Any gRPC exporter/receiver pair in which both components propagate timeout information is likely to fall into this problem scenario. (For example, the OTel-Arrow components will behave the same after open-telemetry/opentelemetry-collector-contrib#34742 merges.)
In a synchronous pipeline, this means the caller's context is likely to have a deadline before it reaches an exporter, typically added by the pipeline's receiver or propagated from an earlier pipeline segment's exporter.
Note that the timeout sender does recognize the incoming deadline, but not intentionally. The Golang context package will never increase a deadline, but it will shorten one:
> WithDeadline returns a copy of the parent context with the deadline adjusted to be no later than `d`. If the parent's deadline is already earlier than `d`, `WithDeadline(parent, d)` is semantically equivalent to `parent`.
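A minimal, self-contained demonstration of that behavior (the durations here are arbitrary, chosen to match the example below):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// The caller arrives with a 5s deadline, e.g. set by an upstream exporter.
	parent, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// The timeout sender asks for 30s, but a deadline can only shrink.
	child, cancel2 := context.WithTimeout(parent, 30*time.Second)
	defer cancel2()

	deadline, _ := child.Deadline()
	// Prints roughly 5s, not 30s.
	fmt.Println("effective timeout:", time.Until(deadline).Round(time.Second))
}
```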
The result is that exporterhelper will pass through a timeout shorter than its configured timeout. This may be working as intended for some, and a bug for others. When I set a 30s timeout on the exporter and my receiver observes a 5s timeout instead, it's likely to cause confusion.
The behavior of the retry sender is even more problematic. The retry sender is likely to create the problem described above by ignoring the arriving context timeout. When the retry sender's max_elapsed setting is greater than the arriving context timeout, we are virtually assured of issuing one short-timeout request at the end of a series of retries.
Say you've configured your exporterhelper like this:
```yaml
retry_sender:
  enabled: true
timeout: 30s
```
If there is an incoming context deadline that is shorter than the configured max_elapsed setting, retry sending will continue past the original deadline. Since the arriving context deadline is preserved through all of this, the final export request is likely to have a timeout shorter than 30s. This leads to confusion.
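A sketch of the failure mode follows; the loop and the 4s interval are assumed for illustration and this is not the actual exporterhelper retry code:

```go
package sketch

import (
	"context"
	"time"
)

// retryIgnoringDeadline waits out each backoff interval regardless of the
// remaining deadline. With a 10s arriving deadline and 4s of backoff per
// attempt, the third attempt starts with roughly 2s of deadline left, so
// its export runs with a ~2s timeout no matter what was configured.
func retryIgnoringDeadline(ctx context.Context, export func(context.Context) error) error {
	const backoff = 4 * time.Second
	for {
		if err := export(ctx); err == nil { // each attempt sees the shrinking deadline
			return nil
		}
		select {
		case <-time.After(backoff): // sleeps even when the deadline is nearer than backoff
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```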
Steps to reproduce
To reproduce this I created a client application that would send as fast as it could through a collector pipeline with both the timeout and retry senders configured.
What did you expect to see?
I was expecting to see what I saw, because I've seen this in production and wanted to reproduce it.
What did you wish to see instead?
I have several wishes here.
First, the retry sender should consider the context deadline. When backoffDelay is computed, it should be compared against the context deadline (if one is set). If the deadline will expire before the backoff delay elapses, we should not fall into the select statement and wait for cancellation; we should fail fast at that point with a gRPC Aborted code and the message "insufficient context deadline for retry".
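A minimal sketch of the proposed check; the function name is hypothetical and not part of exporterhelper, and it would run just before the retry sender's select statement:

```go
package sketch

import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// checkDeadline fails fast when the context deadline will expire before
// the computed backoff delay elapses, instead of sleeping into a
// guaranteed cancellation.
func checkDeadline(ctx context.Context, backoffDelay time.Duration) error {
	if deadline, ok := ctx.Deadline(); ok && time.Until(deadline) < backoffDelay {
		return status.Error(codes.Aborted, "insufficient context deadline for retry")
	}
	return nil
}
```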
Second, the exporter timeout sender would ideally have a way to impose a minimum timeout. When the timeout sender is configured with a 5s timeout and the arriving context has a 1s deadline, I think I'd like two optional behaviors (sketched after this list):
- Ignore the arriving timeout and let the configured timeout override it. With this option, I expect the caller's export to end as deadline-exceeded while the pipeline continues to export the data with an independent deadline.
- Fail fast. When the arriving context timeout is less than the configured timeout, fail immediately with an Aborted code.
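A sketch of both options, assuming a hypothetical failFast knob (neither option exists in exporterhelper today); it uses context.WithoutCancel (Go 1.21+) to detach the export from the arriving deadline:

```go
package sketch

import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// applyTimeout is hypothetical. When the arriving deadline is shorter than
// the configured timeout, it either aborts (fail fast) or detaches from the
// caller's deadline so the configured timeout governs the export.
func applyTimeout(ctx context.Context, configured time.Duration, failFast bool) (context.Context, context.CancelFunc, error) {
	if deadline, ok := ctx.Deadline(); ok && time.Until(deadline) < configured {
		if failFast {
			return nil, nil, status.Error(codes.Aborted, "arriving context deadline shorter than configured timeout")
		}
		// Ignore the arriving timeout: the caller may still observe
		// deadline-exceeded, while the export continues independently.
		ctx = context.WithoutCancel(ctx)
	}
	ctx, cancel := context.WithTimeout(ctx, configured)
	return ctx, cancel, nil
}
```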
Third, I would like the OTLP receiver to support the same feature: a configuration that allows the receiver to ignore arriving timeouts when they are unreasonably short. This would have to be done on a per-receiver basis, but possibly a receiverhelper library could help receivers behave uniformly.
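For the receiver side, one possibility is a gRPC server interceptor along these lines; the helper below is hypothetical, not an existing receiverhelper API:

```go
package sketch

import (
	"context"
	"time"

	"google.golang.org/grpc"
)

// minTimeoutInterceptor drops arriving deadlines shorter than minTimeout,
// so the pipeline proceeds under timeouts of its own choosing rather than
// an unreasonably short client deadline.
func minTimeoutInterceptor(minTimeout time.Duration) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
		if deadline, ok := ctx.Deadline(); ok && time.Until(deadline) < minTimeout {
			ctx = context.WithoutCancel(ctx) // detach the too-short deadline
		}
		return handler(ctx, req)
	}
}
```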
What version did you use?
v0.108.x
What config did you use?
As shown above.
Environment
Additional context
… retry (#11331)
#### Description
The retry sender will delay until the context is canceled, where instead
it could fail fast with a transient error and a clear message that no
more retries are possible given the configuration.
#### Link to tracking issue
Part of #11183
#### Testing
One new test.