-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporterhelper] boundedMemoryQueue may send on closed channel #7388
Comments
The collector will stop in topological order so that upstream components are stopped before downstream components. This ensures that each component has a chance to drain to its consumer before the consumer is stopped. Therefore, under normal circumstances, before the exporter is closed, the receiver and processor have guaranteed to complete the data transmission and close. When the exporter is closing, no data will enter the exporter. |
Thanks for your reply @yutingcaicyt. When select {
case <-req.Context().Done():
return fmt.Errorf("Request is cancelled or timed out %w", err)
case <-rs.stopCh:
return rs.onTemporaryFailure(rs.logger, req, fmt.Errorf("interrupted due to shutdown %w", err))
case <-time.After(backoffDelay):
} |
The re-enqueuing is only applicable to the persistent queue, which doesn't have a closed channel. Memory queue doesn't get data re-enqueued |
This change unblocks adding the `enqueue_on_failure` option to the queue sender by removing the requeue behavior on the shutdown. If we don't remove requeue on shutdown, it's possible to run into a situation described in #7388. After the recent refactoring, the chance of running into it is pretty small, but it's still possible. The only reason to requeue on shutdown is to make sure there is no data loss with the persistent queue enabled. The persistent queue captures all the inflight requests in the persistent storage anyway, so there is no reason to requeue an inflight request. The only downside is it potentially can cause sending duplicate data on the collector restart in case of a partially failed request during shutdown. Another option would be to rework the memory queue to never close the channel but still ensure draining.
) This change unblocks adding the `enqueue_on_failure` option to the queue sender by removing the requeue behavior on the shutdown. If we don't remove requeue on shutdown, it's possible to run into a situation described in open-telemetry#7388. After the recent refactoring, the chance of running into it is pretty small, but it's still possible. The only reason to requeue on shutdown is to make sure there is no data loss with the persistent queue enabled. The persistent queue captures all the inflight requests in the persistent storage anyway, so there is no reason to requeue an inflight request. The only downside is it potentially can cause sending duplicate data on the collector restart in case of a partially failed request during shutdown. Another option would be to rework the memory queue to never close the channel but still ensure draining.
I believe this issue to be fixed as we have changed the implementation of queues quite a bit, as @dmitryax points above. The code is completely different now. Previously, a test was flaky because of this problem, but it no longer is. Closing as resolved. Feel free to comment or create new issues to follow up. |
Describe the bug
When the gorutine prepares to submit new item to the queue,
q.stopped.Load()
returns true. The gorutine has not finished sending items to the channel.At the same time, service shutdown and invokes
stop()
to close the channel. The gorutine may send item to the closed channel.What did you expect to see?
thread-safe
What did you see instead?
What version did you use?
v0.73.0
The text was updated successfully, but these errors were encountered: