reenqueue may not be needed #8382
I very much agree with the suggestion. I don't know why it was introduced in the first place. I think the only scenario when we need to re-enqueue is during shutdown. @swiatekm-sumo @bogdandrutu WDYT?
Requeueing as implemented is how a lot of users intuitively expect the queue to work, in my experience. That is, they expect no data to be dropped if the queue still has room. And this behaviour is difficult to reproduce by adjusting retry settings. I'd prefer to add it as an option to the in-memory queue, with a default that is consistent between the two queue types.
So how should we handle this better?
Like I said, add the configuration option so both queues behave the same way. Then you can decide which behaviour you prefer.
With the reenqueue option, users have two ways to control retries of the data. How do these two mechanisms relate to each other? If a user does not want to discard data, is increasing the retry time enough?
I'm trying to introduce this config option in the scope of #8987 and running into a couple of issues:
We discussed that with @bogdandrutu, and we are not sure the re-enqueue capability is useful enough. If a request cannot be delivered right now, why do we need to put it back in the queue and try another one instead of increasing the backoff timeout and retrying the same request again? @swiatekm-sumo can you please provide some real-world examples where requeuing would be needed on top of the backoff retry?
@dmitryax thinking about it more, reenqueueing is logically equivalent to retrying forever. As long as we document that users who want that behaviour should instead set
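The comment above is cut off in this thread. For context, a sketch of the kind of configuration being discussed, using the exporterhelper `retry_on_failure` field names; whether `max_elapsed_time: 0` means "no time limit" should be verified against the collector docs for your version:

```yaml
exporters:
  otlp:
    endpoint: example.com:4317   # illustrative endpoint
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      # 0 is commonly documented as "no time limit", i.e. retry forever;
      # verify this against your collector version before relying on it.
      max_elapsed_time: 0
```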
If we remove the re-queue capability straight away, there is a possibility of double publishing after the shutdown if the retry sender has partially failed requests. We would need to update the persistent storage state for partially delivered requests stuck in the retry sender. This should be easy to do after the recent refactoring, I can incorporate this change. |
Functionally these two options feel the same. But performance-wise, doesn't not having reenqueue look like a huge hit for persistent queue users? If there is an outage in the destination,
The amount of data kept in memory is always restricted by the number of queue workers. And since they are always busy, it doesn't matter whether we keep the same data for a backoff retry or keep pushing it back and forth to the queue. In fact, the latter takes even more memory due to constant marshaling/unmarshaling, with the GC having to catch up.
This already happens on every shutdown right now. I'm suggesting that we do it only for partially failed requests, which doesn't happen often.
The current re-enqueuing behavior is not obvious and cannot be configured. It takes place only for the persistent queue, and only if `retry_on_failure::enabled=true`, even though `retry_on_failure` configures a different backoff retry strategy. This change removes the re-enqueuing behavior in favor of the `retry_on_failure` option. Consider increasing `retry_on_failure::max_elapsed_time` to reduce the chance of data loss. Resolves #8382
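As the change note suggests, users who relied on re-enqueueing can instead raise `max_elapsed_time`. A sketch of what that might look like alongside a persistent queue; the `otlp` exporter, `file_storage` extension, and all values are illustrative:

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage

exporters:
  otlp:
    endpoint: example.com:4317
    sending_queue:
      enabled: true
      storage: file_storage   # backs the queue with persistent storage
      queue_size: 1000
    retry_on_failure:
      enabled: true
      # Increase this to keep retrying longer, instead of relying on
      # the removed re-enqueue behavior.
      max_elapsed_time: 600s
```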
Is your feature request related to a problem? Please describe.
When I used the persistent queue, I found that the strategy for handling requests whose retries have failed differs between queue types: the persistent queue re-enqueues these requests, but the memory queue does not. Users can already configure the retry time themselves; on timeout, the request is intuitively discarded. With reenqueueing, it is hard to know how long a request will keep being sent. So why not keep it simple and let the user decide how long a request can be retried at most?
Describe the solution you'd like
We can remove the reenqueue feature to keep things simple and intuitive, so users can control the retry time through configuration.