This has been reported by @pveentjer in a private conversation about io_uring Netty mechanics.
1. Schedule a timer to fire in 2 minutes.
2. Wait until the io_uring event loop parks/waits.
3. Schedule a second timer for 10 seconds, which replaces the last deadline set.
4. Once it fires, the next in-line scheduled task is the earlier 2-minute one (now <= 1m50s away, likely): the event loop arms a timer for this deadline (<= 1m50s), BUT such a timer was already armed, hence we arm it twice! (A sketch of this bookkeeping follows below.)
If we have a huge number of timers already registered far in the future, each of them will end up being registered twice whenever its deadline is superseded by a nearer one.
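For illustration, here is a minimal sketch (Java, with hypothetical names such as `DeadlineTracker`/`beforePark`; a simplified model of the deadline bookkeeping, not Netty's actual `IOUringEventLoop` code) of why the registration happens twice: the loop only remembers the single deadline it armed last, so once the nearer timeout fires it re-arms the older deadline even though the kernel still holds the original timeout for it.

```java
// Simplified model of the buggy bookkeeping (hypothetical names, not Netty's real code).
final class DeadlineTracker {
    static final long NONE = -1;
    long prevDeadlineNanos = NONE;   // the single deadline we believe is armed
    int inFlightTimeouts = 0;        // what the kernel actually holds (for illustration)

    void beforePark(long nextDeadlineNanos) {
        // Arm a timeout whenever the next scheduled deadline differs from the remembered one.
        if (nextDeadlineNanos != prevDeadlineNanos) {
            prevDeadlineNanos = nextDeadlineNanos;
            submitTimeoutSqe(nextDeadlineNanos);
        }
    }

    void onTimeoutCqe() {
        inFlightTimeouts--;
        prevDeadlineNanos = NONE;    // forget the armed deadline once *a* timeout completes
    }

    void submitTimeoutSqe(long deadlineNanos) {
        inFlightTimeouts++;          // nothing ever removes the previously armed timeout
    }
}
```

Walking the scenario through this model: `beforePark(T1)` arms the 2-minute deadline; `beforePark(T2)` arms the 10-second deadline while the T1 timeout stays in the kernel; when T2's cqe arrives, `prevDeadlineNanos` is reset to `NONE`, so the next `beforePark(T1)` arms T1 a second time, leaving two kernel timeouts for the same deadline.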
Updating the timer in place (a remove sqe carrying the IORING_TIMEOUT_UPDATE flag) requires kernel >= 5.11, while removing the existing one (when the nearer deadline needs to be armed) requires >= 5.5.
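A hedged sketch of what either mitigation could look like (Java; `kernelSupportsTimeoutUpdate`, the `submitTimeout*Sqe` helpers and the rest are hypothetical names, not Netty's API): on kernels >= 5.11 the armed timeout can be retargeted in place via the IORING_TIMEOUT_UPDATE flag, saving one sqe, while on >= 5.5 the stale timeout can be removed with IORING_OP_TIMEOUT_REMOVE before the nearer deadline is armed.

```java
// Hypothetical sketch: arm a nearer deadline without leaking the previously armed timeout.
final class DeadlineArmer {
    static final long NONE = -1;
    long prevDeadlineNanos = NONE;

    void armDeadline(long nextDeadlineNanos) {
        if (nextDeadlineNanos == prevDeadlineNanos) {
            return;                                   // already armed for exactly this deadline
        }
        if (prevDeadlineNanos != NONE) {
            if (kernelSupportsTimeoutUpdate()) {      // Linux >= 5.11
                // One sqe: IORING_OP_TIMEOUT_REMOVE with the IORING_TIMEOUT_UPDATE flag
                // retargets the already-armed timeout to the new deadline.
                submitTimeoutUpdateSqe(prevDeadlineNanos, nextDeadlineNanos);
                prevDeadlineNanos = nextDeadlineNanos;
                return;
            }
            if (kernelSupportsTimeoutRemove()) {      // Linux >= 5.5
                // Two sqes: cancel the stale timeout here, arm the new one below.
                submitTimeoutRemoveSqe(prevDeadlineNanos);
            }
            // On older kernels the duplicate registration cannot be avoided.
        }
        submitTimeoutSqe(nextDeadlineNanos);
        prevDeadlineNanos = nextDeadlineNanos;
    }

    // Stubs standing in for the native submission-queue calls.
    boolean kernelSupportsTimeoutUpdate() { return false; }
    boolean kernelSupportsTimeoutRemove() { return true; }
    void submitTimeoutSqe(long deadlineNanos) { /* enqueue IORING_OP_TIMEOUT */ }
    void submitTimeoutRemoveSqe(long deadlineNanos) { /* enqueue IORING_OP_TIMEOUT_REMOVE */ }
    void submitTimeoutUpdateSqe(long oldDeadline, long newDeadline) { /* remove + UPDATE flag */ }
}
```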
There are several ways to address it, including NOT addressing it, but it can still silently lead to OOM or worse (no idea, really).
I believe we had no coverage for this because our await operations didn't allow any level of concurrency: if we block awaiting, it's a blocking operation, period. But if we request to be woken up in the future and we're woken up earlier, the same "in flight" request is not yet completed, which allows us to enqueue more and more of these.
I had a memory issue and @franz1981 suggested netty#211 as the cause. This patch is my fix for that bug, though I don't believe my memory issue was ultimately caused by it.
This PR does the legwork of adding ioringOpTimeoutRemove and implements a test. However, two things can still be improved:
- [ ] could use IORING_TIMEOUT_UPDATE (see netty#211) to save one sqe.
- [ ] there may be a race in IOUringEventLoop between addTimeout and the IORING_OP_TIMEOUT handler: if the kernel fires a deadline cqe, we then submit a deadline update sqe, and only afterwards process the first cqe, prevDeadlineNanos ends up as NONE even though we've submitted a new deadline. I'm not sure whether this can actually happen, since deadline changes should only adjust the deadline downwards, not upwards? Not sure. (See the sketch after this list.)
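One way the bookkeeping could be made robust against that ordering (a sketch only, with hypothetical names; tagging timeouts with a user-data id is an assumption, not necessarily what this PR does) is to only clear prevDeadlineNanos when the completing cqe belongs to the timeout that is currently considered armed:

```java
// Hypothetical guard: ignore completions of timeouts that have already been replaced.
final class TimeoutBookkeeping {
    static final long NONE = -1;
    long prevDeadlineNanos = NONE;
    long armedTimeoutId;             // monotonically increasing, carried as the sqe's user data

    void armDeadline(long nextDeadlineNanos) {
        armedTimeoutId++;
        prevDeadlineNanos = nextDeadlineNanos;
        submitTimeoutSqe(nextDeadlineNanos, armedTimeoutId);
    }

    void onTimeoutCqe(long completedTimeoutId) {
        if (completedTimeoutId == armedTimeoutId) {
            // Only the latest armed timeout may reset the remembered deadline;
            // a stale cqe from an already-replaced timeout changes nothing.
            prevDeadlineNanos = NONE;
        }
    }

    void submitTimeoutSqe(long deadlineNanos, long userData) { /* enqueue IORING_OP_TIMEOUT */ }
}
```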