-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvevent: Ensure out of quota events correctly handled #87464
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
b.mu.Lock() | ||
b.mu.numBlocked++ | ||
b.mu.Unlock() | ||
b.notifyOutOfQuota() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question I have is whether this can lead to back-to-back emission of flush events. I suppose that's fine, especially now, but in a world with some processing parallelism it might get weird.
Can we come up with something simple to prevent that? Either bookkeeping or more careful synchronization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question I have is whether this can lead to back-to-back emission of flush events. I suppose that's fine, especially now, but in a world with some processing parallelism it might get weird.
Can we come up with something simple to prevent that? Either bookkeeping or more careful synchronization?
I don't think this can lead to back-to-back emission of flush events, with exception of one corner case.
I think there are few important observations:
- signalCh has a buffer of 1; this buffer is to ensure that notification to single-threaded consumer (Get) never misses notification.
- Producers (there can be multiple, e.g. during backfill) may concurrently run out of quota; however only 1 of those produces will be at the head of the (quota pool) queue, and thus only 1 consumer will invoke notify out of quota
- (This really flows from 1 & 2) -- Producer either puts an event into blocking buffer queue, OR it puts a notification onto the channel that it is out of quota.
With this in mind, the condition to trigger Flush is: the queue is empty and we have some producers blocked
(if !ok && b.mu.numBlocked > 0 ...
). That means that producer is blocked and there is nothing else in blocking buffer queue to cause consumer to try to release any resources -- i.e. every allocated resource is buffered.
So, we trigger flush. Because this flush was triggered when there were no outstanding events in blocking buffer
queue, that means that every outstanding event/allocations must have been released.
The original producer that was blocked will attempt to acquire resources, now that everything has been released.
If the original producer failed to acquire resource (that means that our parent memory monitor is very tight), then that's fine -- we will trigger a no-op flush -- and we'll do that once a second (that's the corner case).
If it acquired resource -- great, it queues the event. We have made forward progress; if the next producer also blocked -- well, that means that perhaps our memory buffer is tiny, or events are huge -- either way flush will be triggered correctly -- even if that means that we flush 1 event.
Is my analysis sound? Do you still think more book-keeping needed? Or perhaps just copying above as a comment because it's not obvious at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another corner case, I suppose is: between original consumer acquiring quota, but before it enqueued the event...
A no-op flush can happen. Is this the corner case you were worried about @ajwerner ?
Actually, that doesn't happen because we don't notify when we acquire quota -- we notify when we enqueue.
So, I don't think this can happen either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Publisher enqueues and is blocked, so sends on the channel
- Consumer is signaled by the send, locks the mutex, notices that there's a blocked producer, creates a flush event
- Second publisher enqueues and sends on the channel (or not)
- Consumer now sends another flush event
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Publisher enqueues and is blocked, so sends on the channel
- Consumer is signaled by the send, locks the mutex, notices that there's a blocked producer, creates a flush event
- Second publisher enqueues and sends on the channel (or not)
- Consumer now sends another flush event
I don't think 3 can happen; because all blocked producers are blocked in a queue of their own (in quota pool); thus;
after step 2 (Flush) the only producer that can wake up is the first producer -- and that producer either produces
an event or is blocked again (as described in the corner case above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stand corrected; I guess a no-op flush is possible.
Notify out of quota now takes in a boolean indicating if flush is possible...
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)
pkg/ccl/changefeedccl/kvevent/blocking_buffer.go
line 334 at r2 (raw file):
fulfilled, tryAgainAfter = r.acquireQuota(ctx, quota) if !fulfilled { quota.notifyOutOfQuota(quota.allocated == 0 || quota.canAllocateBelow > 0)
Can you explain this in commentary? I don't get it.
Ensure that out of quota events are not lost and propagated if necessary to the consumer. Prior to this change, it was possible for an out of quota notification to be "lost" because "blocked" bit would be cleared out when an event was enqueued. Instead of relying on a boolean bit, we now keep track of the number of consumers currently blocked, and issue flush request if there are non-zero blocked consumers with zero events currently queued. Fixes cockroachdb#86828 Release justification: bug fix Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I wish we had testing. Can we make the buffer size metamorphic?
bors r+ |
Build failed: |
bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 12a1b04 to blathers/backport-release-21.2-87464: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 21.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Previous PR cockroachdb#87464 erroneously removed code to ensure that consumer is notified about out of quota events once. Rectify this issue. Release Justification: bug fix Release note: None
87611: authors: add Ganeshprasad Rajashekhar Biradar to authors. r=biradarganesh25 a=biradarganesh25 Release note: None Release justification: non-production code change 87737: kvevent: Avoid busy loop during out of quota r=miretskiy a=miretskiy Previous PR #87464 erroneously removed code to ensure that consumer is notified about out of quota events once. Rectify this issue. Release Justification: bug fix Release note: None Co-authored-by: Ganeshprasad Rajashekhar Biradar <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Previous PR cockroachdb#87464 erroneously removed code to ensure that consumer is notified about out of quota events once. Rectify this issue. Release Justification: bug fix Release note: None
Previous PR cockroachdb#87464 erroneously removed code to ensure that consumer is notified about out of quota events once. Rectify this issue. Release Justification: bug fix Release note: None
Ensure that out of quota events are not lost and propagated if necessary to the consumer.
Prior to this change, it was possible for an out of quota notification to be "lost" because "blocked" bit would be cleared out when an event was enqueued.
Instead of relying on a boolean bit, we now keep track of the number of consumers currently blocked, and issue flush request if there are non-zero blocked consumers with zero events currently queued.
Fixes #86828
Release justification: bug fix
Release note: None