-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvadmission: fix handling of non-elastic work with flow control #109446
kvadmission: fix handling of non-elastic work with flow control #109446
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Only left minor comments below.
Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi)
-- commits
line 7 at r1:
Could you add a sentence explaining exactly how? Something around the rate of regular work being not shaped by flow control/below-raft admission rates, but it consuming some memory in below-raft work queues, hence the OOM.
-- commits
line 10 at r1:
s/skip/bypass
-- commits
line 11 at r1:
s/above Raft/above-raft. It might be better to say "We still subject such work to above-raft IO admission control, if enabled."
-- commits
line 13 at r1:
Capitalize "informs" and add a period at the end.
pkg/kv/kvserver/kvadmission/kvadmission.go
line 312 at r1 (raw file):
if ba.IsWrite() && !ba.IsSingleHeartbeatTxnRequest() { var bypassFlowcontrol bool attemptFlowControl := !bypassAdmission &&
Could you pull out the bypassAdmission
from this conditional, and s/if attemptFlowControl {
/if attemptFlowControl && !bypassAdmission
? I had to remind myself that the bypassAdmission
parameter is something included in admission.WorkInfo
below so it's ok to explicitly seek admission for (we in fact rely on it for token calculation purposes). It's worth a comment - apologies for not adding it myself earlier.
pkg/kv/kvserver/kvadmission/kvadmission.go
line 339 at r1 (raw file):
} // NB: When flow control is enabled for elastic work, and we encounter // non-elastic (i.e. "regular work), we skip waiting for tokens in
How about something like the following for this comment?
If flow control is disabled or if work bypasses flow control, we still subject it above-raft, leaseholder-only IO admission control
pkg/kv/kvserver/kvflowcontrol/kvflowcontrol.go
line 118 at r1 (raw file):
// until there are flow tokens available or the stream disconnects, subject to // context cancellation. It returns true if the request bypassed flow control. Admit(context.Context, admissionpb.WorkPriority, time.Time, ConnectedStream) (bool, error)
Invert the return type, returning true if admitted. Perhaps name the return type as "admitted" in this interface definition. Ditto below for the Handle interface. Write some comment about not needing to do anything with the first return type if err != nil.
I think calling code could look like:
admitted, err := c.Admit(...)
If !admitted and err == nil, that just means we didn't deduct tokens for it. There's a parallel for this form elsewhere:
cockroach/pkg/util/admission/work_queue.go
Line 558 in 9dc56ae
func (q *WorkQueue) Admit(ctx context.Context, info WorkInfo) (enabled bool, err error) { |
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go
line 155 at r1 (raw file):
// being available, we'll also let them through if we're not // applying flow control to their specific work class. bypassTokens = c.mode() == kvflowcontrol.ApplyToElastic && class == admissionpb.RegularWorkClass
Nit: s/bypassTokens/bypass.
pkg/kv/kvserver/kvflowcontrol/kvflowhandle/kvflowhandle.go
line 98 at r1 (raw file):
// for regular traffic, that can be dealt with at the caller by not // calling .Admit() and ensuring we use the right raft entry encodings. return bypassTokens, nil
Just return false here directly instead of relying on uninit-ed bool. Do the same in the closed case below.
pkg/kv/kvserver/kvflowcontrol/kvflowhandle/kvflowhandle.go
line 126 at r1 (raw file):
if err != nil { h.metrics.onErrored(class, h.clock.PhysicalTime().Sub(tstart)) return bypassTokens, err
Return false directly in this error path. We expect the caller to just bail, right? Not bother proposing?
BTW - what are the semantics if different streams return different values for the boolean return type with no error? (If the mode cluster setting was changed partway through for example.) Right now we're just returning whatever the last stream[1] in this list evaluates, which I guess is fine, but worth an explicit comment
[1]: Why is it fine? In a sense it's the most recent read of the mode cluster setting. And we're not deducting tokens at this point - we only deduct it later when encoding the proposal, so we could decide based on what we saw last.
8268652
to
f2f6a07
Compare
983955f
to
fe97dbd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR. I made some changes based on your comments; let me know if you have any other suggestions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)
Previously, irfansharif (irfan sharif) wrote…
Could you add a sentence explaining exactly how? Something around the rate of regular work being not shaped by flow control/below-raft admission rates, but it consuming some memory in below-raft work queues, hence the OOM.
Done.
Previously, irfansharif (irfan sharif) wrote…
s/skip/bypass
Done.
Previously, irfansharif (irfan sharif) wrote…
s/above Raft/above-raft. It might be better to say "We still subject such work to above-raft IO admission control, if enabled."
Done.
Previously, irfansharif (irfan sharif) wrote…
Capitalize "informs" and add a period at the end.
Done.
pkg/kv/kvserver/kvadmission/kvadmission.go
line 312 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Could you pull out the
bypassAdmission
from this conditional, and s/if attemptFlowControl {
/if attemptFlowControl && !bypassAdmission
? I had to remind myself that thebypassAdmission
parameter is something included inadmission.WorkInfo
below so it's ok to explicitly seek admission for (we in fact rely on it for token calculation purposes). It's worth a comment - apologies for not adding it myself earlier.
I'm not sure what exactly to include in this comment. What do you mean by "it's ok to seek admission for explicitly"?
pkg/kv/kvserver/kvadmission/kvadmission.go
line 339 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
How about something like the following for this comment?
If flow control is disabled or if work bypasses flow control, we still subject it above-raft, leaseholder-only IO admission control
Done.
pkg/kv/kvserver/kvflowcontrol/kvflowcontrol.go
line 118 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Invert the return type, returning true if admitted. Perhaps name the return type as "admitted" in this interface definition. Ditto below for the Handle interface. Write some comment about not needing to do anything with the first return type if err != nil.
I think calling code could look like:
admitted, err := c.Admit(...)
If !admitted and err == nil, that just means we didn't deduct tokens for it. There's a parallel for this form elsewhere:
cockroach/pkg/util/admission/work_queue.go
Line 558 in 9dc56ae
func (q *WorkQueue) Admit(ctx context.Context, info WorkInfo) (enabled bool, err error) {
Done.
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go
line 155 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Nit: s/bypassTokens/bypass.
Done.
pkg/kv/kvserver/kvflowcontrol/kvflowhandle/kvflowhandle.go
line 98 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Just return false here directly instead of relying on uninit-ed bool. Do the same in the closed case below.
Done.
pkg/kv/kvserver/kvflowcontrol/kvflowhandle/kvflowhandle.go
line 126 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Return false directly in this error path. We expect the caller to just bail, right? Not bother proposing?
BTW - what are the semantics if different streams return different values for the boolean return type with no error? (If the mode cluster setting was changed partway through for example.) Right now we're just returning whatever the last stream[1] in this list evaluates, which I guess is fine, but worth an explicit comment
[1]: Why is it fine? In a sense it's the most recent read of the mode cluster setting. And we're not deducting tokens at this point - we only deduct it later when encoding the proposal, so we could decide based on what we saw last.
Done.
fe97dbd
to
9789da7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi)
pkg/kv/kvserver/kvadmission/kvadmission.go
line 312 at r1 (raw file):
Previously, aadityasondhi (Aaditya Sondhi) wrote…
I'm not sure what exactly to include in this comment. What do you mean by "it's ok to seek admission for explicitly"?
I meant that even though we know at the caller we're bypassing admission, we still have to explicit invoke .Admit() but pass in this bypassAdmission = true
through the WorkInfo`. We do it for correct token accounting, i.e. we deduct tokens without blocking.
Previously, when we were skipping flow control for regular work, we were still encoding raft messages with AC encoding. This would enqueue essentially no-op AC work items below raft, increasing the possibility of OOM. This is because we would not be pacing the rate of regular work by below-raft admission rates but it would be consuming memory in below-raft work queues. This change skips encoding raft messages with AC encoding for cases where we bypass flow control. Additionally, we still subject such work to above-raft IO admission control, if enabled. Informs cockroachdb#104154. Release note: None
9789da7
to
2734eda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)
pkg/kv/kvserver/kvadmission/kvadmission.go
line 312 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I meant that even though we know at the caller we're bypassing admission, we still have to explicit invoke .Admit() but pass in this
bypassAdmission = true
through the WorkInfo`. We do it for correct token accounting, i.e. we deduct tokens without blocking.
Thanks for clarifying, I added the comment.
bors r=irfansharif |
Build succeeded: |
Previously, when we were skipping flow control for regular work, we were
still encoding raft messages with AC encoding. This would enqueue
essentially no-op AC work items below raft, increasing the possibility
of OOM. This is because we would not be pacing the rate of regular work
by below-raft admission rates but it would be consuming memory in
below-raft work queues.
This change skips encoding raft messages with AC encoding for cases
where we bypass flow control. Additionally, we still subject such work
to above-raft IO admission control, if enabled.
Informs #104154.
Release note: None