Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvadmission: fix handling of non-elastic work with flow control #109446

Conversation

aadityasondhi
Copy link
Collaborator

@aadityasondhi aadityasondhi commented Aug 24, 2023

Previously, when we were skipping flow control for regular work, we were
still encoding raft messages with AC encoding. This would enqueue
essentially no-op AC work items below raft, increasing the possibility
of OOM. This is because we would not be pacing the rate of regular work
by below-raft admission rates but it would be consuming memory in
below-raft work queues.

This change skips encoding raft messages with AC encoding for cases
where we bypass flow control. Additionally, we still subject such work
to above-raft IO admission control, if enabled.

Informs #104154.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Only left minor comments below.

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi)


-- commits line 7 at r1:
Could you add a sentence explaining exactly how? Something around the rate of regular work being not shaped by flow control/below-raft admission rates, but it consuming some memory in below-raft work queues, hence the OOM.


-- commits line 10 at r1:
s/skip/bypass


-- commits line 11 at r1:
s/above Raft/above-raft. It might be better to say "We still subject such work to above-raft IO admission control, if enabled."


-- commits line 13 at r1:
Capitalize "informs" and add a period at the end.


pkg/kv/kvserver/kvadmission/kvadmission.go line 312 at r1 (raw file):

	if ba.IsWrite() && !ba.IsSingleHeartbeatTxnRequest() {
		var bypassFlowcontrol bool
		attemptFlowControl := !bypassAdmission &&

Could you pull out the bypassAdmission from this conditional, and s/if attemptFlowControl {/if attemptFlowControl && !bypassAdmission? I had to remind myself that the bypassAdmission parameter is something included in admission.WorkInfo below so it's ok to explicitly seek admission for (we in fact rely on it for token calculation purposes). It's worth a comment - apologies for not adding it myself earlier.


pkg/kv/kvserver/kvadmission/kvadmission.go line 339 at r1 (raw file):

		}
		// NB: When flow control is enabled for elastic work, and we encounter
		// non-elastic (i.e. "regular work), we skip waiting for tokens in

How about something like the following for this comment?

If flow control is disabled or if work bypasses flow control, we still subject it above-raft, leaseholder-only IO admission control

pkg/kv/kvserver/kvflowcontrol/kvflowcontrol.go line 118 at r1 (raw file):

	// until there are flow tokens available or the stream disconnects, subject to
	// context cancellation. It returns true if the request bypassed flow control.
	Admit(context.Context, admissionpb.WorkPriority, time.Time, ConnectedStream) (bool, error)

Invert the return type, returning true if admitted. Perhaps name the return type as "admitted" in this interface definition. Ditto below for the Handle interface. Write some comment about not needing to do anything with the first return type if err != nil.

I think calling code could look like:

admitted, err := c.Admit(...)

If !admitted and err == nil, that just means we didn't deduct tokens for it. There's a parallel for this form elsewhere:

func (q *WorkQueue) Admit(ctx context.Context, info WorkInfo) (enabled bool, err error) {


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 155 at r1 (raw file):

		// being available, we'll also let them through if we're not
		// applying flow control to their specific work class.
		bypassTokens = c.mode() == kvflowcontrol.ApplyToElastic && class == admissionpb.RegularWorkClass

Nit: s/bypassTokens/bypass.


pkg/kv/kvserver/kvflowcontrol/kvflowhandle/kvflowhandle.go line 98 at r1 (raw file):

		// for regular traffic, that can be dealt with at the caller by not
		// calling .Admit() and ensuring we use the right raft entry encodings.
		return bypassTokens, nil

Just return false here directly instead of relying on uninit-ed bool. Do the same in the closed case below.


pkg/kv/kvserver/kvflowcontrol/kvflowhandle/kvflowhandle.go line 126 at r1 (raw file):

		if err != nil {
			h.metrics.onErrored(class, h.clock.PhysicalTime().Sub(tstart))
			return bypassTokens, err

Return false directly in this error path. We expect the caller to just bail, right? Not bother proposing?

BTW - what are the semantics if different streams return different values for the boolean return type with no error? (If the mode cluster setting was changed partway through for example.) Right now we're just returning whatever the last stream[1] in this list evaluates, which I guess is fine, but worth an explicit comment

[1]: Why is it fine? In a sense it's the most recent read of the mode cluster setting. And we're not deducting tokens at this point - we only deduct it later when encoding the proposal, so we could decide based on what we saw last.

@aadityasondhi aadityasondhi force-pushed the 20230823.flowcontrol-apply-to-elastic-mode-fix branch 2 times, most recently from 8268652 to f2f6a07 Compare August 29, 2023 19:55
@aadityasondhi aadityasondhi marked this pull request as ready for review August 29, 2023 19:55
@aadityasondhi aadityasondhi requested a review from a team as a code owner August 29, 2023 19:55
@aadityasondhi aadityasondhi requested a review from a team August 29, 2023 19:55
@aadityasondhi aadityasondhi requested a review from a team as a code owner August 29, 2023 19:55
@aadityasondhi aadityasondhi removed request for a team August 29, 2023 19:55
@aadityasondhi aadityasondhi force-pushed the 20230823.flowcontrol-apply-to-elastic-mode-fix branch 2 times, most recently from 983955f to fe97dbd Compare August 29, 2023 21:35
Copy link
Collaborator Author

@aadityasondhi aadityasondhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR. I made some changes based on your comments; let me know if you have any other suggestions.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


-- commits line 7 at r1:

Previously, irfansharif (irfan sharif) wrote…

Could you add a sentence explaining exactly how? Something around the rate of regular work being not shaped by flow control/below-raft admission rates, but it consuming some memory in below-raft work queues, hence the OOM.

Done.


-- commits line 10 at r1:

Previously, irfansharif (irfan sharif) wrote…

s/skip/bypass

Done.


-- commits line 11 at r1:

Previously, irfansharif (irfan sharif) wrote…

s/above Raft/above-raft. It might be better to say "We still subject such work to above-raft IO admission control, if enabled."

Done.


-- commits line 13 at r1:

Previously, irfansharif (irfan sharif) wrote…

Capitalize "informs" and add a period at the end.

Done.


pkg/kv/kvserver/kvadmission/kvadmission.go line 312 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Could you pull out the bypassAdmission from this conditional, and s/if attemptFlowControl {/if attemptFlowControl && !bypassAdmission? I had to remind myself that the bypassAdmission parameter is something included in admission.WorkInfo below so it's ok to explicitly seek admission for (we in fact rely on it for token calculation purposes). It's worth a comment - apologies for not adding it myself earlier.

I'm not sure what exactly to include in this comment. What do you mean by "it's ok to seek admission for explicitly"?


pkg/kv/kvserver/kvadmission/kvadmission.go line 339 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

How about something like the following for this comment?

If flow control is disabled or if work bypasses flow control, we still subject it above-raft, leaseholder-only IO admission control

Done.


pkg/kv/kvserver/kvflowcontrol/kvflowcontrol.go line 118 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Invert the return type, returning true if admitted. Perhaps name the return type as "admitted" in this interface definition. Ditto below for the Handle interface. Write some comment about not needing to do anything with the first return type if err != nil.

I think calling code could look like:

admitted, err := c.Admit(...)

If !admitted and err == nil, that just means we didn't deduct tokens for it. There's a parallel for this form elsewhere:

func (q *WorkQueue) Admit(ctx context.Context, info WorkInfo) (enabled bool, err error) {

Done.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 155 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Nit: s/bypassTokens/bypass.

Done.


pkg/kv/kvserver/kvflowcontrol/kvflowhandle/kvflowhandle.go line 98 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Just return false here directly instead of relying on uninit-ed bool. Do the same in the closed case below.

Done.


pkg/kv/kvserver/kvflowcontrol/kvflowhandle/kvflowhandle.go line 126 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Return false directly in this error path. We expect the caller to just bail, right? Not bother proposing?

BTW - what are the semantics if different streams return different values for the boolean return type with no error? (If the mode cluster setting was changed partway through for example.) Right now we're just returning whatever the last stream[1] in this list evaluates, which I guess is fine, but worth an explicit comment

[1]: Why is it fine? In a sense it's the most recent read of the mode cluster setting. And we're not deducting tokens at this point - we only deduct it later when encoding the proposal, so we could decide based on what we saw last.

Done.

@aadityasondhi aadityasondhi force-pushed the 20230823.flowcontrol-apply-to-elastic-mode-fix branch from fe97dbd to 9789da7 Compare August 30, 2023 14:12
Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi)


pkg/kv/kvserver/kvadmission/kvadmission.go line 312 at r1 (raw file):

Previously, aadityasondhi (Aaditya Sondhi) wrote…

I'm not sure what exactly to include in this comment. What do you mean by "it's ok to seek admission for explicitly"?

I meant that even though we know at the caller we're bypassing admission, we still have to explicit invoke .Admit() but pass in this bypassAdmission = true through the WorkInfo`. We do it for correct token accounting, i.e. we deduct tokens without blocking.

Previously, when we were skipping flow control for regular work, we were
still encoding raft messages with AC encoding. This would enqueue
essentially no-op AC work items below raft, increasing the possibility
of OOM. This is because we would not be pacing the rate of regular work
by below-raft admission rates but it would be consuming memory in
below-raft work queues.

This change skips encoding raft messages with AC encoding for cases
where we bypass flow control. Additionally, we still subject such work
to above-raft IO admission control, if enabled.

Informs cockroachdb#104154.

Release note: None
@aadityasondhi aadityasondhi force-pushed the 20230823.flowcontrol-apply-to-elastic-mode-fix branch from 9789da7 to 2734eda Compare August 30, 2023 18:59
Copy link
Collaborator Author

@aadityasondhi aadityasondhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/kv/kvserver/kvadmission/kvadmission.go line 312 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I meant that even though we know at the caller we're bypassing admission, we still have to explicit invoke .Admit() but pass in this bypassAdmission = true through the WorkInfo`. We do it for correct token accounting, i.e. we deduct tokens without blocking.

Thanks for clarifying, I added the comment.

@aadityasondhi
Copy link
Collaborator Author

bors r=irfansharif

@craig
Copy link
Contributor

craig bot commented Aug 30, 2023

Build succeeded:

@craig craig bot merged commit 95dcf06 into cockroachdb:master Aug 30, 2023
@aadityasondhi aadityasondhi deleted the 20230823.flowcontrol-apply-to-elastic-mode-fix branch August 31, 2023 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants