Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rac2,kvserver: do not quiesce if send tokens held #132166

Merged
merged 6 commits into from
Oct 10, 2024

Conversation

pav-kv
Copy link
Collaborator

@pav-kv pav-kv commented Oct 8, 2024

This PR prevents range quiescence if RACv2 holds any send tokens for this range. Quiescence would prevent MsgApp pings which ensure that the leader reliably learns about the follower store admitting log entries, and causes it to release tokens accordingly. We do not want to end up holding tokens permanently.

Resolves #129581

@pav-kv pav-kv requested review from kvoli and sumeerbhola October 8, 2024 13:47
@pav-kv pav-kv requested review from a team as code owners October 8, 2024 13:47
@pav-kv
Copy link
Collaborator Author

pav-kv commented Oct 8, 2024

@sumeerbhola @kvoli The 4th commit is the interesting one. The rest is clean-ups.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pav-kv pav-kv force-pushed the rac2-quiescence branch 3 times, most recently from cc28b13 to 8fb44bf Compare October 8, 2024 14:30
@pav-kv
Copy link
Collaborator Author

pav-kv commented Oct 8, 2024

TestFlowControlQuiescedRangeV2 is failing (the range never quiesces), need to fix. Upd: this was expected. Fixed.

@pav-kv pav-kv force-pushed the rac2-quiescence branch 2 times, most recently from 93aaa20 to 19238d9 Compare October 8, 2024 15:41
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 9 of 9 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)


pkg/kv/kvserver/replica_raft_quiesce.go line 446 at r3 (raw file):

	}
	slices.SortFunc(lagging, func(a, b livenesspb.Liveness) int {
		return cmp.Compare(a.NodeID, b.NodeID)

is this sorting for determinism in tests, since sorting by NodeID seems unnecessary?


pkg/kv/kvserver/replica_raft_quiesce.go line 216 at r4 (raw file):

	// TODO(pav-kv): this method is a one-off holding raftMu. It should be able to
	// do its job with only Replica.mu held.
	hasSendTokensRaftMuLocked() bool

If I am reading correctly, shouldReplicaQuiesce is only ever called with raftMu and Replica.mu held, so this seems unnecessary to fix. There is a performance cost to adding locking in RangeController where it currently manages with raftMu already being held (mutex locking shows up in cpu profiles), so I'd rather not add another Mutex.


pkg/kv/kvserver/replica_raft_quiesce.go line 284 at r4 (raw file):

//
// NOTE: The last 3 conditions are fairly, but not completely, overlapping.
func shouldReplicaQuiesce(

suggest adding RaftMuLockedReplicaMuLocked suffix to this method.


pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go line 1169 at r4 (raw file):

	// TODO(pav-kv): in Replica.tick, we can call MaybeSendPingsRaftMuLocked
	// first, which does the same checks. It can return whether any tokens are
	// held, and this signal can be then used to prevent quiescence.

Would adding this logic into MaybeSendPingsRaftMuLocked help with code clarity (in the kvserver code) or performance?
Just trying to understand why this should be a TODO.

@kvoli kvoli removed their request for review October 9, 2024 17:21
Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


pkg/kv/kvserver/replica_raft_quiesce.go line 446 at r3 (raw file):

Previously, sumeerbhola wrote…

is this sorting for determinism in tests, since sorting by NodeID seems unnecessary?

I think so, but prefer not removing the sort in this PR. Looking at paths that use this lagging slice in the callers, they end up being sent over the wire. I didn't find any usage of this proto field though... Might remove in another PR.


pkg/kv/kvserver/replica_raft_quiesce.go line 216 at r4 (raw file):

Previously, sumeerbhola wrote…

If I am reading correctly, shouldReplicaQuiesce is only ever called with raftMu and Replica.mu held, so this seems unnecessary to fix. There is a performance cost to adding locking in RangeController where it currently manages with raftMu already being held (mutex locking shows up in cpu profiles), so I'd rather not add another Mutex.

The fact that raftMu is held in tick is odd and may change. Looking at the tick code, it is not used for anything (now used for RACv2 check). I am just documenting the interface irregularity here, but there is a bigger thing behind it.

The fact that RACv2 needs to hold raftMu everywhere in Processer is thus also a lucky coincidence. We might need to revisit that at some point, because we should be needing raftMu only for raft IO.


pkg/kv/kvserver/replica_raft_quiesce.go line 284 at r4 (raw file):

Previously, sumeerbhola wrote…

suggest adding RaftMuLockedReplicaMuLocked suffix to this method.

Done.


pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go line 1169 at r4 (raw file):

Previously, sumeerbhola wrote…

Would adding this logic into MaybeSendPingsRaftMuLocked help with code clarity (in the kvserver code) or performance?
Just trying to understand why this should be a TODO.

Performance. We're doing the same loop over rc.replicasMap + locking in each state, so we could both send pings and check whether we can quiesce (none/any of holdsTokens returned false).

Clarified the comment.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Oct 10, 2024

TFTR!

bors r=sumeerbhola

@pav-kv
Copy link
Collaborator Author

pav-kv commented Oct 10, 2024

bors cancel

@craig
Copy link
Contributor

craig bot commented Oct 10, 2024

Canceled.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Oct 10, 2024

bors r=sumeerbhola

craig bot pushed a commit that referenced this pull request Oct 10, 2024
132166: rac2,kvserver: do not quiesce if send tokens held r=sumeerbhola a=pav-kv

This PR prevents range quiescence if RACv2 holds any send tokens for this range. Quiescence would prevent `MsgApp` pings which ensure that the leader reliably learns about the follower store admitting log entries, and causes it to release tokens accordingly. We do not want to end up holding tokens permanently.

Resolves #129581

132202: sql/schemachanger: clean up SequenceOwner elements during restore r=fqazi a=fqazi

Previously, when restoring a backup taken in middle of a DROP COLUMN, where a column had a sequence owner assigned, it was possible for the backup to be unrestorable. This would happen because the sequence reference would have been dropped in the plan, but the seqeunce owner element was still within the state. To address this, this test updates the rewrite logic to clean up any SequenceOwner elements which have the referenced sequence already removed.

Fixes: #130778

Release note (bug fix): Addressed a rare bug that could prevent backups taken during a DROP COLUMN operation with a sequence owner from restoring with the error: "rewriting descriptor ids: missing rewrite for <id> in SequenceOwner..."

Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
@craig
Copy link
Contributor

craig bot commented Oct 10, 2024

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Oct 10, 2024
132166: rac2,kvserver: do not quiesce if send tokens held r=sumeerbhola a=pav-kv

This PR prevents range quiescence if RACv2 holds any send tokens for this range. Quiescence would prevent `MsgApp` pings which ensure that the leader reliably learns about the follower store admitting log entries, and causes it to release tokens accordingly. We do not want to end up holding tokens permanently.

Resolves #129581

Co-authored-by: Pavel Kalinnikov <[email protected]>
@craig
Copy link
Contributor

craig bot commented Oct 10, 2024

Build failed:

Epic: none
Release note: none
Epic: none
Release note: none
Epic: none
Release note: none
This commit prevents range quiescence if RACv2 holds any send tokens for
this range. Quiescence would prevent MsgApp pings which ensure that the
leader reliably learns about the the follower store admitting log
entries, and causes it to release tokens accordingly. We do not want to
end up holding tokens permanently.

Epic: none
Release note: none
Epic: none
Release note: none
@pav-kv
Copy link
Collaborator Author

pav-kv commented Oct 10, 2024

One more test had to be fixed.

bors r=sumeerbhola

craig bot pushed a commit that referenced this pull request Oct 10, 2024
132129: roachtest: add slow disk perturbation test r=kvoli a=andrewbaptist

This change adds a new set of perturbation tests perturbation/*/slowDisk which tests slow disks. We have see support cases where slow disks can cause cluster level availability outages.

Epic: none

Release note: None

132166: rac2,kvserver: do not quiesce if send tokens held r=sumeerbhola a=pav-kv

This PR prevents range quiescence if RACv2 holds any send tokens for this range. Quiescence would prevent `MsgApp` pings which ensure that the leader reliably learns about the follower store admitting log entries, and causes it to release tokens accordingly. We do not want to end up holding tokens permanently.

Resolves #129581

132202: sql/schemachanger: clean up SequenceOwner elements during restore r=fqazi a=fqazi

Previously, when restoring a backup taken in middle of a DROP COLUMN, where a column had a sequence owner assigned, it was possible for the backup to be unrestorable. This would happen because the sequence reference would have been dropped in the plan, but the seqeunce owner element was still within the state. To address this, this test updates the rewrite logic to clean up any SequenceOwner elements which have the referenced sequence already removed.

Fixes: #130778

Release note (bug fix): Addressed a rare bug that could prevent backups taken during a DROP COLUMN operation with a sequence owner from restoring with the error: "rewriting descriptor ids: missing rewrite for <id> in SequenceOwner..."

Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
@craig
Copy link
Contributor

craig bot commented Oct 10, 2024

This PR was included in a batch that was canceled, it will be automatically retried

craig bot pushed a commit that referenced this pull request Oct 10, 2024
132129: roachtest: add slow disk perturbation test r=kvoli a=andrewbaptist

This change adds a new set of perturbation tests perturbation/*/slowDisk which tests slow disks. We have see support cases where slow disks can cause cluster level availability outages.

Epic: none

Release note: None

132166: rac2,kvserver: do not quiesce if send tokens held r=sumeerbhola a=pav-kv

This PR prevents range quiescence if RACv2 holds any send tokens for this range. Quiescence would prevent `MsgApp` pings which ensure that the leader reliably learns about the follower store admitting log entries, and causes it to release tokens accordingly. We do not want to end up holding tokens permanently.

Resolves #129581

132202: sql/schemachanger: clean up SequenceOwner elements during restore r=fqazi a=fqazi

Previously, when restoring a backup taken in middle of a DROP COLUMN, where a column had a sequence owner assigned, it was possible for the backup to be unrestorable. This would happen because the sequence reference would have been dropped in the plan, but the seqeunce owner element was still within the state. To address this, this test updates the rewrite logic to clean up any SequenceOwner elements which have the referenced sequence already removed.

Fixes: #130778

Release note (bug fix): Addressed a rare bug that could prevent backups taken during a DROP COLUMN operation with a sequence owner from restoring with the error: "rewriting descriptor ids: missing rewrite for <id> in SequenceOwner..."

Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
@craig
Copy link
Contributor

craig bot commented Oct 10, 2024

Build failed (retrying...):

@craig craig bot merged commit 188d9fe into cockroachdb:master Oct 10, 2024
22 of 23 checks passed
@pav-kv pav-kv deleted the rac2-quiescence branch October 10, 2024 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rac2: prevent range quiescing if admitted is lagging for any replica
3 participants