-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: grab raftMu during no-op writes with local gossip triggers #68045
kv: grab raftMu during no-op writes with local gossip triggers #68045
Conversation
pkg/kv/kvserver/replica_proposal.go
Outdated
@@ -641,7 +641,9 @@ func addSSTablePreApply( | |||
return copied | |||
} | |||
|
|||
func (r *Replica) handleReadWriteLocalEvalResult(ctx context.Context, lResult result.LocalResult) { | |||
func (r *Replica) handleReadWriteLocalEvalResult( | |||
ctx context.Context, lResult result.LocalResult, withRaftMu bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
withRaftMu
doesn't really tell me whether this is the caller asking for raftMu to be acquired, or whether they're promising that they already hold it. Consider calling this raftMuHeld
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if lResult.MaybeGossipSystemConfig { | ||
defer maybeAcquireRaftMu()() | ||
if err := r.MaybeGossipSystemConfigRaftMuLocked(ctx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're here, sprinkle raftMu.AssertHeld()
into the three methods with the RaftMuLocked
suffix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already there. That's the cause of the test flakiness 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @nvanbenschoten)
Looks like this is making it hard to merge stuff, can we land this? I see that CI here is red, so I'm hesitant to pull the trigger. |
Fixes cockroachdb#68011. As of 9f8c019, it is now possible to have no-op writes that do not go through Raft but do set one of the gossip triggers. These gossip triggers require the raftMu to be held, so we were running into trouble when handling the local eval results above Raft. For instance, we see this case when a transaction sets the system config trigger and then performs a delete range over an empty span before committing. In this case, the transaction will have no intents to remove, so it can auto-GC its record during an EndTxn. If its record was never written in the first place, this is a no-op (as of 9f8c019). There appear to be three ways we could solve this: 1. we can avoid setting gossip triggers on transactions that don't perform any writes. 2. we can force EndTxn requests with gossip triggers to go through Raft even if they are otherwise no-ops. 3. we can properly handle gossip triggers on the above Raft local eval result path. This commit opts for the third option.
66b6a88
to
584fb97
Compare
bors r+ |
Build succeeded: |
Fixes #68011.
As of 9f8c019, it is now possible to have no-op writes that do not go through
Raft but do set one of the gossip triggers. These gossip triggers require the
raftMu to be held, so we were running into trouble when handling the local
eval results above Raft.
For instance, we see this case when a transaction sets the system config
trigger and then performs a delete range over an empty span before
committing. In this case, the transaction will have no intents to
remove, so it can auto-GC its record during an EndTxn. If its record was
never written in the first place, this is a no-op (as of 9f8c019).
There appear to be three ways we could solve this:
any writes.
if they are otherwise no-ops.
path.
This commit opts for the third option.