kv: grab raftMu during no-op writes with local gossip triggers #68045

Merged

Conversation

nvanbenschoten
Member

Fixes #68011.

As of 9f8c019, it is possible to have no-op writes that do not go through
Raft but still set one of the gossip triggers. Handling these gossip triggers
requires the raftMu to be held, so we were running into trouble when handling
the local eval results above Raft, where the mutex is not held.

For instance, we see this case when a transaction sets the system config
trigger and then performs a delete range over an empty span before
committing. In this case, the transaction will have no intents to
remove, so it can auto-GC its record during an EndTxn. If its record was
never written in the first place, this is a no-op (as of 9f8c019).

There appear to be three ways we could solve this:

  1. we can avoid setting gossip triggers on transactions that don't perform
    any writes.
  2. we can force EndTxn requests with gossip triggers to go through Raft even
    if they are otherwise no-ops.
  3. we can properly handle gossip triggers on the above-Raft local eval result
    path.

This commit opts for the third option.
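
To illustrate the third option, here is a minimal sketch of conditionally acquiring the mutex on the local eval result path. It is not the actual kvserver code: `replica`, `localResult`, and `handleLocalEvalResult` are hypothetical stand-ins, only one trigger is modeled, and the flag uses the name `raftMuHeld` that comes up in the review of this PR. The `defer maybeAcquireRaftMu()()` idiom matches the snippet under review: the outer call runs immediately and may lock, and the function it returns is deferred as the unlock.

```go
package main

import (
	"fmt"
	"sync"
)

// localResult is a hypothetical stand-in for result.LocalResult; only the
// system config gossip trigger is modeled.
type localResult struct {
	MaybeGossipSystemConfig bool
}

// replica is a hypothetical stand-in for the real Replica type.
type replica struct {
	raftMu   sync.Mutex
	gossiped int // counts gossip attempts, for illustration only
}

// maybeGossipSystemConfigRaftMuLocked requires the caller to hold raftMu.
func (r *replica) maybeGossipSystemConfigRaftMuLocked() {
	r.gossiped++
}

// handleLocalEvalResult handles the trigger on both paths. raftMuHeld is a
// promise from the caller that it already holds raftMu. When it is false
// (the no-op write handled above Raft), the mutex is acquired here.
func (r *replica) handleLocalEvalResult(lResult localResult, raftMuHeld bool) {
	// maybeAcquireRaftMu locks raftMu only if the caller does not hold it
	// and returns the matching cleanup function.
	maybeAcquireRaftMu := func() func() {
		if raftMuHeld {
			return func() {}
		}
		r.raftMu.Lock()
		return r.raftMu.Unlock
	}

	if lResult.MaybeGossipSystemConfig {
		// The outer call runs now; the returned unlock runs on return.
		defer maybeAcquireRaftMu()()
		r.maybeGossipSystemConfigRaftMuLocked()
	}
}

func main() {
	r := &replica{}

	// No-op write handled above Raft: raftMu is not held by the caller.
	r.handleLocalEvalResult(localResult{MaybeGossipSystemConfig: true}, false)

	// Regular apply path: the caller already holds raftMu.
	r.raftMu.Lock()
	r.handleLocalEvalResult(localResult{MaybeGossipSystemConfig: true}, true)
	r.raftMu.Unlock()

	fmt.Println("gossip attempts:", r.gossiped) // prints "gossip attempts: 2"
}
```

Passing the flag keeps a single handler for both paths. The cost is that the caller has to be truthful about whether it holds the mutex.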

@cockroach-teamcity
Member

This change is Reviewable

@@ -641,7 +641,9 @@ func addSSTablePreApply(
 	return copied
 }

-func (r *Replica) handleReadWriteLocalEvalResult(ctx context.Context, lResult result.LocalResult) {
+func (r *Replica) handleReadWriteLocalEvalResult(
+	ctx context.Context, lResult result.LocalResult, withRaftMu bool,
Member

withRaftMu doesn't really tell me whether this is the caller asking for raftMu to be acquired, or whether they're promising that they already hold it. Consider calling this raftMuHeld.

Member Author

Done.

if lResult.MaybeGossipSystemConfig {
defer maybeAcquireRaftMu()()
if err := r.MaybeGossipSystemConfigRaftMuLocked(ctx); err != nil {
Member

While you're here, sprinkle raftMu.AssertHeld() into the three methods with the RaftMuLocked suffix?

Member Author

It's already there. That's the cause of the test flakiness 😃
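
For readers unfamiliar with this kind of assertion, here is a rough sketch of how a `...RaftMuLocked` method can fail when the mutex is not held. `assertableMutex` is a hypothetical wrapper written for this example and is not CockroachDB's mutex type. Go's `sync.Mutex` has no `AssertHeld`, so the wrapper tracks the locked state itself. The panic message is borrowed from the title of the linked issue.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// assertableMutex is a hypothetical mutex wrapper that remembers whether it
// is currently locked. It cannot tell which goroutine holds the lock.
type assertableMutex struct {
	mu     sync.Mutex
	locked int32 // 1 while locked, 0 otherwise
}

func (m *assertableMutex) Lock() {
	m.mu.Lock()
	atomic.StoreInt32(&m.locked, 1)
}

func (m *assertableMutex) Unlock() {
	atomic.StoreInt32(&m.locked, 0)
	m.mu.Unlock()
}

// AssertHeld panics if the mutex is not locked by anyone.
func (m *assertableMutex) AssertHeld() {
	if atomic.LoadInt32(&m.locked) == 0 {
		panic("mutex is not write locked")
	}
}

// gossipRaftMuLocked stands in for a method with the RaftMuLocked suffix:
// it asserts its locking precondition before doing any work.
func gossipRaftMuLocked(raftMu *assertableMutex) {
	raftMu.AssertHeld()
}

// callAndRecover runs fn and returns the panic message, or "" if fn returned
// normally.
func callAndRecover(fn func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	fn()
	return ""
}

func main() {
	var raftMu assertableMutex

	// Above-Raft path before the fix: nobody holds raftMu, so the assertion fires.
	fmt.Printf("unlocked: %q\n", callAndRecover(func() { gossipRaftMuLocked(&raftMu) }))

	// With raftMu held, the same call passes.
	raftMu.Lock()
	fmt.Printf("locked: %q\n", callAndRecover(func() { gossipRaftMuLocked(&raftMu) }))
	raftMu.Unlock()
}
```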

Contributor

@erikgrinaker erikgrinaker left a comment

:lgtm:

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @nvanbenschoten)

@tbg
Member

tbg commented Jul 26, 2021

Looks like this is making it hard to merge stuff, can we land this? I see that CI here is red, so I'm hesitant to pull the trigger.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/gossipTriggerRaft branch from 66b6a88 to 584fb97 Compare July 26, 2021 17:51
@nvanbenschoten
Member Author

bors r+

@craig
Contributor

craig bot commented Jul 26, 2021

Build succeeded:

@craig craig bot merged commit 82b32cf into cockroachdb:master Jul 26, 2021
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/gossipTriggerRaft branch July 27, 2021 20:49
Development

Successfully merging this pull request may close these issues.

backupccl: CI builds panicking with mutex is not write locked
4 participants