release-21.1: kv: don't unquiesce uninitialized replicas #74186

@nvanbenschoten commented Dec 22, 2021

Backport 4/5 commits from #73362.

This does not include the last commit from that PR, as it was a refactor that has code skew with this branch.

This backport should wait on the backports to this branch for #74108 and #74124. I'll also let #73362 bake for another week on master before pressing the merge button.

/cc @cockroachdb/release


In a support issue (cockroachlabs/support#1340), we saw that tens of thousands of uninitialized replicas were being ticked regularly and creating a large amount of background work on a node, driving up CPU. This commit updates the Raft quiescence logic to prevent uninitialized replicas from being unquiesced and calling Tick() on themselves.

Keeping uninitialized replicas quiesced even in the presence of Raft traffic avoids wasted work. We could Tick() these replicas, but doing so is unnecessary because uninitialized replicas can never win elections, so there is no reason for them to ever call an election. In fact, uninitialized replicas do not even know who their peers are, so there would be no way for them to call an election or for them to send any other non-reactive message. As a result, all work performed by an uninitialized replica is reactive and in response to incoming messages (see processRequestQueue).
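A minimal Go sketch of the rule described above, assuming a heavily simplified replica model. The `replica` type, its `initialized`/`quiescent`/`ticks` fields, and the `maybeUnquiesce` and `tick` methods are hypothetical names for illustration, not CockroachDB's actual code. It shows an uninitialized replica refusing to unquiesce when Raft traffic arrives, so the tick loop skips it, while an initialized replica wakes up and ticks normally.

```go
package main

import "fmt"

// replica is a hypothetical, heavily simplified stand-in for a Raft replica.
// Names are illustrative only; this is not CockroachDB's Replica type.
type replica struct {
	initialized bool // true once the replica knows its range descriptor (and peers)
	quiescent   bool // quiesced replicas are skipped by the tick loop
	ticks       int  // number of Raft ticks performed so far
}

// maybeUnquiesce is called when Raft traffic arrives for the replica. It
// refuses to wake an uninitialized replica and reports whether the replica
// is awake afterwards.
func (r *replica) maybeUnquiesce() bool {
	if !r.quiescent {
		return true
	}
	if !r.initialized {
		// An uninitialized replica does not know its peers and can never win
		// an election, so ticking it would only waste CPU.
		return false
	}
	r.quiescent = false
	return true
}

// tick advances the replica's Raft clock unless the replica is quiesced.
func (r *replica) tick() {
	if r.quiescent {
		return
	}
	r.ticks++
}

func main() {
	replicas := []*replica{
		{initialized: false, quiescent: true}, // e.g. abandoned after a failed snapshot
		{initialized: true, quiescent: true},
	}
	// Incoming Raft traffic for both replicas, followed by three tick passes.
	for _, r := range replicas {
		r.maybeUnquiesce()
	}
	for i := 0; i < 3; i++ {
		for _, r := range replicas {
			r.tick()
		}
	}
	for _, r := range replicas {
		fmt.Printf("initialized=%v quiescent=%v ticks=%d\n", r.initialized, r.quiescent, r.ticks)
	}
}
```

Running it reports `ticks=0` for the uninitialized replica and `ticks=3` for the initialized one. The sketch assumes message handling happens on a separate, reactive path (the role `processRequestQueue` plays in the description above), so refusing to unquiesce does not stop a replica from responding to incoming messages.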

There are multiple ways for an uninitialized replica to be created and then abandoned, and we don't do a good job garbage collecting them at a later point (see #73424), so it is important that they are cheap. Keeping them quiesced instead of letting them unquiesce and tick every 200ms indefinitely avoids a meaningful amount of periodic work for each uninitialized replica.

Release notes (bug fix): uninitialized replicas that are abandoned after an unsuccessful snapshot no longer perform periodic background work, so they no longer have a non-negligible cost.

Release justification: avoids high CPU when snapshots are starved.

@nvanbenschoten requested a review from tbg December 22, 2021 01:17
@nvanbenschoten requested a review from a team as a code owner December 22, 2021 01:17
blathers-crl bot commented Dec 22, 2021

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

This is a small refactor that eliminates a call to `HLC.Now` on each replica
tick. Due to the issue fixed by the following commit, we were ticking thousands
of uninitialized replicas in fast succession, and as a result these clock
accesses were a major source of mutex contention. We shouldn't be ticking
uninitialized replicas, but since we now know that this is an expensive part of
`Replica.tick`, we might as well make it cheaper.
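One way to avoid a per-replica clock read is to read the clock once per tick pass and hand the timestamp to every replica. This Go sketch assumes that approach; the `clock` type (a mutex-protected stand-in for an HLC whose `Now()` takes a lock), `replica`, and both tick functions are hypothetical, and the actual commit may restructure `Replica.tick` differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// clock is a hypothetical mutex-protected clock standing in for an HLC whose
// Now() acquires a lock. It counts how often it is read.
type clock struct {
	mu    sync.Mutex
	reads int
}

func (c *clock) Now() time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.reads++
	return time.Now()
}

type replica struct {
	lastTick time.Time
}

// tickPerReplicaClock is the "before" shape: every replica reads the clock,
// so each tick acquires the clock's mutex once per replica.
func tickPerReplicaClock(c *clock, rs []*replica) {
	for _, r := range rs {
		r.lastTick = c.Now()
	}
}

// tickSharedClock is the "after" shape: the clock is read once per pass and
// the same timestamp is handed to every replica.
func tickSharedClock(c *clock, rs []*replica) {
	now := c.Now()
	for _, r := range rs {
		r.lastTick = now
	}
}

func main() {
	rs := make([]*replica, 1000)
	for i := range rs {
		rs[i] = &replica{}
	}
	before, after := &clock{}, &clock{}
	tickPerReplicaClock(before, rs)
	tickSharedClock(after, rs)
	fmt.Println("clock reads before:", before.reads, "after:", after.reads)
}
```

With 1000 replicas, the per-replica shape takes the clock's lock 1000 times per pass while the shared shape takes it once, which is the kind of contention reduction the commit message describes.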

A replica cannot unquiesce if its `internalRaftGroup` is nil, so this
condition was redundant but confusing.
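A small Go sketch of why such a check is redundant, assuming a hypothetical `canUnquiesce` helper that already returns false when `internalRaftGroup` is nil. The helper, the `raftGroup` placeholder, and both wrapper functions are illustrative names, not the real code. Enumerating every combination shows that dropping the extra nil check at the call site does not change the result.

```go
package main

import "fmt"

// raftGroup is a hypothetical placeholder for a replica's internal Raft group.
type raftGroup struct{}

type replica struct {
	internalRaftGroup *raftGroup
	quiescent         bool
}

// canUnquiesce already refuses to unquiesce a replica without a Raft group.
func (r *replica) canUnquiesce() bool {
	return r.quiescent && r.internalRaftGroup != nil
}

// shouldUnquiesceBefore repeats the nil check at the call site.
func shouldUnquiesceBefore(r *replica) bool {
	return r.internalRaftGroup != nil && r.canUnquiesce()
}

// shouldUnquiesceAfter drops the redundant check.
func shouldUnquiesceAfter(r *replica) bool {
	return r.canUnquiesce()
}

func main() {
	for _, hasGroup := range []bool{false, true} {
		for _, quiescent := range []bool{false, true} {
			r := &replica{quiescent: quiescent}
			if hasGroup {
				r.internalRaftGroup = &raftGroup{}
			}
			same := shouldUnquiesceBefore(r) == shouldUnquiesceAfter(r)
			fmt.Printf("hasGroup=%v quiescent=%v same=%v\n", hasGroup, quiescent, same)
		}
	}
}
```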

@nvanbenschoten deleted the backport21.1-73362 branch June 6, 2022 16:31