
kvserver: resurrect using_applied_state_key in ReplicaState #72239

Merged (1 commit) on Oct 29, 2021

Conversation

@irfansharif (Contributor) commented Oct 29, 2021

Fixes #72029.

using_applied_state_key was previously used to check whether the Range
had upgraded to begin using the RangeAppliedState key. When set to true,
replicas receiving the state (through a ReplicatedEvalResult) knew to
begin using the new key. In #58088 (in 21.1) we introduced a migration
to iterate through all ranges in the system and have them start using
the RangeAppliedState key. In 21.2, this field was always set to true --
21.2 nodes were already using the RangeAppliedState key, or receiving
messages from 21.1 nodes that were also using the RangeAppliedState key.
In 21.2 (and in 21.1 for that matter) we didn't need to trigger the "if
set to true in an incoming message, start using the RangeAppliedState
key" code path.

When looking to get rid of this field in 22.1 (#70464), we observed that
there was an unintentional read of this field in 21.2 nodes (see #72029 and
#72222); the saga is as follows:

  • Removing this field in 22.1 meant it was set to false when received at
    21.2 nodes.
  • This should've been fine! We weren't using this field to trigger any
    upgrade code paths (like it was originally intended for).
  • It turns out that in 21.2 we were using the ReplicaState from the
    incoming snapshot to update our in-memory replica state.
  • Because the proto field was being phased out, there was now a divergence
    between the on-disk state (field set to true, from earlier 21.2
    operations) and the in-memory state (field set to false, because sent
    from a version that attempted to get rid of this field).

Removing proto fields from the replica state is not possible until we stop
using the protobuf copy of the replica state when applying a snapshot
(#72222). Once that's done, we should be able to stop sending the replica
state as part of the snapshot in the subsequent release.
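To make the divergence described above concrete, a self-contained sketch (the
types are stand-ins for the kvserverpb protos):

```go
package main

import "fmt"

// Stand-ins for the kvserverpb protos; not the real definitions.
type ReplicaState struct {
	DeprecatedUsingAppliedStateKey bool
}

type Replica struct {
	onDisk ReplicaState // what a state loader would read back from disk
	inMem  ReplicaState // what the replica believes after applying a snapshot
}

// applySnapshot mimics the problematic 21.2 behavior: the in-memory state is
// taken verbatim from the proto shipped with the snapshot instead of being
// reloaded from disk.
func (r *Replica) applySnapshot(sent ReplicaState) {
	r.onDisk.DeprecatedUsingAppliedStateKey = true // set by earlier 21.2 operations
	r.inMem = sent                                 // arrives as false from a 22.1 sender
}

func main() {
	var r Replica
	// The 22.1 sender dropped the field, so the proto decodes to false.
	r.applySnapshot(ReplicaState{})
	fmt.Println("diverged:", r.onDisk != r.inMem) // diverged: true
}
```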

Release note: None

@irfansharif irfansharif requested a review from a team as a code owner October 29, 2021 16:56
@cockroach-teamcity (Member) commented:

This change is Reviewable

@irfansharif (Contributor, Author) commented:

Stepping out for a bit, will run the mixed version test again to see if this fixes it for good. Truly a migration from hell.

@erikgrinaker (Contributor) reviewed:

Reviewing from my phone, so haven't checked the surrounding code, but LGTM to fix the nightly tests. Will have a closer look later.

@irfansharif force-pushed the 211029.using-applied-state branch from e9a5059 to 95e5b69 on October 29, 2021 17:33
@nvanbenschoten (Member) reviewed:

:lgtm:

> Truly a migration from hell.

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

```diff
@@ -2451,6 +2451,10 @@ func (r *Replica) sendSnapshot(
 		Term: snap.RaftSnap.Metadata.Term,
 	}

+	// See comment on DeprecatedUsingAppliedStateKey for why we need to set this
+	// explicitly for snapshots going out to followers.
+	snap.State.DeprecatedUsingAppliedStateKey = true
```
A reviewer (Member) commented on the diff:

I think if you did this in stateloader.Load you wouldn't have to think about this anywhere else.
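For reference, a rough sketch of that alternative; the real stateloader.Load
reads from the storage engine and has a different signature, so everything
here is illustrative:

```go
package main

import "fmt"

// Stand-ins; the real stateloader reads the state from a storage engine.
type ReplicaState struct {
	DeprecatedUsingAppliedStateKey bool
}

type StateLoader struct {
	persisted ReplicaState
}

// Load sketches the suggested alternative: pin the deprecated flag at load
// time so that no send site has to remember to set it.
func (sl StateLoader) Load() (ReplicaState, error) {
	state := sl.persisted
	// Every 21.2 range uses the RangeAppliedState key, so the flag is
	// unconditionally true no matter what was persisted or omitted.
	state.DeprecatedUsingAppliedStateKey = true
	return state, nil
}

func main() {
	state, _ := StateLoader{}.Load()
	fmt.Println(state.DeprecatedUsingAppliedStateKey) // true
}
```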

@irfansharif (Author) replied:

Yea, but I wanted this code to be close to the follower snapshot business.

@tbg (Member) reviewed:

Had a comment that would, I think, improve this, but let's merge anyway to get master green.

@irfansharif (Author) commented:

TFTRs!

```
--- PASS: version/mixed/nodes=5 (247.76s)
18:03:55 test_runner.go:515: [w0] test passed
```

bors r+

@craig (bot) commented Oct 29, 2021:

Build succeeded:

@craig (bot) merged commit c7432f6 into cockroachdb:master on Oct 29, 2021.
@irfansharif deleted the 211029.using-applied-state branch on October 29, 2021 23:26.
tbg added a commit to tbg/cockroach that referenced this pull request Nov 2, 2021
Snapshots contain a serialized copy of the `ReplicaState`. However, the
snapshot itself contains that data already. Having two sources of data
that may be interpreted differently can lead to problems, as we saw in
[cockroachdb#72239].

This commit deprecates using the entire ReplicaState. Instead, we pick
out the descriptor and ignore everything else. Instead of using the
copy of the state to initialize the recipient's in-memory state, we
now use a state loader.

In 22.2 we could send only the descriptor, but maybe we won't do that: for
logging and debugging it's kind of nice to have everything present.

Fixes cockroachdb#72222.

[cockroachdb#72239]: cockroachdb#72239

Release note: None
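A compilable sketch of the approach this commit describes; the real change
operates on kvserverpb.ReplicaState via a state loader, and all names below
are stand-ins:

```go
package main

import "fmt"

// Stand-ins for the real types.
type RangeDescriptor struct{ RangeID int }

type ReplicaState struct {
	Desc  *RangeDescriptor
	Other string // stands in for all the fields the recipient now ignores
}

// initFromSnapshot sketches the new approach: only the descriptor is taken
// from the state sent with the snapshot; everything else is reloaded from
// the applied snapshot data via a state loader.
func initFromSnapshot(sent ReplicaState, load func() ReplicaState) ReplicaState {
	state := load()        // single source of truth: the snapshot's own data
	state.Desc = sent.Desc // the descriptor is the one field still picked out
	return state
}

func main() {
	load := func() ReplicaState { return ReplicaState{Other: "loaded from disk"} }
	sent := ReplicaState{Desc: &RangeDescriptor{RangeID: 1}, Other: "ignored"}
	state := initFromSnapshot(sent, load)
	fmt.Println(state.Desc.RangeID, state.Other) // 1 loaded from disk
}
```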
tbg added a commit to tbg/cockroach that referenced this pull request Nov 2, 2021 (same commit message as above)
craig bot pushed a commit that referenced this pull request Nov 3, 2021
69614: spanconfig: introduce spanconfig.KVSubscriber r=irfansharif a=irfansharif

KVSubscriber presents a consistent[^1] snapshot of a spanconfig.StoreReader that's incrementally maintained with changes made to the global span configurations state. The maintenance happens transparently; callers can subscribe to learn about what key spans may have seen a configuration change. After learning about a span update, consulting the embedded StoreReader would retrieve an up-to-date[^2] config for it.

When a callback is first installed, it's invoked with the [min,max) span -- a shorthand to indicate that callers should consult the StoreReader for all spans of interest. Subsequent updates are of the more incremental kind. It's possible that the span updates received are no-ops, i.e. consulting the StoreReader for the given span would retrieve the last config observed for that span[^3].

```go
type KVSubscriber interface {
  StoreReader
  Subscribe(func(updated roachpb.Span))
}
```
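A hypothetical caller, to show how the interface is meant to be used; the
ConfigFor method and the fake implementation below are assumptions, not part
of the actual StoreReader API:

```go
package main

import "fmt"

// Stand-ins for roachpb.Span and the reader side; only the subscription
// pattern is taken from the description above, the method name is made up.
type Span struct{ Start, End string }

type StoreReader interface {
	ConfigFor(sp Span) string
}

type KVSubscriber interface {
	StoreReader
	Subscribe(func(updated Span))
}

// fakeSubscriber primes a freshly installed callback with the [min,max)
// span, the way the description above says a new handler is invoked.
type fakeSubscriber struct{ configs map[string]string }

var _ KVSubscriber = (*fakeSubscriber)(nil)

func (f *fakeSubscriber) ConfigFor(sp Span) string { return f.configs[sp.Start] }
func (f *fakeSubscriber) Subscribe(fn func(Span))  { fn(Span{Start: "min", End: "max"}) }

func main() {
	sub := &fakeSubscriber{configs: map[string]string{"min": "default"}}
	sub.Subscribe(func(updated Span) {
		// On every update, consult the embedded StoreReader for fresh state.
		fmt.Println("span", updated, "config:", sub.ConfigFor(updated))
	})
}
```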

Internally we maintain a rangefeed over the global store of span configurations (system.span_configurations), applying updates from it into an embedded spanconfig.Store. A read-only view of this data structure (spanconfig.StoreReader) is exposed as part of the KVSubscriber interface. Rangefeeds, used as is, don't offer any ordering guarantees with respect to updates made over non-overlapping keys, which is something we care about[^4]. For that reason we make use of a rangefeed buffer, accumulating raw rangefeed updates and flushing them out en masse in timestamp order when the rangefeed frontier is bumped[^5]. If the buffer overflows (as dictated by the memory limit the KVSubscriber is instantiated with), the subscriber is wound down and an appropriate error is returned to the caller.

When running into the errors above, it's safe for the caller to re-subscribe to effectively re-establish the underlying rangefeeds. When re-establishing a new rangefeed and populating a spanconfig.Store using the contents of the initial scan[^6], we wish to preserve the existing spanconfig.StoreReader. Discarding it would entail either blocking all external readers until a new spanconfig.StoreReader was fully populated, or presenting an inconsistent view of the spanconfig.Store that's currently being populated. For new rangefeeds, what we do then is route all updates from the initial scan to a fresh spanconfig.Store, and once the initial scan is done, swap it in as the source behind the exported spanconfig.StoreReader. During the initial scan, concurrent readers would continue to observe the last spanconfig.StoreReader, if any. After the swap, they would observe the more up-to-date source instead. Future incremental updates will also target the new source. When this source swap occurs, we inform handlers of the need to possibly refresh their view of all configs.

This commit also wires the KVSubscriber into KV stores, replacing the use of the gossiped system config span (made possible by the StoreReader interface; this only happens if a testing flag/env var is set).

[^1]: The contents of the StoreReader at t1 corresponds exactly to the contents of the global span configuration state at t0 where t0 <= t1. If the StoreReader is read from at t2 where t2 > t1, it's guaranteed to observe a view of the global state at t >= t0.
[^2]: For the canonical KVSubscriber implementation, this is typically the closed timestamp target duration.
[^3]: The canonical KVSubscriber implementation internally re-establishes feeds when errors occur, possibly re-transmitting earlier updates (usually through a lazy [min,max) span) despite possibly not needing to. We could do a bit better and diff the two data structures, emitting only targeted updates.
[^4]: For a given key k, it's config may be stored as part of a larger span S (where S.start <= k < S.end). It's possible for S to get deleted and replaced with sub-spans S1...SN in the same transaction if the span is getting split. When applying these updates, we need to make sure to process the deletion event for S before processing S1...SN.
[^5]: In our example above, deleting the config for S and adding configs for S1...SN, we want to make sure that we apply the full set of updates all at once -- lest we expose the intermediate state where the config for S was deleted but the configs for S1...SN were not yet applied.
[^6]: When tearing down the subscriber due to underlying errors, we could also surface a checkpoint to use the next time the subscriber is established. That way we can avoid the full initial scan over the span configuration state and simply pick up where we left off with our existing spanconfig.Store.
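To make footnotes [^4] and [^5] concrete, a small self-contained sketch of
the buffering scheme; the event shape, limit, and names are all stand-ins:

```go
package main

import (
	"fmt"
	"sort"
)

// event is an illustrative stand-in for a raw rangefeed update.
type event struct {
	ts   int    // commit timestamp
	span string // affected span, as a label
}

// buffer accumulates raw rangefeed updates and flushes them en masse in
// timestamp order once the frontier advances, mirroring the ordering
// guarantee described above. limit stands in for the memory budget.
type buffer struct {
	limit  int
	events []event
}

func (b *buffer) add(e event) error {
	if len(b.events) >= b.limit {
		// Mirrors the overflow behavior: the subscriber is wound down and
		// the caller must re-subscribe.
		return fmt.Errorf("buffer limit %d exceeded", b.limit)
	}
	b.events = append(b.events, e)
	return nil
}

// flush returns, in timestamp order, every buffered event at or below the
// new frontier; the batch is safe to apply all at once.
func (b *buffer) flush(frontier int) []event {
	sort.Slice(b.events, func(i, j int) bool { return b.events[i].ts < b.events[j].ts })
	var out, kept []event
	for _, e := range b.events {
		if e.ts <= frontier {
			out = append(out, e)
		} else {
			kept = append(kept, e)
		}
	}
	b.events = kept
	return out
}

func main() {
	b := &buffer{limit: 4}
	_ = b.add(event{ts: 3, span: "add S1"})
	_ = b.add(event{ts: 2, span: "delete S"})
	// The deletion of S is applied before the sub-span additions.
	fmt.Println(b.flush(3)) // [{2 delete S} {3 add S1}]
}
```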

Release note: None

71239: sql: improve historical descriptor look up efficiency r=jameswsj10 a=jameswsj10

Fixes #70692. The existing implementation for looking up old
historical descriptors required multiple round trips to storage.
This improvement requires only 1, at most 2, KV calls to storage
by using a single ExportRequest.

Release note (performance improvement): reduced KV calls when
looking up old historical descriptors to one, or at most two.
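For illustration, the shape of the optimization; the real change issues a
single ExportRequest to KV, and every name below is a stand-in:

```go
package main

import "fmt"

// revision is a stand-in for one MVCC revision of a descriptor key.
type revision struct {
	ts   int
	desc string
}

// exportAllRevisions stands in for the single ExportRequest: one round trip
// returns every historical version of the key, in ascending timestamp
// order, instead of one storage lookup per version.
func exportAllRevisions(history []revision) []revision { return history }

// descriptorAt picks the newest revision at or below the read timestamp out
// of the one batch of results (input assumed sorted ascending by ts).
func descriptorAt(history []revision, readTS int) (string, bool) {
	desc, ok := "", false
	for _, r := range exportAllRevisions(history) {
		if r.ts <= readTS {
			desc, ok = r.desc, true
		}
	}
	return desc, ok
}

func main() {
	history := []revision{{1, "v1"}, {5, "v2"}, {9, "v3"}}
	fmt.Println(descriptorAt(history, 6)) // v2 true
}
```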

72314: kvserver: trim state used from snapshots r=erikgrinaker a=tbg

Snapshots contain a serialized copy of the `ReplicaState`. However, the
snapshot itself contains that data already. Having two sources of data
that may be interpreted differently can lead to problems, as we saw in
[#72239].

This commit deprecates using the entire ReplicaState. Instead, we pick
out the descriptor and ignore everything else. Instead of using the
copy of the state to initialize the recipient's in-memory state, we
now use a state loader.

In 22.2 we could send only the descriptor, but maybe we won't do that: for
logging and debugging it's kind of nice to have everything present.

Fixes #72222.

[#72239]: #72239

Release note: None

72323: sql: fix excess privileges being created from default privileges. r=RichardJCai a=RichardJCai

Release note (bug fix): Previously, when creating an object, default
privileges from users other than the user creating the object would be
added to the object's privileges. This fix ensures only the relevant
default privileges are applied.

Resolves #72322

72346: roachtest: fix gorm roachtest r=rafiss a=RichardJCai

Release note: None

Fixes #72023

72352: server,cli: fix improperly wrapped errors r=knz a=rafiss

refs #42510

I'm working on a linter that detects errors that are not wrapped
correctly, and it discovered these.

Release note: None
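As an example of the kind of bug such a linter catches, sketched with the
cockroachdb/errors API (these are not the actual call sites from this PR):

```go
package main

import (
	"fmt"
	"os"

	"github.com/cockroachdb/errors"
)

func main() {
	_, err := os.Open("/no/such/file")

	// Improperly wrapped: %v flattens err into a string, so the error chain
	// is broken and errors.Is can no longer find os.ErrNotExist.
	bad := fmt.Errorf("opening config: %v", err)

	// Properly wrapped: the cause is preserved in the chain.
	good := errors.Wrap(err, "opening config")

	fmt.Println(errors.Is(bad, os.ErrNotExist))  // false
	fmt.Println(errors.Is(good, os.ErrNotExist)) // true
}
```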

72366: roachtest: update ruby-pg blocklist for 22.1 r=rafiss a=ZhouXing19

Fixes #72316

Release note: None

72390: roachprod: avoid flaky test due to unused functions r=[RaduBerinde,stevendanna,rail] a=healthy-pod

Merging #71660 triggered a flaky test due to unused functions.

This patch avoids the flake by making use of, or commenting out, the unused functions.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: jameswsj10 <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: richardjcai <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Ahmad Abedalqader <[email protected]>
Successfully merging this pull request may close these issues:

roachtest: version/mixed/nodes=5 failed