Skip to content

Commit

Permalink
kvserver: reintroduce RangeDesc.GenerationComparable
Browse files Browse the repository at this point in the history
We dropped this field recently, but unfortunately that wasn't safe for
mixed-version clusters. The rub is that 20.1 nodes need to roundtrip the
proto through 20.2 nodes in a fairly subtle way. When it comes back to
the 20.1 node, the descriptor needs to compare Equal() to the original.
We configure our protos to not preserve unrecognized fields, so removing
the field breaks this round-tripping.

Specifically, the scenario which broke is the following:
1. A 20.1 node processes an AdminSplit, and performs the tranction
   writing the new descriptors. The descriptors have the
   GenerationCompable field set.
2. The lease changes while the respective txn is running. The lease
   moves to a 20.2 node.
3. The 20.2 node evaluates the EndTxn, and computes the split trigger
   info that's going to be replicated. The EndTxn has split info in it
   containing the field set, but the field is dropped when converting
   that into the proposed SplitTrigger (since the 20.2 unmarshalls and
   re-marshalls the descriptors).
4. All the 20.1 replicas of the ranges involved now apply the respective
   trigger via Raft, and their in-memory state doesn't have the field
   set. This doesn't match the bytes written in the database, which have
   the field.
5. The discrepancy between the in-memory state and the db state is a
   problem, as it causes the 20.1 node to spin if it tries to perform
   subsequent merge/split operations. The reason is that the code
   performing these operations short-circuits itself if it detects that
   the descriptor has changed while the operation was running. This
   detection is performed via the generated Equals() method, and it
   mis-fires because of the phantom field. That detection happens here:
   https://github.com/cockroachdb/cockroach/blob/79c01d28da9c379f67bb41beef3d85ad3bee1da1/pkg/kv/kvserver/replica_command.go#L1957

This patch takes precautions so that we can remove the field again in
21.1 - I'm merging this in 21.1, I'll backport it to 20.2, and then I'll
come back to 20.2 and remove the field. Namely, the patch changes
RangeDesc.Equal() to ignore that field (and the method is no longer
generated).

Fixes cockroachdb#53535

Release note: None
  • Loading branch information
andreimatei committed Sep 16, 2020
1 parent ae09e7e commit 2d6a18c
Show file tree
Hide file tree
Showing 9 changed files with 301 additions and 183 deletions.
47 changes: 38 additions & 9 deletions c-deps/libroach/protos/roachpb/metadata.pb.cc

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

34 changes: 34 additions & 0 deletions c-deps/libroach/protos/roachpb/metadata.pb.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion pkg/kv/kvclient/kvcoord/range_cache.go
Original file line number Diff line number Diff line change
Expand Up @@ -1179,7 +1179,7 @@ func compareEntryDescs(a, b *rangeCacheEntry) int {
}
}

if a.desc.Equal(b.desc) {
if a.desc.Equal(&b.desc) {
return 0
}

Expand Down
2 changes: 1 addition & 1 deletion pkg/kv/kvserver/client_replica_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1854,7 +1854,7 @@ func TestSystemZoneConfigs(t *testing.T) {
if len(desc.Replicas().Learners()) > 0 {
return false, fmt.Errorf("descriptor contains learners: %v", desc)
}
if existing, ok := replicas[desc.RangeID]; ok && !existing.Equal(desc) {
if existing, ok := replicas[desc.RangeID]; ok && !existing.Equal(&desc) {
return false, fmt.Errorf("mismatch between\n%s\n%s", &existing, &desc)
}
replicas[desc.RangeID] = desc
Expand Down
3 changes: 1 addition & 2 deletions pkg/kv/kvserver/replica_command.go
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ func maybeDescriptorChangedError(
if !detail.ActualValue.IsPresent() {
return true, nil
} else if err := detail.ActualValue.GetProto(&actualDesc); err == nil &&
desc.RangeID == actualDesc.RangeID && !desc.Equal(actualDesc) {
desc.RangeID == actualDesc.RangeID && !desc.Equal(&actualDesc) {
return true, &actualDesc
}
}
Expand Down Expand Up @@ -1993,7 +1993,6 @@ func checkDescsEqual(desc *roachpb.RangeDescriptor) func(*roachpb.RangeDescripto
if desc2 != nil {
desc2.Replicas() // for sorting side-effect
}

return desc.Equal(desc2)
}
}
Expand Down
2 changes: 1 addition & 1 deletion pkg/kv/kvserver/replica_gc_queue.go
Original file line number Diff line number Diff line change
Expand Up @@ -335,7 +335,7 @@ func (rgcq *replicaGCQueue) process(
if len(rs) != 1 {
return false, errors.Errorf("expected 1 range descriptor, got %d", len(rs))
}
if leftReplyDesc := &rs[0]; !leftDesc.Equal(*leftReplyDesc) {
if leftReplyDesc := &rs[0]; !leftDesc.Equal(leftReplyDesc) {
log.VEventf(ctx, 1, "left neighbor %s not up-to-date with meta descriptor %s; cannot safely GC range yet",
leftDesc, leftReplyDesc)
// Chances are that the left replica needs to be GC'd. Since we don't
Expand Down
39 changes: 39 additions & 0 deletions pkg/roachpb/metadata.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
package roachpb

import (
"bytes"
"fmt"
"sort"
"strconv"
Expand Down Expand Up @@ -145,6 +146,44 @@ func NewRangeDescriptor(
return desc
}

// Equal compares two descriptors for equality. This was copied over from the
// gogoproto generated version in order to ignore deprecated fields.
func (r *RangeDescriptor) Equal(other *RangeDescriptor) bool {
if other == nil {
return r == nil
}
if r == nil {
return false
}
if r.RangeID != other.RangeID {
return false
}
if r.Generation != other.Generation {
return false
}
if !bytes.Equal(r.StartKey, other.StartKey) {
return false
}
if !bytes.Equal(r.EndKey, other.EndKey) {
return false
}
if len(r.InternalReplicas) != len(other.InternalReplicas) {
return false
}
for i := range r.InternalReplicas {
if !r.InternalReplicas[i].Equal(&other.InternalReplicas[i]) {
return false
}
}
if r.NextReplicaID != other.NextReplicaID {
return false
}
if !r.StickyBit.Equal(other.StickyBit) {
return false
}
return true
}

// RSpan returns the RangeDescriptor's resolved span.
func (r *RangeDescriptor) RSpan() RSpan {
return RSpan{Key: r.StartKey, EndKey: r.EndKey}
Expand Down
Loading

0 comments on commit 2d6a18c

Please sign in to comment.