Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 #66544

Closed
9 tasks done
ajwerner opened this issue Jun 16, 2021 · 2 comments · Fixed by #69887
Closed
9 tasks done

clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 #66544

ajwerner opened this issue Jun 16, 2021 · 2 comments · Fixed by #69887
Assignees
Labels
branch-master Failures and bugs on the master branch. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Jun 16, 2021

Is your feature request related to a problem? Please describe.

Sub-issue of #65200 for versions that seem KV related.

  • NodeMembershipStatus
  • AbortSpanBytes
  • CPutInline
  • ReplicaVersions
  • SeparatedIntents
  • ClosedTimestampsRaftTransport
  • PriorReadSummaries
  • NonVotingReplicas
  • TruncatedAndRangeAppliedStateMigration, PostTruncatedAndRangeAppliedStateMigration
@ajwerner ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jun 16, 2021
@postamar postamar added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Aug 19, 2021
@blathers-crl
Copy link

blathers-crl bot commented Aug 19, 2021

Hi @postamar, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@postamar postamar added the branch-master Failures and bugs on the master branch. label Aug 19, 2021
@tbg tbg added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Aug 23, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Sep 2, 2021
Partially addresses cockroachdb#66544 by removing a cluster version and its associated
dependencies which, for any cluster whose version is at least 21.1, is certain
to be active.

Release justification: cluster version cleanup
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Sep 2, 2021
Partially addresses cockroachdb#66544 by removing a cluster version and its associated
dependencies which, for any cluster whose version is at least 21.1, is certain
to be active.

Release justification: cluster version cleanup
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Sep 2, 2021
Partially addresses cockroachdb#66544 by removing a cluster version and its associated
dependencies which, for any cluster whose version is at least 21.1, is certain
to be active.

Release justification: cluster version cleanup
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Sep 2, 2021
Partially addresses cockroachdb#66544 by removing a cluster version and its associated
dependencies which, for any cluster whose version is at least 21.1, is certain
to be active.

Release justification: cluster version cleanup
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Sep 2, 2021
Partially addresses cockroachdb#66544 by removing a cluster version and its associated
dependencies which, for any cluster whose version is at least 21.1, is certain
to be active.

Release justification: cluster version cleanup
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Sep 2, 2021
Partially addresses cockroachdb#66544 by removing a cluster version and its associated
dependencies which, for any cluster whose version is at least 21.1, is certain
to be active.

Release justification: cluster version cleanup
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Sep 2, 2021
Partially addresses cockroachdb#66544 by removing a cluster version and its associated
dependencies which, for any cluster whose version is at least 21.1, is certain
to be active.

Release justification: cluster version cleanup
@tbg tbg assigned irfansharif and unassigned irfansharif and nvanbenschoten Sep 2, 2021
craig bot pushed a commit that referenced this issue Sep 2, 2021
68652: kvserver: count draining nodes as live when computing quorum r=aayushshah15 a=aayushshah15

Similar to #67714

Draining nodes were considered non-live by the allocator when it made the
determination of whether a range could achieve quorum. This meant that, for
instance, on a cluster with a replication factor of 5, if we had 3 or more
nodes marked draining, we (with a high likelihood) wouldn't be able to
decommission _any other_ nodes from the cluster.

Furthermore, due to the same reason as above the system also would incorrectly
decide to not rebalance ranges that had more than a quorum of replicas on
draining nodes.

This patch fixes this problem by considering replicas on draining nodes as live
for the purposes of determining whether a range has quorum. This likely fixes a
subset of "stuck decommissioning" issues we've seen in the wild.

Follows from cockroachlabs/support#1105

Release justification: bug fix

Release note(bug fix): Previously, draining a quorum of nodes (i.e. >=2 if
replication factor is 3, >=3 if replication factor is 5, etc) would block the
subsequent decommissioning of any other nodes in the cluster. This patch fixes
this bug. Now, if the lowest replication factor of some zone in the cluster is
RF, operators should be able to safely drain up to RF-1 nodes simultaneously.

69115: mt_start_sql: enable enterprise features for multitenant sql servers r=JeffSwenson a=JeffSwenson

Enterprise features are controlled by the enterprise.license setting.
Currently this setting applies only to the host tenant cluster. This
change enables enterprise features for all tenant clusters. Enabling
enterprise features for all tenants is reasonable, because multi
tenant deployments are an enterprise feature.

Release note: None

69571: jobs: don't reset next_run when resuming active schedules r=pbardea a=pbardea

Consider an active schedule that was created with a specific first_run
time. The first_run would populate the next_run_time on the schedule.
The resumption of this schedule before it executed would re-evaluate the
next runtime based off the schedule's recurrence.

This commit changes the scheduling system to only recompute the next run
time on paused schedules.

Release justification: bug fix
Release note (bug fix): Fix a bug where resuming an active schedule
would always reset its next run time. This was sometimes undesirable
with schedules that had a first_run option specified.

69696: kvserver: stop transferring leases to replicas that may need snapshots r=aayushshah15 a=aayushshah15

This commit disallows the `replicateQueue` from initiating lease
transfers to replicas that may be in need of a raft snapshot. Note that
the `StoreRebalancer` already has a stronger form of this check since it
disallows lease transfers to replicas that are lagging behind the raft
leader (which includes the set of replicas that need a snapshot).

In cases where the raft leader is not the leaseholder, we disallow the
replicateQueue from any sort of lease transfer until leaseholdership and
leadership are collocated. We rely on calls to
`maybeTransferRaftLeadershipToLeaseholderLocked()` (called on every raft
tick) to make sure that such periods of leadership / leaseholdership
misalignment are ephemeral and rare.

Alternative to #63507

Release justification: bug fix

Resolves #61604

Release note (bug fix): Fixes a bug that can cause prolonged
unavailability due to lease transfer to a replica that may be in need of
a raft snapshot.

69727: kv: deflake TestPriorityRatchetOnAbortOrPush r=nvanbenschoten a=nvanbenschoten

Fixes #68584.

The test was flaky for the reasons described in #68584. There doesn't appear to
be an easy way to fix this behavior, and it's not clear how valuable doing so
even is given how little we rely on transaction priorities anymore, so the
commit just deflakes the test by rejecting them.

Release justification: deflaking a test.

69728: clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 r=nvanbenschoten a=nvanbenschoten

Fixes most of #66544.

This pull request removes (almost) all old cluster versions corresponding to 20.2 and 21.1 which fall under the KV group's responsibility.

The PR leaves one remaining "combo version" of `TruncatedAndRangeAppliedStateMigration` plus `PostTruncatedAndRangeAppliedStateMigration`, which we'll want to remove before closing #66544. I've left this for @irfansharif both because its removal is a little more involved than the others and because I figured that he deserves to reap the fruit of his labor and get to delete the code related to the replicated truncated state and missing RangeAppliedStateKeys. Making that possible was a hard-earned win.

Release justification: cluster version cleanup

Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Jeff <[email protected]>
Co-authored-by: Paul Bardea <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@tbg
Copy link
Member

tbg commented Sep 7, 2021

Going to remove blocker label, though it would be nice to do this.

@tbg tbg removed the GA-blocker label Sep 7, 2021
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 7, 2021
Fixes cockroachdb#66544. We fully migrated away from the replicated truncated state
and the old range applied state keys in cockroachdb#58088. We're now guaranteed to
never see the legacy truncated+applied state representations.

Release justification: cluster version cleanup
Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 7, 2021
Fixes cockroachdb#66544. We fully migrated away from the replicated truncated state
and the old range applied state keys (RaftAppliedIndexLegacyKey,
LeaseAppliedIndexLegacyKey, RangeStatsLegacyKey) in cockroachdb#58088. We're now
guaranteed to never see the legacy truncated+applied state
representations, making this code (finally) safe to delete.

Release justification: cluster version cleanup
Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 7, 2021
Fixes cockroachdb#66544. We fully migrated away from the replicated truncated state
and the old range applied state keys (RaftAppliedIndexLegacyKey,
LeaseAppliedIndexLegacyKey, and RangeStatsLegacyKey) in cockroachdb#58088. We're
now guaranteed to never see the legacy truncated+applied state
representations, making this code (finally) safe to delete.

Release justification: cluster version cleanup
Release note: None

---

This turned out to be pretty hairy and invasive. I'll wait for a few CI
runs and inspect the diffs here a bit more closely. Given the surface
area here, I'm curious if we feel this is worth getting in during
stability. Some questions to consider:

- The various legacy keys that were removed were removed in their
  entirety. Our CLI to scan over store keys will no longer recognize these
  keys. Is that ok?
- I don't think we need any checks to make sure we're not re-using some
  of these keys in the future -- but I'd like other opinions.
- Looks like we re-purposed the LocalRaftTruncatedStateLegacySuffix for
  the unreplicated truncated key as well, so just renamed it more
  appropriately.
- v21.1 nodes have to interop with v20.2 nodes where the legacy
  representation was possible, and it identified such cases through
  booleans in proto messages (SnapshotRequest.Header.UseUnreplicatedTruncatedState
  for example). This means that we still need to explicitly set them in
  v21.2 code. Crucially we'll stop reading these fields in v21.2 code,
  letting us remove these fields entirely in v22.1.
- The treatment above does not extend to ReplicaState.UsingAppliedStateKey;
  that field was only being used to trigger an idempotent migration to
  start using the newer applied state key representation, not as a way
  to express that the replica should expect the legacy representation.
  Since cockroachdb#58088 has already migrated every range in the system, we no
  longer need to trigger anything, and this field can be removed.
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 7, 2021
Fixes cockroachdb#66544. We fully migrated away from the replicated truncated state
and the old range applied state keys (RaftAppliedIndexLegacyKey,
LeaseAppliedIndexLegacyKey, and RangeStatsLegacyKey) in cockroachdb#58088. We're
now guaranteed to never see the legacy truncated+applied state
representations, making this code (finally) safe to delete.

Release justification: cluster version cleanup
Release note: None

---

This turned out to be pretty hairy and invasive. I'll wait for a few CI
runs and inspect the diffs here a bit more closely. Given the surface
area here, I'm curious if we feel this is worth getting in during
stability. Some questions to consider:

- The various legacy keys that were removed were removed in their
  entirety. Our CLI to scan over store keys will no longer recognize these
  keys. Is that ok?
- I don't think we need any checks to make sure we're not re-using some
  of these keys in the future -- but I'd like other opinions.
- Looks like we re-purposed the LocalRaftTruncatedStateLegacySuffix for
  the unreplicated truncated key as well, so just renamed it more
  appropriately.
- v21.1 nodes have to interop with v20.2 nodes where the legacy
  representation was possible, and it identified such cases through
  booleans in proto messages (SnapshotRequest.Header.UseUnreplicatedTruncatedState
  for example). This means that we still need to explicitly set them in
  v21.2 code. Crucially we'll stop reading these fields in v21.2 code,
  letting us remove these fields entirely in v22.1.
- The treatment above does not extend to ReplicaState.UsingAppliedStateKey;
  that field was only being used to trigger an idempotent migration to
  start using the newer applied state key representation, not as a way
  to express that the replica should expect the legacy representation.
  Since cockroachdb#58088 has already migrated every range in the system, we no
  longer need to trigger anything, and this field can be removed.
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 13, 2021
Fixes cockroachdb#66544.
Fixes cockroachdb#66834.

We fully migrated away from the replicated truncated state
and the old range applied state keys (RaftAppliedIndexLegacyKey,
LeaseAppliedIndexLegacyKey, and RangeStatsLegacyKey) in cockroachdb#58088. We're
now guaranteed to never see the legacy truncated+applied state
representations, making this code (finally) safe to delete.

Release justification: cluster version cleanup
Release note: None

---

This turned out to be pretty hairy and invasive. I'll wait for a few CI
runs and inspect the diffs here a bit more closely. Given the surface
area here, I'm curious if we feel this is worth getting in during
stability. Some questions to consider:

- The various legacy keys that were removed were removed in their
  entirety. Our CLI to scan over store keys will no longer recognize these
  keys. Is that ok?
- I don't think we need any checks to make sure we're not re-using some
  of these keys in the future -- but I'd like other opinions.
- Looks like we re-purposed the LocalRaftTruncatedStateLegacySuffix for
  the unreplicated truncated key as well, so just renamed it more
  appropriately.
- v21.1 nodes have to interop with v20.2 nodes where the legacy
  representation was possible, and it identified such cases through
  booleans in proto messages (SnapshotRequest.Header.UseUnreplicatedTruncatedState
  for example). This means that we still need to explicitly set them in
  v21.2 code. Crucially we'll stop reading these fields in v21.2 code,
  letting us remove these fields entirely in v22.1.
- The treatment above does not extend to ReplicaState.UsingAppliedStateKey;
  that field was only being used to trigger an idempotent migration to
  start using the newer applied state key representation, not as a way
  to express that the replica should expect the legacy representation.
  Since cockroachdb#58088 has already migrated every range in the system, we no
  longer need to trigger anything, and this field can be removed.
craig bot pushed a commit that referenced this issue Sep 14, 2021
67737: opt: zigzag scan hints r=cucaroach a=cucaroach

Fixes #35814

Enable table@{FORCE_ZIGZAG} and table@{FORCE_ZIGZAG=index} hints to lower the cost
of zigzag plans to that they will be preferred over other plans.

Note that this is a breaking change, if customer has an index called
FORCE_ZIGZAG and is hinting with table@{FORCE_ZIGZAG} that will no longer do what
it did before. Not sure what the ramifications of that are.

Release note (sql change): Extend index scan hint to allow zigzag joins to be preferred.


69887: kv,migration: rm code handling legacy raft truncated state r=irfansharif a=irfansharif

Fixes #66544. 
Fixes #66834.

We fully migrated away from the replicated truncated state
and the old range applied state keys (RaftAppliedIndexLegacyKey,
LeaseAppliedIndexLegacyKey, and RangeStatsLegacyKey) in #58088. We're
now guaranteed to never see the legacy truncated+applied state
representations, making this code (finally) safe to delete.

Release justification: cluster version cleanup
Release note: None

---

This turned out to be pretty hairy and invasive. I'll wait for a few CI
runs and inspect the diffs here a bit more closely. Given the surface
area here, I'm curious if we feel this is worth getting in during
stability. Some questions to consider:

- The various legacy keys that were removed were removed in their
  entirety. Our CLI to scan over store keys will no longer recognize these
  keys. Is that ok?
- I don't think we need any checks to make sure we're not re-using some
  of these keys in the future -- but I'd like other opinions.
- Looks like we re-purposed the LocalRaftTruncatedStateLegacySuffix for
  the unreplicated truncated key as well, so just renamed it more
  appropriately.
- v21.1 nodes have to interop with v20.2 nodes where the legacy
  representation was possible, and it identified such cases through
  booleans in proto messages (SnapshotRequest.Header.UseUnreplicatedTruncatedState
  for example). This means that we still need to explicitly set them in
  v21.2 code. Crucially we'll stop reading these fields in v21.2 code,
  letting us remove these fields entirely in v22.1.
- The treatment above does not extend to ReplicaState.UsingAppliedStateKey;
  that field was only being used to trigger an idempotent migration to
  start using the newer applied state key representation, not as a way
  to express that the replica should expect the legacy representation.
  Since [#58088](#58088) has already migrated every range in the system, we no
  longer need to trigger anything, and this field can be removed.

70059: roachtest: make --workload optional r=mgartner a=mgartner

This commit makes the `--workload` flag optional because some roachtests
do not require a workload binary.

Release note: None

70120: opt: match ScalarGroupBy in EliminateIndexJoinOrProjectInsideGroupBy r=mgartner a=mgartner

This commit extends EliminateIndexJoinOrProjectInsideGroupBy so that
ScalarGroupBy expressions are also matched. This allows the rule to
eliminate unnecessary index joins in more cases.

The primary motivation for this change was to make partial index
validation queries more efficient. These queries always have
ScalarGroupBy expressions because they are in the form:

    SELECT count(1) FROM table@partial_index WHERE predicate

Prior to this change, an index join was planned for these queries which
would operate on every row in the partial index. This could be extremely
expensive for large partial indexes.

Fixes #70116

Release note (performance improvement): A limitation has been fixed that
made creating partial indexes inefficient.

70162: ui: fix default behaviours of columns on stmt page on cc console r=maryliag a=maryliag

When a CC Console user open the Statement page for the first time
(no cache was created for column selector), this commits make sure
that the default columns will be displayed.

Fixes: #70160

Release justification: Category 4
Release note (bug fix): Default columns being displayed on Statements
page on CC console when the user never made any selection

Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
@craig craig bot closed this as completed in 6464de2 Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants