clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 #66544

ajwerner · 2021-06-16T16:20:40Z

Is your feature request related to a problem? Please describe.

Sub-issue of #65200 for versions that seem KV related.

blathers-crl · 2021-08-19T14:40:11Z

Hi @postamar, please add branch-* labels to identify which branch(es) this release-blocker affects.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.}

Partially addresses cockroachdb#66544 by removing a cluster version and its associated dependencies which, for any cluster whose version is at least 21.1, is certain to be active. Release justification: cluster version cleanup

@irfansharif

68652: kvserver: count draining nodes as live when computing quorum r=aayushshah15 a=aayushshah15 Similar to #67714 Draining nodes were considered non-live by the allocator when it made the determination of whether a range could achieve quorum. This meant that, for instance, on a cluster with a replication factor of 5, if we had 3 or more nodes marked draining, we (with a high likelihood) wouldn't be able to decommission _any other_ nodes from the cluster. Furthermore, due to the same reason as above the system also would incorrectly decide to not rebalance ranges that had more than a quorum of replicas on draining nodes. This patch fixes this problem by considering replicas on draining nodes as live for the purposes of determining whether a range has quorum. This likely fixes a subset of "stuck decommissioning" issues we've seen in the wild. Follows from cockroachlabs/support#1105 Release justification: bug fix Release note(bug fix): Previously, draining a quorum of nodes (i.e. >=2 if replication factor is 3, >=3 if replication factor is 5, etc) would block the subsequent decommissioning of any other nodes in the cluster. This patch fixes this bug. Now, if the lowest replication factor of some zone in the cluster is RF, operators should be able to safely drain up to RF-1 nodes simultaneously. 69115: mt_start_sql: enable enterprise features for multitenant sql servers r=JeffSwenson a=JeffSwenson Enterprise features are controlled by the enterprise.license setting. Currently this setting applies only to the host tenant cluster. This change enables enterprise features for all tenant clusters. Enabling enterprise features for all tenants is reasonable, because multi tenant deployments are an enterprise feature. Release note: None 69571: jobs: don't reset next_run when resuming active schedules r=pbardea a=pbardea Consider an active schedule that was created with a specific first_run time. The first_run would populate the next_run_time on the schedule. The resumption of this schedule before it executed would re-evaluate the next runtime based off the schedule's recurrence. This commit changes the scheduling system to only recompute the next run time on paused schedules. Release justification: bug fix Release note (bug fix): Fix a bug where resuming an active schedule would always reset its next run time. This was sometimes undesirable with schedules that had a first_run option specified. 69696: kvserver: stop transferring leases to replicas that may need snapshots r=aayushshah15 a=aayushshah15 This commit disallows the `replicateQueue` from initiating lease transfers to replicas that may be in need of a raft snapshot. Note that the `StoreRebalancer` already has a stronger form of this check since it disallows lease transfers to replicas that are lagging behind the raft leader (which includes the set of replicas that need a snapshot). In cases where the raft leader is not the leaseholder, we disallow the replicateQueue from any sort of lease transfer until leaseholdership and leadership are collocated. We rely on calls to `maybeTransferRaftLeadershipToLeaseholderLocked()` (called on every raft tick) to make sure that such periods of leadership / leaseholdership misalignment are ephemeral and rare. Alternative to #63507 Release justification: bug fix Resolves #61604 Release note (bug fix): Fixes a bug that can cause prolonged unavailability due to lease transfer to a replica that may be in need of a raft snapshot. 69727: kv: deflake TestPriorityRatchetOnAbortOrPush r=nvanbenschoten a=nvanbenschoten Fixes #68584. The test was flaky for the reasons described in #68584. There doesn't appear to be an easy way to fix this behavior, and it's not clear how valuable doing so even is given how little we rely on transaction priorities anymore, so the commit just deflakes the test by rejecting them. Release justification: deflaking a test. 69728: clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 r=nvanbenschoten a=nvanbenschoten Fixes most of #66544. This pull request removes (almost) all old cluster versions corresponding to 20.2 and 21.1 which fall under the KV group's responsibility. The PR leaves one remaining "combo version" of `TruncatedAndRangeAppliedStateMigration` plus `PostTruncatedAndRangeAppliedStateMigration`, which we'll want to remove before closing #66544. I've left this for @irfansharif both because its removal is a little more involved than the others and because I figured that he deserves to reap the fruit of his labor and get to delete the code related to the replicated truncated state and missing RangeAppliedStateKeys. Making that possible was a hard-earned win. Release justification: cluster version cleanup Co-authored-by: Aayush Shah <[email protected]> Co-authored-by: Jeff <[email protected]> Co-authored-by: Paul Bardea <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>

tbg · 2021-09-07T14:27:21Z

Going to remove blocker label, though it would be nice to do this.

Fixes cockroachdb#66544. We fully migrated away from the replicated truncated state and the old range applied state keys in cockroachdb#58088. We're now guaranteed to never see the legacy truncated+applied state representations. Release justification: cluster version cleanup Release note: None

Fixes cockroachdb#66544. We fully migrated away from the replicated truncated state and the old range applied state keys (RaftAppliedIndexLegacyKey, LeaseAppliedIndexLegacyKey, RangeStatsLegacyKey) in cockroachdb#58088. We're now guaranteed to never see the legacy truncated+applied state representations, making this code (finally) safe to delete. Release justification: cluster version cleanup Release note: None

Fixes cockroachdb#66544. We fully migrated away from the replicated truncated state and the old range applied state keys (RaftAppliedIndexLegacyKey, LeaseAppliedIndexLegacyKey, and RangeStatsLegacyKey) in cockroachdb#58088. We're now guaranteed to never see the legacy truncated+applied state representations, making this code (finally) safe to delete. Release justification: cluster version cleanup Release note: None --- This turned out to be pretty hairy and invasive. I'll wait for a few CI runs and inspect the diffs here a bit more closely. Given the surface area here, I'm curious if we feel this is worth getting in during stability. Some questions to consider: - The various legacy keys that were removed were removed in their entirety. Our CLI to scan over store keys will no longer recognize these keys. Is that ok? - I don't think we need any checks to make sure we're not re-using some of these keys in the future -- but I'd like other opinions. - Looks like we re-purposed the LocalRaftTruncatedStateLegacySuffix for the unreplicated truncated key as well, so just renamed it more appropriately. - v21.1 nodes have to interop with v20.2 nodes where the legacy representation was possible, and it identified such cases through booleans in proto messages (SnapshotRequest.Header.UseUnreplicatedTruncatedState for example). This means that we still need to explicitly set them in v21.2 code. Crucially we'll stop reading these fields in v21.2 code, letting us remove these fields entirely in v22.1. - The treatment above does not extend to ReplicaState.UsingAppliedStateKey; that field was only being used to trigger an idempotent migration to start using the newer applied state key representation, not as a way to express that the replica should expect the legacy representation. Since cockroachdb#58088 has already migrated every range in the system, we no longer need to trigger anything, and this field can be removed.

Fixes cockroachdb#66544. Fixes cockroachdb#66834. We fully migrated away from the replicated truncated state and the old range applied state keys (RaftAppliedIndexLegacyKey, LeaseAppliedIndexLegacyKey, and RangeStatsLegacyKey) in cockroachdb#58088. We're now guaranteed to never see the legacy truncated+applied state representations, making this code (finally) safe to delete. Release justification: cluster version cleanup Release note: None --- This turned out to be pretty hairy and invasive. I'll wait for a few CI runs and inspect the diffs here a bit more closely. Given the surface area here, I'm curious if we feel this is worth getting in during stability. Some questions to consider: - The various legacy keys that were removed were removed in their entirety. Our CLI to scan over store keys will no longer recognize these keys. Is that ok? - I don't think we need any checks to make sure we're not re-using some of these keys in the future -- but I'd like other opinions. - Looks like we re-purposed the LocalRaftTruncatedStateLegacySuffix for the unreplicated truncated key as well, so just renamed it more appropriately. - v21.1 nodes have to interop with v20.2 nodes where the legacy representation was possible, and it identified such cases through booleans in proto messages (SnapshotRequest.Header.UseUnreplicatedTruncatedState for example). This means that we still need to explicitly set them in v21.2 code. Crucially we'll stop reading these fields in v21.2 code, letting us remove these fields entirely in v22.1. - The treatment above does not extend to ReplicaState.UsingAppliedStateKey; that field was only being used to trigger an idempotent migration to start using the newer applied state key representation, not as a way to express that the replica should expect the legacy representation. Since cockroachdb#58088 has already migrated every range in the system, we no longer need to trigger anything, and this field can be removed.

67737: opt: zigzag scan hints r=cucaroach a=cucaroach Fixes #35814 Enable table@{FORCE_ZIGZAG} and table@{FORCE_ZIGZAG=index} hints to lower the cost of zigzag plans to that they will be preferred over other plans. Note that this is a breaking change, if customer has an index called FORCE_ZIGZAG and is hinting with table@{FORCE_ZIGZAG} that will no longer do what it did before. Not sure what the ramifications of that are. Release note (sql change): Extend index scan hint to allow zigzag joins to be preferred. 69887: kv,migration: rm code handling legacy raft truncated state r=irfansharif a=irfansharif Fixes #66544. Fixes #66834. We fully migrated away from the replicated truncated state and the old range applied state keys (RaftAppliedIndexLegacyKey, LeaseAppliedIndexLegacyKey, and RangeStatsLegacyKey) in #58088. We're now guaranteed to never see the legacy truncated+applied state representations, making this code (finally) safe to delete. Release justification: cluster version cleanup Release note: None --- This turned out to be pretty hairy and invasive. I'll wait for a few CI runs and inspect the diffs here a bit more closely. Given the surface area here, I'm curious if we feel this is worth getting in during stability. Some questions to consider: - The various legacy keys that were removed were removed in their entirety. Our CLI to scan over store keys will no longer recognize these keys. Is that ok? - I don't think we need any checks to make sure we're not re-using some of these keys in the future -- but I'd like other opinions. - Looks like we re-purposed the LocalRaftTruncatedStateLegacySuffix for the unreplicated truncated key as well, so just renamed it more appropriately. - v21.1 nodes have to interop with v20.2 nodes where the legacy representation was possible, and it identified such cases through booleans in proto messages (SnapshotRequest.Header.UseUnreplicatedTruncatedState for example). This means that we still need to explicitly set them in v21.2 code. Crucially we'll stop reading these fields in v21.2 code, letting us remove these fields entirely in v22.1. - The treatment above does not extend to ReplicaState.UsingAppliedStateKey; that field was only being used to trigger an idempotent migration to start using the newer applied state key representation, not as a way to express that the replica should expect the legacy representation. Since [#58088](#58088) has already migrated every range in the system, we no longer need to trigger anything, and this field can be removed. 70059: roachtest: make --workload optional r=mgartner a=mgartner This commit makes the `--workload` flag optional because some roachtests do not require a workload binary. Release note: None 70120: opt: match ScalarGroupBy in EliminateIndexJoinOrProjectInsideGroupBy r=mgartner a=mgartner This commit extends EliminateIndexJoinOrProjectInsideGroupBy so that ScalarGroupBy expressions are also matched. This allows the rule to eliminate unnecessary index joins in more cases. The primary motivation for this change was to make partial index validation queries more efficient. These queries always have ScalarGroupBy expressions because they are in the form: SELECT count(1) FROM table@partial_index WHERE predicate Prior to this change, an index join was planned for these queries which would operate on every row in the partial index. This could be extremely expensive for large partial indexes. Fixes #70116 Release note (performance improvement): A limitation has been fixed that made creating partial indexes inefficient. 70162: ui: fix default behaviours of columns on stmt page on cc console r=maryliag a=maryliag When a CC Console user open the Statement page for the first time (no cache was created for column selector), this commits make sure that the default columns will be displayed. Fixes: #70160 Release justification: Category 4 Release note (bug fix): Default columns being displayed on Statements page on CC console when the user never made any selection Co-authored-by: Tommy Reilly <[email protected]> Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Marylia Gutierrez <[email protected]>

ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jun 16, 2021

postamar added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Aug 19, 2021

postamar mentioned this issue Aug 19, 2021

clusterversion,*: remove all cluster versions prior to Start21_2 #69144

Closed

postamar added the branch-master Failures and bugs on the master branch. label Aug 19, 2021

tbg added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Aug 23, 2021

nvanbenschoten mentioned this issue Sep 2, 2021

clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 #69728

Merged

nvanbenschoten assigned nvanbenschoten and irfansharif Sep 2, 2021

tbg assigned irfansharif and unassigned irfansharif and nvanbenschoten Sep 2, 2021

tbg removed the GA-blocker label Sep 7, 2021

irfansharif mentioned this issue Sep 7, 2021

kv,migration: rm code handling legacy raft truncated state #69887

Merged

craig bot closed this as completed in 6464de2 Sep 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 #66544

clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 #66544

ajwerner commented Jun 16, 2021 •

edited by irfansharif

Loading

blathers-crl bot commented Aug 19, 2021

tbg commented Sep 7, 2021

clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 #66544

clusterversion,kv: remove old cluster versions corresponding to 20.2 and 21.1 #66544

Comments

ajwerner commented Jun 16, 2021 • edited by irfansharif Loading

blathers-crl bot commented Aug 19, 2021

tbg commented Sep 7, 2021

ajwerner commented Jun 16, 2021 •

edited by irfansharif

Loading