-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: ensure leadMaxSupported cannot regress when a leader bumps its term #133764
Labels
A-leader-leases
Related to the introduction of leader leases
branch-master
Failures and bugs on the master branch.
branch-release-24.3
Used to mark GA and release blockers, technical advisories, and bugs for 24.3
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
GA-blocker
T-kv
KV Team
Comments
arulajmani
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-kv
KV Team
A-leader-leases
Related to the introduction of leader leases
labels
Oct 30, 2024
arulajmani
added
branch-master
Failures and bugs on the master branch.
branch-release-24.3
Used to mark GA and release blockers, technical advisories, and bugs for 24.3
GA-blocker
labels
Oct 30, 2024
arulajmani
added a commit
to arulajmani/cockroach
that referenced
this issue
Oct 30, 2024
Previously, it was possible for a leader to regress leaderMaxSupported by calling an election at a higher term. This was because we weren't careful to recognize this case on the leader, and followers had special case handling that allowed a leader to override `inFortifyLease`. Together, this could cause lease regressions for LeaderLeases in some rare cases. This patch fixes this issue by removing special case handling in `inFortifyLease`. We also remove a special case in the handling of `MsgHup` which allowed a leader to step down and campaign without de-fortifying itself. Fixes cockroachdb#133764 Release note: None
arulajmani
added a commit
to arulajmani/cockroach
that referenced
this issue
Oct 30, 2024
Previously, it was possible for a leader to regress leaderMaxSupported by calling an election at a higher term. This was because we weren't careful to recognize this case on the leader, and followers had special case handling that allowed a leader to override `inFortifyLease`. Together, this could cause lease regressions for LeaderLeases in some rare cases. This patch fixes this issue by removing special case handling in `inFortifyLease`. We also remove a special case in the handling of `MsgHup` which allowed a leader to step down and campaign without de-fortifying itself. Fixes cockroachdb#133764 Release note: None
arulajmani
added a commit
to arulajmani/cockroach
that referenced
this issue
Oct 31, 2024
Previously, it was possible for a leader to regress leaderMaxSupported by calling an election at a higher term. This was because we weren't careful to recognize this case on the leader, and followers had special case handling that allowed a leader to override `inFortifyLease`. Together, this could cause lease regressions for LeaderLeases in some rare cases. This patch fixes this issue by removing special case handling in `inFortifyLease`. We also remove a special case in the handling of `MsgHup` which allowed a leader to step down and campaign without de-fortifying itself. Fixes cockroachdb#133764 Release note: None
craig bot
pushed a commit
that referenced
this issue
Nov 4, 2024
132877: kv: deflake TestLeaseQueueLeasePreferencePurgatoryError r=nvanbenschoten a=nvanbenschoten Fixes #132652. This commit fixes a flake in TestLeaseQueueLeasePreferencePurgatoryError where a delayed span config split could trip up the test and cause it to flake. We now wait for the span config split to be processed before waiting on the lease queue's purgatory, which should eliminate the flake. Release note: None 133568: changefeedccl: add PTS to system.users r=andyyang890 a=asg0451 Add `system.users` to the list of system tables that changefeeds protect with PTS. This table is required for CDC Queries. Part of: #128806 Release note (enterprise change): Add `system.users` to the list of system tables that changefeeds protect with PTS. This table is required for CDC Queries. 133873: raft: ensure leaderMaxSupported does not regress when bumping terms r=nvanbenschoten a=arulajmani Previously, it was possible for a leader to regress leaderMaxSupported by calling an election at a higher term. This was because we weren't careful to recognize this case on the leader, and followers had special case handling that allowed a leader to override `inFortifyLease`. Together, this could cause lease regressions for LeaderLeases in some rare cases. This patch fixes this issue by removing special case handling in `inFortifyLease`. We also remove a special case in the handling of `MsgHup` which allowed a leader to step down and campaign without de-fortifying itself. Fixes #133764 Release note: None 133895: ui: reroute v2 db page to legacy page when cluster is unfinalized r=xinhaoz a=xinhaoz The migrations that support the v2 db page runs on finalization. Rather than show a non-functional page in an unfinalized cluster state let's reroute to the legacy page. Epic: CRDB-37558 Fixes: #133843 Release note (ui change): The v2 db pages will only be available post ugprade finalization to 24.3. Prior to that we'll continue to show the legacy page when the cluster is in its unfinalized state. 134082: cli: improve tsdump upload concurrency behavior r=arjunmahishi a=dhartunian - wrap concurrent error collection and status printing in necessary mutexes. - reduce goroutine spawning to 10 for primary loop which avoids rate limit errors from datadog. Release note: None 134216: sql/catalog: avoid multiple interface boxing allocs from `descpb.NameInfo` r=nvanbenschoten a=nvanbenschoten When we cast a `descpb.NameInfo` to a `catalog.NameKey`, we incur at least one heap allocation to box the struct in an interface. This commit ensures that it's only one, and not multiple. The change was motivated by observing two heap allocations in `lookupStoreCacheID`. Ideally, we would have no heap allocations here, but this is a step in the right direction. While here, we also clean up a few similar cases. ``` name old time/op new time/op delta Sysbench/SQL/oltp_read_only 36.3ms ± 4% 36.0ms ± 3% ~ (p=0.730 n=9+9) Sysbench/SQL/oltp_write_only 19.3ms ± 6% 18.8ms ± 4% -2.36% (p=0.043 n=9+10) Sysbench/SQL/oltp_read_write 58.7ms ± 3% 57.2ms ± 2% -2.49% (p=0.000 n=10+10) Sysbench/SQL/oltp_point_select 1.61ms ± 1% 1.56ms ± 2% -2.59% (p=0.000 n=9+10) Sysbench/SQL/oltp_begin_commit 594µs ± 4% 608µs ± 7% ~ (p=0.089 n=10+10) name old alloc/op new alloc/op delta Sysbench/SQL/oltp_read_only 1.05MB ± 1% 1.04MB ± 0% -0.64% (p=0.004 n=9+10) Sysbench/SQL/oltp_write_only 405kB ± 1% 404kB ± 1% ~ (p=0.720 n=9+10) Sysbench/SQL/oltp_read_write 1.50MB ± 1% 1.51MB ± 1% ~ (p=0.447 n=9+10) Sysbench/SQL/oltp_point_select 35.3kB ± 2% 35.1kB ± 2% ~ (p=0.278 n=9+10) Sysbench/SQL/oltp_begin_commit 18.9kB ± 0% 18.9kB ± 1% ~ (p=0.055 n=10+10) name old allocs/op new allocs/op delta Sysbench/SQL/oltp_read_only 6.96k ± 1% 6.86k ± 1% -1.37% (p=0.000 n=9+10) Sysbench/SQL/oltp_write_only 3.62k ± 1% 3.58k ± 0% -0.96% (p=0.000 n=9+10) Sysbench/SQL/oltp_read_write 11.0k ± 1% 10.9k ± 1% -0.56% (p=0.030 n=9+10) Sysbench/SQL/oltp_point_select 317 ± 1% 312 ± 0% -1.74% (p=0.000 n=9+10) Sysbench/SQL/oltp_begin_commit 140 ± 0% 140 ± 0% ~ (p=1.000 n=10+10) ``` Epic: None Release note: None 134217: sql: fix recently introduced data race r=yuzefovich a=yuzefovich Recently, in a5b8d06 we introduced a possible data race. Namely, that change modifies a proto inside of the physical plan to cap the slice of types during the vectorized operator planning in order to prevent a type schema corruption due to slice aliasing. However, it's been assumed in a couple of places that the physical plan is immutable. This commit clarifies that the physical plan is immutable after its finalization and moves the capping into the finalization. Fixes: #133934. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Miles Frankel <[email protected]> Co-authored-by: Arul Ajmani <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]> Co-authored-by: David Hartunian <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Nov 5, 2024
133132: sql/tests: create sysbench microbenchmark suite r=nvanbenschoten a=nvanbenschoten This commit adds a new microbenchmark suite which emulates `sysbench`, with subtests for `oltp_read_only`, `oltp_write_only`, and `oltp_read_write`. The structure of the benchmark suite makes it easy to add more subtests in the future (e.g. for `oltp_multi_insert`). The suite is designed to run the same workload against different levels of the CockroachDB stack, with the initial implementation targeting SQL (and below) through the `sysbenchSQL` driver and KV (and below) through the `sysbenchKV` driver. An example of additional an driver that could be added in the future is `sysbenchPebble`. It would also make sense to test drivers under different configurations. For example, we should add a variant of the `sysbenchKV` driver which disables the local RPC fast-path. The goal of this suite is to provide developers with a way to quickly measure the performance of CockroachDB against sysbench in a Go microbenchmark environment. This will help with identifying opportunities for performance improvements and with providing a rapid feedback loop for evaluating the effectiveness of performance changes. The initial benchmark performance looks like: ``` name time/op Sysbench/SQL/OltpReadOnly 2.88ms ±16% Sysbench/SQL/OltpWriteOnly 2.32ms ±16% Sysbench/SQL/OltpReadWrite 5.47ms ± 6% Sysbench/KV/OltpReadOnly 445µs ± 5% Sysbench/KV/OltpWriteOnly 594µs ± 5% Sysbench/KV/OltpReadWrite 1.07ms ± 4% name alloc/op Sysbench/SQL/OltpReadOnly 965kB ± 1% Sysbench/SQL/OltpWriteOnly 487kB ± 4% Sysbench/SQL/OltpReadWrite 1.34MB ± 0% Sysbench/KV/OltpReadOnly 264kB ± 1% Sysbench/KV/OltpWriteOnly 184kB ± 4% Sysbench/KV/OltpReadWrite 440kB ± 0% name allocs/op Sysbench/SQL/OltpReadOnly 6.26k ± 1% Sysbench/SQL/OltpWriteOnly 3.40k ± 0% Sysbench/SQL/OltpReadWrite 9.65k ± 0% Sysbench/KV/OltpReadOnly 650 ± 0% Sysbench/KV/OltpWriteOnly 1.08k ± 1% Sysbench/KV/OltpReadWrite 1.73k ± 0% ``` Epic: None Release Note: None 133643: storage: bump Pebble to the updated Category API r=RaduBerinde a=RaduBerinde Changes: * [`71bb6ba2`](cockroachdb/pebble@71bb6ba2) sstable: make Category an enum Release note: none. Fixes #133507 133873: raft: ensure leaderMaxSupported does not regress when bumping terms r=nvanbenschoten a=arulajmani Previously, it was possible for a leader to regress leaderMaxSupported by calling an election at a higher term. This was because we weren't careful to recognize this case on the leader, and followers had special case handling that allowed a leader to override `inFortifyLease`. Together, this could cause lease regressions for LeaderLeases in some rare cases. This patch fixes this issue by removing special case handling in `inFortifyLease`. We also remove a special case in the handling of `MsgHup` which allowed a leader to step down and campaign without de-fortifying itself. Fixes #133764 Release note: None 134127: kvcoord: add metric for async requests throttling duration r=yuzefovich a=yuzefovich This commit adds another metric `distsender.batches.async.throttled_cumulative_duration_nanos` which exposes how long partial batches were throttled for. This can be useful when adjusting the DistSender concurrency limit. Informs: #131125. Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Arul Ajmani <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>
blathers-crl bot
pushed a commit
that referenced
this issue
Nov 5, 2024
Previously, it was possible for a leader to regress leaderMaxSupported by calling an election at a higher term. This was because we weren't careful to recognize this case on the leader, and followers had special case handling that allowed a leader to override `inFortifyLease`. Together, this could cause lease regressions for LeaderLeases in some rare cases. This patch fixes this issue by removing special case handling in `inFortifyLease`. We also remove a special case in the handling of `MsgHup` which allowed a leader to step down and campaign without de-fortifying itself. Fixes #133764 Release note: None
nvanbenschoten
pushed a commit
to nvanbenschoten/cockroach
that referenced
this issue
Nov 5, 2024
Previously, it was possible for a leader to regress leaderMaxSupported by calling an election at a higher term. This was because we weren't careful to recognize this case on the leader, and followers had special case handling that allowed a leader to override `inFortifyLease`. Together, this could cause lease regressions for LeaderLeases in some rare cases. This patch fixes this issue by removing special case handling in `inFortifyLease`. We also remove a special case in the handling of `MsgHup` which allowed a leader to step down and campaign without de-fortifying itself. Fixes cockroachdb#133764 Release note: None
arulajmani
added a commit
that referenced
this issue
Nov 6, 2024
Previously, it was possible for a leader to regress leaderMaxSupported by calling an election at a higher term. This was because we weren't careful to recognize this case on the leader, and followers had special case handling that allowed a leader to override `inFortifyLease`. Together, this could cause lease regressions for LeaderLeases in some rare cases. This patch fixes this issue by removing special case handling in `inFortifyLease`. We also remove a special case in the handling of `MsgHup` which allowed a leader to step down and campaign without de-fortifying itself. Fixes #133764 Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-leader-leases
Related to the introduction of leader leases
branch-master
Failures and bugs on the master branch.
branch-release-24.3
Used to mark GA and release blockers, technical advisories, and bugs for 24.3
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
GA-blocker
T-kv
KV Team
Describe the problem
While staring at some raft code with @nvanbenschoten earlier today, as one does, we re-discovered this TODO:
cockroach/pkg/raft/raft.go
Lines 1353 to 1356 in 77b210b
As written, it's possible for a raft leader to regress
leadMaxSupported
after it steps down, if it calls an election, doesn't win it, but de-fortifies followers and allowing another follower to win an election. This is possible because the TODO isn't addressed and this logic ininFortifyLease
:cockroach/pkg/raft/raft.go
Lines 1480 to 1486 in 77b210b
To fix this, we could either:
leadMaxSupported
is in the past by callingCanDefortify
inhup
.inFortifyLease
and checksupportingFortifedLeader
without exception inhup
.We settled on approach 2, as we no longer need this implicit de-fortification now that we have the explicit
MsgDeFortifyLeader
flow.Jira issue: CRDB-43740
Epic CRDB-37522
The text was updated successfully, but these errors were encountered: