roachtest: failover/chaos/read-write failed #104709

Closed
cockroach-teamcity opened this issue Jun 11, 2023 · 13 comments · Fixed by #106104
Assignees: erikgrinaker
Labels: branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Milestone: 23.1

Comments

@cockroach-teamcity (Member)

cockroach-teamcity commented Jun 11, 2023

roachtest.failover/chaos/read-write failed with artifacts on master @ 6e38f676273d08b3a111213d6c9f7945629228d7:

test artifacts and logs in: /artifacts/failover/chaos/read-write/run_1
(assertions.go:333).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:1379
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:288
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: replica unavailable: (n4,s4):11 unable to serve request to r792:/Table/106/1/-3{22495525764153008-04067210006201408} [(n4,s4):11, (n8,s8):10, (n3,s3):9, (n7,s7):6, (n6,s6):7, next=12, gen=76, sticky=9223372036.854775807,2147483647]: closed timestamp: 1686489678.462266747,0 (2023-06-11 13:21:18); raft status: {"id":"b","term":14,"vote":"b","commit":1205,"lead":"b","raftState":"StateLeader","applied":1205,"progress":{"6":{"match":1205,"next":1206,"state":"StateReplicate"},"7":{"match":1205,"next":1206,"state":"StateReplicate"},"9":{"match":1205,"next":1206,"state":"StateReplicate"},"a":{"match":1205,"next":1206,"state":"StateReplicate"},"b":{"match":1205,"next":1206,"state":"StateReplicate"}},"leadtransferee":"0"}: encountered poisoned latch /Local/Range/Table/106/1/-322495525764153008/RangeDescriptor@0,0
	Test:       	failover/chaos/read-write
(require.go:1360).NoError: FailNow called
(monitor.go:137).Wait: monitor failure: monitor task failed: t.Fatal() was called
(cluster.go:2276).Run: cluster.RunE: context canceled
(cluster.go:2276).Run: cluster.RunE: context canceled
(cluster.go:2276).Run: cluster.RunE: context canceled
(cluster.go:2276).Run: cluster.RunE: context canceled

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=2 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-28685

Epic CRDB-27234

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels Jun 11, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Jun 11, 2023
@erikgrinaker (Contributor)

Something seems a bit messed up here. We failed on a deadlockFailer.Ready() while executing a SHOW CLUSTER RANGES WITH DETAILS following a concurrent disk stall and deadlock failure, due to a replica circuit breaker on r792. However, r792 wasn't involved in the deadlock failure, and we shouldn't have lost quorum on it here either. The range had a leader, and all followers appeared caught up.

I'll dig into this later.

13:19:27 failover.go:312: failing n8 (disk-stall)
13:19:28 failover.go:312: failing n6 (deadlock)
13:19:28 failover.go:1424: locked r364 on n6
13:19:28 failover.go:1424: locked r1086 on n6
13:20:28 failover.go:320: recovering n6 (deadlock)
13:20:28 failover.go:1455: unlocked r364 on n6
13:20:28 failover.go:1455: unlocked r1086 on n6
13:20:28 failover.go:320: recovering n8 (disk-stall)
13:20:29 cluster.go:859: test status: stopping nodes :8
teamcity-10484309-1686462469-66-n10cpu2: stopping
13:20:30 cluster.go:859: test status: starting nodes :8
13:20:30 cockroach.go:184: teamcity-10484309-1686462469-66-n10cpu2: starting nodes
13:21:36 test_impl.go:349: test failure #1: full stack retained in failure_1.log: (assertions.go:333).Fail:
        Error Trace:    github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:1379
                                                github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:288
                                                main/pkg/cmd/roachtest/monitor.go:105
                                                golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
                                                GOROOT/src/runtime/asm_amd64.s:1594
        Error:          Received unexpected error:
                        pq: replica unavailable: (n4,s4):11 unable to serve request to r792:/Table/106/1/-3{22495525764153008-04067210006201408} [(n4,s4):11, (n8,s8):10, (n3,s3):9, (n7,s7):6, (n6,s6):7, next=12, gen=76, sticky=9223372036.854775807,2147483647]: closed timestamp: 1686489678.462266747,0 (2023-06-11 13:21:18); raft status: {"id":"b","term":14,"vote":"b","commit":1205,"lead":"b","raftState":"StateLeader","applied":1205,"progress":{"6":{"match":1205,"next":1206,"state":"StateReplicate"},"7":{"match":1205,"next":1206,"state":"StateReplicate"},"9":{"match":1205,"next":1206,"state":"StateReplicate"},"a":{"match":1205,"next":1206,"state":"StateReplicate"},"b":{"match":1205,"next":1206,"state":"StateReplicate"}},"leadtransferee":"0"}: encountered poisoned latch /Local/Range/Table/106/1/-322495525764153008/RangeDescriptor@0,0
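
For reference, the readiness check that failed at failover.go:1379 was executing the SHOW CLUSTER RANGES WITH DETAILS statement mentioned above. A minimal sketch of issuing that statement from Go (connection details and structure are illustrative, not the actual roachtest helper):

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // the "pq:" prefix on the error indicates lib/pq
)

func main() {
	// Placeholder connection string; the real test connects to the roachprod cluster.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// This is the statement that surfaced the "replica unavailable ... poisoned
	// latch" error above: it scans range metadata cluster-wide, so a single
	// range with a tripped circuit breaker fails the whole statement.
	rows, err := db.Query(`SHOW CLUSTER RANGES WITH DETAILS`)
	if err != nil {
		log.Fatalf("ranges query failed: %v", err)
	}
	defer rows.Close()

	n := 0
	for rows.Next() {
		n++
	}
	if err := rows.Err(); err != nil {
		log.Fatalf("ranges scan failed: %v", err)
	}
	fmt.Printf("scanned %d ranges\n", n)
}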

@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jun 12, 2023
@erikgrinaker erikgrinaker self-assigned this Jun 12, 2023
@exalate-issue-sync exalate-issue-sync bot added T-kv-replication and removed T-kv KV Team labels Jun 12, 2023
@blathers-crl (bot)

blathers-crl bot commented Jun 12, 2023

cc @cockroachdb/replication

@cockroach-teamcity (Member Author)

roachtest.failover/chaos/read-write failed with artifacts on master @ 9633594f139d80d1555127a09f1ecd12dd1d9b11:

(assertions.go:333).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:1379
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:288
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: replica unavailable: (n7,s7):6 unable to serve request to r766:/Table/106/1/28{65603100361473792-84031416119425392} [(n9,s9):7, (n5,s5):2, (n4,s4):13, (n6,s6):12, (n7,s7):6, next=14, gen=88, sticky=9223372036.854775807,2147483647]: closed timestamp: 1687179324.822085781,0 (2023-06-19 12:55:24); raft status: {"id":"6","term":15,"vote":"6","commit":1887,"lead":"6","raftState":"StateLeader","applied":1887,"progress":{"6":{"match":1887,"next":1888,"state":"StateReplicate"},"7":{"match":1887,"next":1888,"state":"StateReplicate"},"c":{"match":1887,"next":1888,"state":"StateReplicate"},"d":{"match":1887,"next":1888,"state":"StateReplicate"},"2":{"match":1887,"next":1888,"state":"StateReplicate"}},"leadtransferee":"0"}: encountered poisoned latch /Local/Range/Table/106/1/2865603100361473792/RangeDescriptor@0,0
	Test:       	failover/chaos/read-write
(require.go:1360).NoError: FailNow called
(monitor.go:137).Wait: monitor failure: monitor task failed: t.Fatal() was called
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
test artifacts and logs in: /artifacts/failover/chaos/read-write/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=2 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@erikgrinaker (Contributor)

erikgrinaker commented Jun 20, 2023

There's a lot of other stuff going on here in the latest failure. The async rangelog updates we added in #102813 appear to be failing reliably during a scatter, with hundreds of these:

W230619 12:37:02.939045 45952 kv/kvserver/range_log.go:254 ⋮ [n7,s7,r1064/2:‹/Table/106/1/20{17900…-36328…}›,*kvpb.AdminScatterRequest] 1685  error logging to system.rangelog: operation ‹"rangelog-timeout"› timed out after 5s (given timeout 5s): aborted during DistSender.Send: context deadline exceeded
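
A minimal sketch of the failure mode, assuming a helper shaped roughly like the async rangelog write from #102813 (the names here are illustrative, not the actual kvserver code):

package main

import (
	"context"
	"log"
	"time"
)

// rangeLogTimeout mirrors the "given timeout 5s" in the log line above.
const rangeLogTimeout = 5 * time.Second

// logRangeEventAsync is an illustrative stand-in for the async rangelog write:
// the write to system.rangelog runs in the background under a fixed timeout,
// and a slow KV write (e.g. while raft leadership is moving around during
// chaos) is abandoned with only a warning, which is what fills the logs above.
func logRangeEventAsync(ctx context.Context, writeEvent func(context.Context) error) {
	go func() {
		ctx, cancel := context.WithTimeout(ctx, rangeLogTimeout)
		defer cancel()
		if err := writeEvent(ctx); err != nil {
			log.Printf("error logging to system.rangelog: %v", err)
		}
	}()
}

func main() {
	// Simulate a rangelog write that takes longer than the 5s budget.
	slowWrite := func(ctx context.Context) error {
		select {
		case <-time.After(10 * time.Second):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	logRangeEventAsync(context.Background(), slowWrite)
	time.Sleep(6 * time.Second) // let the background write hit its deadline
}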

We're also seeing tons of campaign failures, which may either be related to the above or be fallout from #104969:

W230619 12:37:03.185264 156 go.etcd.io/raft/v3/raft.go:917 ⋮ [T1,n7,s7,r364/3:‹/Table/106/1/-32{8945…-7102…}›] 1760  3 cannot campaign at term 9 since there are still pending configuration changes to apply

And there are tons of txn heartbeat failures due to use of a canceled context:

I230619 12:36:59.982466 46378 server/node.go:1317 ⋮ [T1,n7] 1352  batch request ‹HeartbeatTxn [/Local/Range/Table/106/1/-8578380985326469808/RangeDescriptor,/Min), [txn: c9232c77]› failed with error: ‹result is ambiguous: after 0.02s of attempting command: context canceled›
I230619 12:36:59.982466 46378 server/node.go:1317 ⋮ [T1,n7] 1352 +trace:
I230619 12:36:59.982466 46378 server/node.go:1317 ⋮ [T1,n7] 1352 +‹<empty recording>›

And a conf change that's been stalled for over 10 minutes:

W230619 12:53:41.960507 184 kv/kvserver/replica_raft.go:1528 ⋮ [T1,n7,s7,r766/6:‹/Table/106/1/28{65603…-84031…}›,raft] 24131  have been waiting 974.50s for slow proposal EndTxn(commit change-replicas) [/Local/Range/Table/106/1/‹2865603100361473792›/‹RangeDescriptor›], [txn: d335bbe7]

I'm going to try to disentangle this mess.

@erikgrinaker (Contributor)

erikgrinaker commented Jun 20, 2023

I'm seeing both the campaign failures and HeartbeatTxn failures prior to both of those PRs, so it's not new. This would also go some way towards explaining why the rangelog writes fail, if we're temporarily losing Raft leadership. Bumping the timeout to 10 seconds would likely help; I'll give that a try.

However, I'm not seeing the "pending configuration changes" error on 23.1, but I am seeing a few similar "9 is unpromotable and can not campaign" errors. I'm not seeing any of the HeartbeatTxn errors on 23.1.

@cockroach-teamcity (Member Author)

roachtest.failover/chaos/read-write failed with artifacts on master @ 1eb628e8c8a7eb1cbf9bfa2bd6c31982c25cbba0:

(assertions.go:333).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:1379
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:288
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: error getting span statistics - number of spans in request payload (1059) exceeds 'server.span_stats.span_batch_limit' cluster setting limit (500)
	Test:       	failover/chaos/read-write
(require.go:1360).NoError: FailNow called
(monitor.go:137).Wait: monitor failure: monitor task failed: t.Fatal() was called
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
test artifacts and logs in: /artifacts/failover/chaos/read-write/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=2 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@erikgrinaker (Contributor)

Last failure is #105274.
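
As a workaround sketch only (not the fix tracked in #105274): the limit named in that error is an ordinary cluster setting and could be raised from SQL, e.g.:

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Placeholder connection string.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Raise the batch limit named in the error above; the value 2000 is
	// arbitrary, just comfortably above the ~1000-span requests seen here.
	if _, err := db.Exec(`SET CLUSTER SETTING server.span_stats.span_batch_limit = 2000`); err != nil {
		log.Fatal(err)
	}
}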

@erikgrinaker (Contributor)

Split out #105378 for the HeartbeatTxn trace logging; it's fallout from #102793.

@cockroach-teamcity (Member Author)

roachtest.failover/chaos/read-write failed with artifacts on master @ cb6dd27fdfbfc8a9d0d4753809f4588bae8060d0:

(assertions.go:333).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:1408
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:299
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: error getting span statistics - number of spans in request payload (1064) exceeds 'server.span_stats.span_batch_limit' cluster setting limit (500)
	Test:       	failover/chaos/read-write
(require.go:1360).NoError: FailNow called
(monitor.go:137).Wait: monitor failure: monitor task failed: t.Fatal() was called
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
test artifacts and logs in: /artifacts/failover/chaos/read-write/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=2 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@erikgrinaker (Contributor)

Opened #105413 for the rangelog timeouts.

@cockroach-teamcity (Member Author)

roachtest.failover/chaos/read-write failed with artifacts on master @ 7fd4c21157221eae9e7d5892d89d2b5a671aba3e:

(assertions.go:333).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:1408
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:299
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: error getting span statistics - number of spans in request payload (1064) exceeds 'server.span_stats.span_batch_limit' cluster setting limit (500)
	Test:       	failover/chaos/read-write
(require.go:1360).NoError: FailNow called
(monitor.go:137).Wait: monitor failure: monitor task failed: t.Fatal() was called
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
test artifacts and logs in: /artifacts/failover/chaos/read-write/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=2 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity (Member Author)

roachtest.failover/chaos/read-write failed with artifacts on master @ 0b0a212ecbe514b55301f4a963edc92b16de9bab:

(assertions.go:333).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:1429
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/failover.go:302
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: error requesting iterating nodes for span stats from node 8 (NODE_STATUS_LIVE): rpc error: code = Unknown desc = replica unavailable: (n6,s6):7 unable to serve request to r1039:/Table/106/1/65{32837936193842192-51266251951793792} [(n3,s3):13, (n6,s6):7, (n8,s8):11, (n4,s4):14, (n9,s9):5, next=15, gen=92, sticky=9223372036.854775807,2147483647]: closed timestamp: 1687871114.188750889,0 (2023-06-27 13:05:14); raft status: {"id":"7","term":13,"vote":"7","commit":959,"lead":"7","raftState":"StateLeader","applied":959,"progress":{"5":{"match":959,"next":960,"state":"StateReplicate"},"7":{"match":959,"next":960,"state":"StateReplicate"},"b":{"match":959,"next":960,"state":"StateReplicate"},"d":{"match":959,"next":960,"state":"StateReplicate"},"e":{"match":959,"next":960,"state":"StateReplicate"}},"leadtransferee":"0"}: encountered poisoned latch /Local/Range/Table/106/1/6532837936193842192/RangeDescriptor@0,0
	Test:       	failover/chaos/read-write
(require.go:1360).NoError: FailNow called
(monitor.go:137).Wait: monitor failure: monitor task failed: t.Fatal() was called
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
(cluster.go:2279).Run: cluster.RunE: context canceled
test artifacts and logs in: /artifacts/failover/chaos/read-write/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=2 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@erikgrinaker (Contributor)

erikgrinaker commented Jun 29, 2023

The last failure is caused in part by circuit breakers tripping on a reproposal that never goes through and staying tripped, even though the Raft group is fully functional: #105797.

We also see it seemingly poisoning a meta2 latch outside of the replica's keyspan: #105798.

@erikgrinaker erikgrinaker added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Jun 29, 2023
tbg added a commit to tbg/cockroach that referenced this issue Jul 4, 2023
Because etcd/raft activates configuration changes when they are applied, but checks new proposed configs before they are considered for adding to the log (or forwarding to the leader), the following can happen:

- conf change 1 gets evaluated on a leaseholder n1
- lease changes
- new leaseholder evaluates and commits conf change 2
- n1 receives and applies conf change 2
- conf change 1 gets added to the proposal buffer and flushed; RawNode rejects
  it because conf change 1 is not compatible on top of conf change 2

Prior to this commit, because raft silently replaces the conf change with an
empty entry, we would never see the proposal below raft (where it would be
rejected due to the lease change). In effect, this caused replica unavailability
because the proposal and the associated latch would stick around forever, and the
replica circuit breaker would trip.

This commit provides a targeted fix: when the proposal buffer flushes a conf
change to raft, we check if it got replaced with an empty entry. If so, we
properly finish the proposal. To be conservative, we signal it with an ambiguous
result: it seems conceivable that the rejection would only occur on a
reproposal, while the original proposal made it into raft before the lease
change, and the local replica is in fact behind on conf changes rather than
ahead (which can happen if it's a follower). The only "customer" here is the
replicate queue (and scatter, etc) so this is acceptable; any choice here would
necessarily be a "hard error" anyway.

Epic: CRDB-25287
Release note (bug fix): under rare circumstances, a replication change could get
stuck when proposed near lease/leadership changes (and likely under overload),
and the replica circuit breakers could trip. This problem has been addressed.

Fixes cockroachdb#105797.
Closes cockroachdb#104709.
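
A rough sketch of the check this commit message describes, assuming a hypothetical hook in the proposal-buffer flush path; apart from the go.etcd.io/raft/v3/raftpb types, the names below are illustrative and not the actual CockroachDB code:

package main

import (
	"errors"
	"fmt"

	"go.etcd.io/raft/v3/raftpb"
)

// confChangeWasReplaced reports whether raft swapped a proposed conf change
// for an empty normal entry, which is how etcd/raft drops a conf change that
// is no longer compatible with the currently applied configuration.
func confChangeWasReplaced(ent raftpb.Entry) bool {
	return ent.Type == raftpb.EntryNormal && len(ent.Data) == 0
}

// errStaleConfChange stands in for the ambiguous result the commit message
// describes; the real CockroachDB error type differs.
var errStaleConfChange = errors.New(
	"conf change rejected by raft, possibly superseded by a newer config; result is ambiguous")

// maybeFailStaleConfChange is a hypothetical hook for the point where the
// proposal buffer has just flushed a conf change to raft: if raft replaced it
// with an empty entry, finish the proposal now instead of leaving it (and its
// latches) pending until the replica circuit breaker trips.
func maybeFailStaleConfChange(flushed raftpb.Entry, finish func(error)) {
	if confChangeWasReplaced(flushed) {
		finish(errStaleConfChange)
	}
}

func main() {
	// An entry that raft emptied out after rejecting the conf change.
	rejected := raftpb.Entry{Type: raftpb.EntryNormal, Index: 1205, Term: 14}
	maybeFailStaleConfChange(rejected, func(err error) {
		fmt.Println("finishing proposal early:", err)
	})
}
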
tbg added a commit to tbg/cockroach that referenced this issue Jul 4, 2023, with the same commit message as above.

craig bot pushed a commit that referenced this issue Jul 5, 2023
102248: kvserver: tighten and de-fatal replicaGC checks r=erikgrinaker a=tbg

Now we return most failed checks as errors and report errors to sentry (but
without crashing).

Closes #94813.

Epic: None
Release note: None


106104: kvserver: fail stale ConfChange when rejected by raft r=erikgrinaker a=tbg

(Same commit message as the tbg commit above; Fixes #105797, Closes #104709.)

Co-authored-by: Tobias Grieger <[email protected]>
@craig craig bot closed this as completed in 93117db Jul 5, 2023
tbg added two commits to tbg/cockroach that referenced this issue Jul 5, 2023, with the same commit message as above.