
teamcity: failed test: TestSnapshotAfterTruncationWithUncommittedTail #37085

Closed · opened by cockroach-teamcity on Apr 24, 2019 · 0 comments · Fixed by #37105
Assignee: nvanbenschoten
Labels: C-test-failure (Broken test, automatically or manually discovered) · O-robot (Originated from a bot)
Milestone: 19.2

@cockroach-teamcity (Member) commented:

The following tests appear to have failed on master (test): TestSnapshotAfterTruncationWithUncommittedTail

You may want to check for open issues.

#1259277:

TestSnapshotAfterTruncationWithUncommittedTail
...16:49:08.439731 269204 storage/store.go:1530  [s1,r1/1:/M{in-ax}] could not gossip first range descriptor: [NotLeaseHolderError] r1: replica (n1,s1):1 not lease holder; lease holder unknown
I190424 16:49:08.439798 417133 internal/client/txn.go:619  async rollback failed: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
I190424 16:49:08.439834 408751 storage/client_test.go:1332  [txn=db1990a1] test clock advanced to: 516.600000697,0
I190424 16:49:08.439854 417810 util/stop/stopper.go:546  quiescing; tasks left:
4      mtc send
I190424 16:49:08.439946 408751 internal/client/txn.go:619  async rollback failed: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
I190424 16:49:08.440050 409996 storage/client_test.go:1332  [txn=a9160843] test clock advanced to: 518.400000699,0
I190424 16:49:08.440112 416866 internal/client/txn.go:619  async rollback failed: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
I190424 16:49:08.440146 417810 util/stop/stopper.go:546  quiescing; tasks left:
2      mtc send
I190424 16:49:08.440184 409996 internal/client/txn.go:619  async rollback failed: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
I190424 16:49:08.440270 412690 internal/client/txn.go:619  async rollback failed: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
I190424 16:49:08.440302 417810 util/stop/stopper.go:546  quiescing; tasks left:
1      mtc send
I190424 16:49:08.440327 269350 storage/client_test.go:1332  [hb,txn=14730dc2] test clock advanced to: 520.200000701,0
W190424 16:49:08.440505 269350 internal/client/txn.go:510  [hb] failure aborting transaction: node unavailable; try another peer; abort caused by: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W190424 16:49:08.440546 269350 storage/node_liveness.go:463  [hb] failed node liveness heartbeat: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
I190424 16:49:08.442240 269227 storage/node_liveness.go:775  [hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (context done during DistSender.Send: context canceled)
W190424 16:49:08.442279 269227 storage/node_liveness.go:463  [hb] failed node liveness heartbeat: context canceled
I190424 16:49:08.442511 417925 internal/client/txn.go:619  async rollback failed: node unavailable; try another peer
W190424 16:49:08.443286 269620 gossip/gossip.go:1496  [n3] no incoming or outgoing connections
W190424 16:49:08.443437 270326 storage/raft_transport.go:583  while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:
W190424 16:49:08.443480 269349 gossip/gossip.go:1496  [n2] no incoming or outgoing connections
    client_raft_test.go:1013: condition failed to evaluate within 45s: expected at least 1 snapshot to catch the partitioned replica up
        goroutine 269065 [running]:
        runtime/debug.Stack(0xa7a358200, 0xc001fac060, 0x3520be0)
        	/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
        github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x35865e0, 0xc00124fc00, 0xc0037c33e0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:49 +0x103
        github.com/cockroachdb/cockroach/pkg/storage_test.TestSnapshotAfterTruncationWithUncommittedTail(0xc00124fc00)
        	/go/src/github.com/cockroachdb/cockroach/pkg/storage/client_raft_test.go:1013 +0xed4
        testing.tRunner(0xc00124fc00, 0x2f34e78)
        	/usr/local/go/src/testing/testing.go:827 +0xbf
        created by testing.(*T).Run
        	/usr/local/go/src/testing/testing.go:878 +0x35c
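For context on the failure mode: the assertion at client_raft_test.go:1013 is wrapped in testutils.SucceedsSoon, which retries a closure until it returns nil or the 45-second window seen above expires, then fails the test with the last error and a stack trace. A minimal sketch of that pattern, not the actual test code (the `snapshotsApplied` accessor is hypothetical):

```go
package storage_test

import (
	"errors"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils"
)

// waitForSnapshot illustrates the SucceedsSoon pattern behind the failure
// above: the closure is retried until it returns nil or the timeout expires,
// at which point the test fails with the last error and a stack trace.
func waitForSnapshot(t *testing.T, snapshotsApplied func() int) {
	testutils.SucceedsSoon(t, func() error {
		if snapshotsApplied() < 1 {
			return errors.New("expected at least 1 snapshot to catch the partitioned replica up")
		}
		return nil
	})
}
```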




Please assign, take a look and update the issue accordingly.

@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Apr 24, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Apr 24, 2019
@nvanbenschoten nvanbenschoten self-assigned this Apr 24, 2019
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 24, 2019
Fixes cockroachdb#37085.

This filtering was assuming that the Raft message's index field
indicated the first index in the MsgApp. It actually indicates the
log index that _precedes_ any of the entries in the MsgApp.

The filtering logic was copied from TestReplicaRangefeedRetryErrors, so the
commit updates that test as well.

Release note: None
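For readers less familiar with the Raft terminology in the commit message: in a raftpb.Message of type MsgApp, the Index field names the log entry immediately preceding msg.Entries, so the first entry carried by the message sits at msg.Index+1. A minimal sketch of that index arithmetic, assuming etcd's raftpb types (the helper name is made up; this is not the actual patch in #37105):

```go
package main

import (
	"fmt"

	"go.etcd.io/etcd/raft/raftpb"
)

// msgAppCoversIndex is a hypothetical helper showing the distinction the
// commit message describes: msg.Index is the log index that precedes the
// entries in a MsgApp, so the first appended entry lives at msg.Index+1.
func msgAppCoversIndex(msg raftpb.Message, index uint64) bool {
	if msg.Type != raftpb.MsgApp || len(msg.Entries) == 0 {
		return false
	}
	// Buggy assumption in the flaky filter: treat msg.Index as the first
	// entry's index, i.e. `return msg.Index >= index`.
	// Correct: the first entry in the message is at msg.Index+1.
	return msg.Index+1 >= index
}

func main() {
	msg := raftpb.Message{
		Type:    raftpb.MsgApp,
		Index:   9, // index of the entry preceding Entries
		Entries: []raftpb.Entry{{Index: 10}, {Index: 11}},
	}
	fmt.Println(msgAppCoversIndex(msg, 10)) // true: the MsgApp carries entry 10
}
```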
craig bot pushed a commit that referenced this issue Apr 25, 2019
37105: storage: deflake TestSnapshotAfterTruncationWithUncommittedTail r=nvanbenschoten a=nvanbenschoten

Fixes #37085.

This PR fixes two flakes in TestSnapshotAfterTruncationWithUncommittedTail. The first stems from unexpected NotLeaseHolderErrors; the second stems from a stall made possible by incorrect filtering of `MsgApp`s.

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in #37105 Apr 25, 2019
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2019