Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvserver: TestInitRaftGroupOnRequest failed #48322

Closed
cockroach-teamcity opened this issue May 2, 2020 · 6 comments
Closed

kv/kvserver: TestInitRaftGroupOnRequest failed #48322

cockroach-teamcity opened this issue May 2, 2020 · 6 comments
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

(kv/kvserver).TestInitRaftGroupOnRequest failed on release-20.1@48092caa06b8d599887dc4a62790cc6c412f8fcd:

=== RUN   TestInitRaftGroupOnRequest
I200502 15:19:04.836185 198455 gossip/gossip.go:395  [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"127.0.0.1:32805" > attrs:<> locality:<> ServerVersion:<major_val:0 minor_val:0 patch:0 unstable:0 > build_tag:"" started_at:0 cluster_name:"" sql_address:<network_field:"" address_field:"" > 
W200502 15:19:04.987637 198455 gossip/gossip.go:1513  [n2] no incoming or outgoing connections
I200502 15:19:04.989249 198455 gossip/gossip.go:395  [n2] NodeDescriptor set to node_id:2 address:<network_field:"tcp" address_field:"127.0.0.1:36391" > attrs:<> locality:<> ServerVersion:<major_val:0 minor_val:0 patch:0 unstable:0 > build_tag:"" started_at:0 cluster_name:"" sql_address:<network_field:"" address_field:"" > 
I200502 15:19:05.001857 198447 gossip/client.go:124  [n2] started gossip client to 127.0.0.1:32805
I200502 15:19:05.120494 198455 kv/kvserver/replica_command.go:397  [s1,r1/1:/M{in-ax}] initiating a split of this range at key /Table/50 [r2] (manual)
I200502 15:19:05.238315 198455 kv/kvserver/replica_command.go:1602  [s1,r2/1:/{Table/50-Max}] change replicas (add [(n2,s2):2LEARNER] remove []): existing descriptor r2:/{Table/50-Max} [(n1,s1):1, next=2, gen=1]
I200502 15:19:05.257084 198455 kv/kvserver/replica_raft.go:247  [s1,r2/1:/{Table/50-Max},txn=fb86bde1] proposing SIMPLE(l2) ADD_REPLICA[(n2,s2):2LEARNER]: after=[(n1,s1):1 (n2,s2):2LEARNER] next=3
W200502 15:19:05.263317 198455 kv/kvserver/replica_write.go:200  [s1,r2/1:/{Table/50-Max},txn=fb86bde1] intents not processed as async resolution is disabled
I200502 15:19:05.283144 198455 kv/kvserver/store_snapshot.go:932  [s1,r2/1:/{Table/50-Max}] sending LEARNER snapshot b8570091 at applied index 15
I200502 15:19:05.284491 198455 kv/kvserver/store_snapshot.go:975  [s1,r2/1:/{Table/50-Max}] streamed snapshot to (n2,s2):2LEARNER: kv pairs: 6, log entries: 0, rate-limit: 8.0 MiB/sec, 0.02s
I200502 15:19:05.310881 198771 kv/kvserver/raft_snapshot_queue.go:126  [raftsnapshot,s1,r2/1:/{Table/50-Max}] skipping snapshot; replica is likely a learner in the process of being added: (n2,s2):2LEARNER
I200502 15:19:05.314017 198856 kv/kvserver/replica_raftstorage.go:780  [s2,r2/2:{-}] applying LEARNER snapshot [id=b8570091 index=15]
W200502 15:19:05.320281 198755 kv/kvserver/replica_read.go:102  [liveness-hb,txn=a698227f,range-lookup=/System/NodeLiveness/1,s1,r1/1:/{Min-Table/50}] intents not processed as async resolution is disabled
I200502 15:19:05.331921 198856 kv/kvserver/replica_raftstorage.go:801  [s2,r2/2:/{Table/50-Max}] applied LEARNER snapshot [total=15ms ingestion=4@6ms id=b8570091 index=15]
I200502 15:19:05.356217 198455 kv/kvserver/replica_command.go:1602  [s1,r2/1:/{Table/50-Max}] change replicas (add [(n2,s2):2] remove []): existing descriptor r2:/{Table/50-Max} [(n1,s1):1, (n2,s2):2LEARNER, next=3, gen=2]
W200502 15:19:05.476777 198903 kv/kvserver/replica_read.go:102  [liveness-hb,txn=4e4feb9c,range-lookup=/System/NodeLiveness/2,s1,r1/1:/{Min-Table/50}] intents not processed as async resolution is disabled
I200502 15:19:05.477965 198455 kv/kvserver/replica_raft.go:247  [s1,r2/1:/{Table/50-Max},txn=6113e401] proposing SIMPLE(v2) ADD_REPLICA[(n2,s2):2]: after=[(n1,s1):1 (n2,s2):2] next=3
W200502 15:19:05.517609 198455 kv/kvserver/replica_write.go:200  [s1,r2/1:/{Table/50-Max},txn=6113e401] intents not processed as async resolution is disabled
I200502 15:19:05.519065 198455 util/stop/stopper.go:539  quiescing
I200502 15:19:05.588072 198455 util/stop/stopper.go:539  quiescing
I200502 15:19:05.599162 198741 kv/kvserver/node_liveness.go:804  [liveness-hb] retrying liveness update after kvserver.errRetryLiveness: result is ambiguous (server shutdown)
W200502 15:19:05.599670 198741 kv/kvserver/node_liveness.go:471  [liveness-hb] failed node liveness heartbeat: context canceled
I200502 15:19:05.600488 198936 kv/txn.go:735  async rollback failed: node unavailable; try another peer
W200502 15:19:05.784136 199057 kv/kvserver/replica_read.go:102  [liveness-hb,txn=e89d78fa,range-lookup=/System/NodeLiveness/1,s1,r1/1:/{Min-Table/50}] intents not processed as async resolution is disabled
W200502 15:19:05.815470 199160 kv/kvserver/replica_read.go:102  [liveness-hb,txn=8064be5a,range-lookup=/System/NodeLiveness/2,s1,r1/1:/{Min-Table/50}] intents not processed as async resolution is disabled
I200502 15:19:06.015898 199229 util/stop/stopper.go:539  quiescing
I200502 15:19:06.016399 199228 util/stop/stopper.go:539  quiescing
I200502 15:19:06.070536 199227 util/stop/stopper.go:539  quiescing
W200502 15:19:06.074103 198859 kv/kvserver/raft_transport.go:637  while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200502 15:19:06.075960 198769 kv/kvserver/raft_transport.go:637  while processing outgoing Raft queue to node 2: EOF:
W200502 15:19:06.076132 198857 kv/kvserver/raft_transport.go:637  while processing outgoing Raft queue to node 1: EOF:
W200502 15:19:06.076960 198772 kv/kvserver/raft_transport.go:637  while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200502 15:19:06.088494 199227 util/stop/stopper.go:539  quiescing
I200502 15:19:06.092879 199227 util/stop/stopper.go:539  quiescing
--- FAIL: TestInitRaftGroupOnRequest (1.41s)
    client_raft_test.go:3976: expected NotLeaseHolderError; got <nil>

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestInitRaftGroupOnRequest PKG=./pkg/kv/kvserver TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-release-20.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels May 2, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.1 milestone May 2, 2020
@cockroach-teamcity
Copy link
Member Author

(kv/kvserver).TestInitRaftGroupOnRequest failed on release-20.1@32b4c18258448f90d424243b0df75764e90cb650:

=== RUN   TestInitRaftGroupOnRequest
I200521 18:06:30.932812 197597 gossip/gossip.go:395  [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"127.0.0.1:33887" > attrs:<> locality:<> ServerVersion:<major_val:0 minor_val:0 patch:0 unstable:0 > build_tag:"" started_at:0 cluster_name:"" sql_address:<network_field:"" address_field:"" > 
W200521 18:06:31.111702 197597 gossip/gossip.go:1513  [n2] no incoming or outgoing connections
I200521 18:06:31.112860 197597 gossip/gossip.go:395  [n2] NodeDescriptor set to node_id:2 address:<network_field:"tcp" address_field:"127.0.0.1:41675" > attrs:<> locality:<> ServerVersion:<major_val:0 minor_val:0 patch:0 unstable:0 > build_tag:"" started_at:0 cluster_name:"" sql_address:<network_field:"" address_field:"" > 
I200521 18:06:31.122741 197620 gossip/client.go:124  [n2] started gossip client to 127.0.0.1:33887
I200521 18:06:31.246203 197597 kv/kvserver/replica_command.go:397  [s1,r1/1:/M{in-ax}] initiating a split of this range at key /Table/50 [r2] (manual)
I200521 18:06:31.368365 197597 kv/kvserver/replica_command.go:1602  [s1,r2/1:/{Table/50-Max}] change replicas (add [(n2,s2):2LEARNER] remove []): existing descriptor r2:/{Table/50-Max} [(n1,s1):1, next=2, gen=1]
I200521 18:06:31.463075 197597 kv/kvserver/replica_raft.go:247  [s1,r2/1:/{Table/50-Max},txn=8cb148f2] proposing SIMPLE(l2) ADD_REPLICA[(n2,s2):2LEARNER]: after=[(n1,s1):1 (n2,s2):2LEARNER] next=3
W200521 18:06:31.483353 197597 kv/kvserver/replica_write.go:200  [s1,r2/1:/{Table/50-Max},txn=8cb148f2] intents not processed as async resolution is disabled
I200521 18:06:31.513227 197597 kv/kvserver/store_snapshot.go:932  [s1,r2/1:/{Table/50-Max}] sending LEARNER snapshot 77174c82 at applied index 15
I200521 18:06:31.515353 197597 kv/kvserver/store_snapshot.go:975  [s1,r2/1:/{Table/50-Max}] streamed snapshot to (n2,s2):2LEARNER: kv pairs: 6, log entries: 0, rate-limit: 8.0 MiB/sec, 0.02s
I200521 18:06:31.534959 197939 kv/kvserver/replica_raftstorage.go:780  [s2,r2/2:{-}] applying LEARNER snapshot [id=77174c82 index=15]
I200521 18:06:31.538938 197892 kv/kvserver/raft_snapshot_queue.go:126  [raftsnapshot,s1,r2/1:/{Table/50-Max}] skipping snapshot; replica is likely a learner in the process of being added: (n2,s2):2LEARNER
I200521 18:06:31.554483 197939 kv/kvserver/replica_raftstorage.go:801  [s2,r2/2:/{Table/50-Max}] applied LEARNER snapshot [total=16ms ingestion=4@13ms id=77174c82 index=15]
W200521 18:06:31.605420 197780 kv/kvserver/replica_read.go:102  [liveness-hb,txn=fc52a59a,range-lookup=/System/NodeLiveness/2,s1,r1/1:/{Min-Table/50}] intents not processed as async resolution is disabled
I200521 18:06:31.623364 197597 kv/kvserver/replica_command.go:1602  [s1,r2/1:/{Table/50-Max}] change replicas (add [(n2,s2):2] remove []): existing descriptor r2:/{Table/50-Max} [(n1,s1):1, (n2,s2):2LEARNER, next=3, gen=2]
I200521 18:06:31.743379 197597 kv/kvserver/replica_raft.go:247  [s1,r2/1:/{Table/50-Max},txn=9e107b93] proposing SIMPLE(v2) ADD_REPLICA[(n2,s2):2]: after=[(n1,s1):1 (n2,s2):2] next=3
W200521 18:06:31.785029 197597 kv/kvserver/replica_write.go:200  [s1,r2/1:/{Table/50-Max},txn=9e107b93] intents not processed as async resolution is disabled
I200521 18:06:31.801050 197597 util/stop/stopper.go:539  quiescing
I200521 18:06:31.826694 197597 util/stop/stopper.go:539  quiescing
W200521 18:06:32.006695 198188 kv/kvserver/replica_read.go:102  [liveness-hb,txn=615d93dc,range-lookup=/System/NodeLiveness/1,s1,r1/1:/{Min-Table/50}] intents not processed as async resolution is disabled
W200521 18:06:32.091578 198289 kv/kvserver/replica_read.go:102  [liveness-hb,txn=8e7c22e6,range-lookup=/System/NodeLiveness/2,s1,r1/1:/{Min-Table/50}] intents not processed as async resolution is disabled
I200521 18:06:32.151952 198328 util/stop/stopper.go:539  quiescing
I200521 18:06:32.170794 198329 util/stop/stopper.go:539  quiescing
I200521 18:06:32.249109 198327 util/stop/stopper.go:539  quiescing
W200521 18:06:32.259147 197975 kv/kvserver/raft_transport.go:637  while processing outgoing Raft queue to node 1: rpc error: code = Unavailable desc = transport is closing:
W200521 18:06:32.260326 197887 kv/kvserver/raft_transport.go:637  while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200521 18:06:32.260426 197936 kv/kvserver/raft_transport.go:637  while processing outgoing Raft queue to node 2: EOF:
W200521 18:06:32.265866 197954 kv/kvserver/raft_transport.go:637  while processing outgoing Raft queue to node 2: EOF:
I200521 18:06:32.270343 198327 util/stop/stopper.go:539  quiescing
I200521 18:06:32.272793 198327 util/stop/stopper.go:539  quiescing
--- FAIL: TestInitRaftGroupOnRequest (1.50s)
    client_raft_test.go:3976: expected NotLeaseHolderError; got <nil>

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestInitRaftGroupOnRequest PKG=./pkg/kv/kvserver TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@knz knz added the A-kv-replication Relating to Raft, consensus, and coordination. label Jun 4, 2020
@knz
Copy link
Contributor

knz commented Jun 4, 2020

The error looks legitimate.

cc @tbg @irfansharif for triage

@irfansharif
Copy link
Contributor

The build artifacts have gone missing. This test has seen a lot of recent deflaking across #47625 and #47937. Running the stress manually now to see if anything shakes out.

@irfansharif
Copy link
Contributor

irfansharif commented Jun 4, 2020

Not super reproducible unfortunately.

135378 runs so far, 0 failures, over 26m35s
...
194629 runs so far, 0 failures, over 38m10s

@knz
Copy link
Contributor

knz commented Jun 8, 2020

This is a unit test failure, not roachtest, so there's no distinct artifact.
You'll find the detailed logs in the main build log directly.

@tbg
Copy link
Member

tbg commented Jun 16, 2020

This is a unit test failure, not roachtest, so there's no distinct artifact.

They also get removed after some time. Closing since it hasn't reoccurred.

@tbg tbg closed this as completed Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

No branches or pull requests

5 participants