Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/backupccl: TestBackupRestorePartitioned failed under stress #39653

Closed
cockroach-teamcity opened this issue Aug 14, 2019 · 14 comments · Fixed by #40348
Closed

ccl/backupccl: TestBackupRestorePartitioned failed under stress #39653

cockroach-teamcity opened this issue Aug 14, 2019 · 14 comments · Fixed by #40348
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

SHA: https://github.com/cockroachdb/cockroach/commits/bad5e2ea8674f0513f949d3940609b1f4d685581

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1437981&tab=buildLog

W190814 06:01:28.165688 5152 gossip/gossip.go:1517  [n3] no incoming or outgoing connections
W190814 06:01:28.166370 5703 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 1: EOF:
W190814 06:01:28.166611 5438 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: EOF:
W190814 06:01:28.166849 5469 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190814 06:01:28.166981 8072 storage/replica_command.go:1068  [n3,replicate,s3,r28/3:/Table/53/1/{200-300}] failed to rollback learner n2,s2, abandoning it for the replicate queue: change replicas of r28 failed: fetching current range descriptor value: node unavailable; try another peer
W190814 06:01:28.167364 5854 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 1: EOF:
I190814 06:01:28.167492 8072 storage/queue.go:518  [n3,replicate,s3,r28/3:/Table/53/1/{200-300}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
E190814 06:01:28.168963 8072 storage/queue.go:1027  [n3,replicate,s3,r28/3:/Table/53/1/{200-300}] change replicas of r28 failed: log-range-event: node unavailable; try another peer
I190814 06:01:28.169365 8094 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] storage.IntentResolver: cleanup txn records
1      [async] closedts-rangefeed-subscriber
I190814 06:01:28.164686 5234 rpc/nodedialer/nodedialer.go:149  [n2] unable to connect to n1: failed to connect to n1 at 127.0.0.1:46765: node unavailable; try another peer
I190814 06:01:28.170230 8140 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:35569: context canceled
W190814 06:01:28.171274 7958 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: EOF:
I190814 06:01:28.171783 8139 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:35355: context canceled
W190814 06:01:28.173049 7810 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: EOF:
I190814 06:01:28.175147 8094 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190814 06:01:28.175148 8093 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190814 06:01:28.179696 7939 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190814 06:01:28.179842 7806 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190814 06:01:28.180974 8092 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190814 06:01:28.182171 8093 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
I190814 06:01:28.182556 8093 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
I190814 06:01:28.186712 8092 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
I190814 06:01:28.196774 5546 rpc/nodedialer/nodedialer.go:149  [n3] unable to connect to n1: failed to connect to n1 at 127.0.0.1:46765: context canceled
E190814 06:01:28.198174 8095 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
--- FAIL: TestBackupRestorePartitioned (7.40s)
    backup_test.go:246: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[3], 200)': pq: change replicas of r28 failed: descriptor changed: [expected] r28:/Table/53/1/{200-300} [(n1,s1):1, (n3,s3):3, next=4, gen=9] != [actual] r28:/Table/53/1/{200-300} [(n1,s1):1, (n3,s3):3, (n2,s2):4LEARNER, next=5, gen=10]

@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Aug 14, 2019
@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Aug 14, 2019
@thoszhang
Copy link
Contributor

EXPERIMENTAL_RELOCATE failed because the range descriptor changed under it, or something. I don't think this is a backup/restore failure; the test just needs to be de-flaked.

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/02f239d810711786174736328e3360dccf7ef412

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1439792&tab=buildLog

W190815 05:45:38.069985 5087 gossip/gossip.go:1517  [n2] no incoming or outgoing connections
I190815 05:45:38.071511 8068 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] storage.IntentResolver: cleanup txn records
1      [async] closedts-rangefeed-subscriber
I190815 05:45:38.072303 8068 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190815 05:45:38.072832 7651 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190815 05:45:38.073891 8069 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
W190815 05:45:38.074475 7899 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190815 05:45:38.074888 5629 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190815 05:45:38.077315 8070 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
I190815 05:45:38.079058 8070 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190815 05:45:38.080925 7872 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190815 05:45:38.082072 7583 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190815 05:45:38.082218 5963 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190815 05:45:38.082794 5790 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190815 05:45:38.085682 5398 gossip/gossip.go:1517  [n3] no incoming or outgoing connections
I190815 05:45:38.086540 8068 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
I190815 05:45:38.087321 5220 rpc/nodedialer/nodedialer.go:149  [n2] unable to connect to n3: failed to connect to n3 at 127.0.0.1:38105: context canceled
W190815 05:45:38.089334 5813 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190815 05:45:38.091796 5619 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Unavailable desc = transport is closing:
W190815 05:45:38.091924 5360 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190815 05:45:38.092246 8070 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
I190815 05:45:38.120920 7995 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:46187: context canceled
I190815 05:45:38.122371 7996 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:34347: context canceled
--- FAIL: TestBackupRestorePartitioned (6.09s)
    backup_test.go:245: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r27 failed: descriptor changed: [expected] r27:/Table/53/1/{100-200} [(n2,s2):2, (n3,s3):3, next=4, gen=10] != [actual] r27:/Table/53/1/{100-200} [(n2,s2):2, (n3,s3):3, (n1,s1):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/720a6605f182c4459c56f055c23e805d48b89344

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1441355&tab=buildLog

I190816 07:41:28.265624 8060 storage/replica_command.go:1156  [n2,replicate,s2,r25/2:/Table/53/1/{100-200}] failed to rollback learner n1,s1, abandoning it for the replicate queue: change replicas of r25 failed: fetching current range descriptor value: node unavailable; try another peer
I190816 07:41:28.265943 8060 storage/queue.go:518  [n2,replicate,s2,r25/2:/Table/53/1/{100-200}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
E190816 07:41:28.266380 8060 storage/queue.go:1027  [n2,replicate,s2,r25/2:/Table/53/1/{100-200}] change replicas of r25 failed: log-range-event: failed to send RPC: sending to all 1 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W190816 07:41:28.266871 5629 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190816 07:41:28.267829 8137 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
W190816 07:41:28.267962 5794 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:
W190816 07:41:28.270039 5763 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: rpc error: code = Unavailable desc = transport is closing:
I190816 07:41:28.271491 8136 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      storage.IntentResolver: processing intents
1      [async] closedts-rangefeed-subscriber
W190816 07:41:28.258392 5644 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: EOF:
I190816 07:41:28.285192 5253 rpc/nodedialer/nodedialer.go:149  [n2] unable to connect to n1: failed to connect to n1 at 127.0.0.1:36975: node unavailable; try another peer
I190816 07:41:28.287490 8136 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190816 07:41:28.287738 7730 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190816 07:41:28.288362 8163 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:42533: context canceled
I190816 07:41:28.289244 8162 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:44591: context canceled
W190816 07:41:28.290003 7740 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: EOF:
W190816 07:41:28.290801 7618 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: EOF:
I190816 07:41:28.291830 8137 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190816 07:41:28.292892 7549 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190816 07:41:28.296210 8136 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-rangefeed-subscriber
I190816 07:41:28.317581 8165 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:44591: context canceled
I190816 07:41:28.318058 8166 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:36975: context canceled
--- FAIL: TestBackupRestorePartitioned (6.79s)
    backup_test.go:245: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r25 failed: descriptor changed: [expected] r25:/Table/53/1/{100-200} [(n2,s2):2, (n3,s3):3, next=4, gen=10] != [actual] r25:/Table/53/1/{100-200} [(n2,s2):2, (n3,s3):3, (n1,s1):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/ca8fa726de54a0feea9f33ad000e883a4168ef39

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1442796&tab=buildLog

W190817 07:06:39.581319 5772 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190817 07:06:39.581501 8022 storage/replica_command.go:948  [n2,replicate,s2,r28/2:/Table/53/1/{100-200}] could not promote [n1,s1] to voter, rolling back: snapshot failed: (n1,s1):4: remote failed to apply snapshot: rpc error: code = Internal desc = grpc: compressed flag set with identity or empty encoding
I190817 07:06:39.582516 8022 storage/replica_command.go:1156  [n2,replicate,s2,r28/2:/Table/53/1/{100-200}] failed to rollback learner n1,s1, abandoning it for the replicate queue: change replicas of r28 failed: fetching current range descriptor value: node unavailable; try another peer
I190817 07:06:39.582802 8022 storage/queue.go:518  [n2,replicate,s2,r28/2:/Table/53/1/{100-200}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I190817 07:06:39.583130 8022 storage/replicate_queue.go:266  [n2,replicate,s2,r28/2:/Table/53/1/{100-200}] snapshot failed: (n1,s1):4: remote failed to apply snapshot: rpc error: code = Internal desc = grpc: compressed flag set with identity or empty encoding
W190817 07:06:39.584576 7810 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: EOF:
W190817 07:06:39.584844 5634 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190817 07:06:39.585557 5826 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190817 07:06:39.585742 5716 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: EOF:
W190817 07:06:39.586046 7490 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190817 07:06:39.586158 5090 gossip/gossip.go:1517  [n2] no incoming or outgoing connections
I190817 07:06:39.578622 8011 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] transport racer
I190817 07:06:39.586458 8012 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
1      [async] closedts-subscription
I190817 07:06:39.586990 8012 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
I190817 07:06:39.618300 8027 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:39719: context canceled
I190817 07:06:39.618916 8028 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:33339: context canceled
I190817 07:06:39.636433 8022 storage/replica_command.go:948  [n2,replicate,s2,r28/2:/Table/53/1/{100-200}] could not promote [] to voter, rolling back: change replicas of r28 failed: fetching current range descriptor value: node unavailable; try another peer
E190817 07:06:39.636751 8022 storage/queue.go:1027  [n2,replicate,s2,r28/2:/Table/53/1/{100-200}] change replicas of r28 failed: fetching current range descriptor value: node unavailable; try another peer
I190817 07:06:39.720898 4295 kv/transport_race.go:108  transport race promotion: ran 85 iterations on up to 1698 requests
I190817 07:06:39.727120 8015 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:41553: context canceled
I190817 07:06:39.727796 8014 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:39719: context canceled
--- FAIL: TestBackupRestorePartitioned (6.55s)
    backup_test.go:245: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r28 failed: descriptor changed: [expected] r28:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, next=4, gen=10] != [actual] r28:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, (n1,s1):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8ebcdac113118ae5fbcaddeecd269f59399aea8c

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1443942&tab=buildLog

1      [async] storage.IntentResolver: cleanup txn records
1      [async] intent_resolver_ir_batcher
1      [async] intent_resolver_gc_batcher
1      [async] closedts-subscription
I190819 08:08:13.174303 8083 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
I190819 08:08:13.174729 8116 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
1      [async] intent_resolver_gc_batcher
I190819 08:08:13.174740 8083 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
I190819 08:08:13.175052 8082 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] transport racer
W190819 08:08:13.175175 5376 gossip/gossip.go:1517  [n3] no incoming or outgoing connections
I190819 08:08:13.175480 8082 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] transport racer
1      [async] closedts-subscription
I190819 08:08:13.175118 8073 storage/queue.go:518  [n3,replicate,s3,r26/3:/Table/53/1/{200-300}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I190819 08:08:13.175228 8116 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
I190819 08:08:13.176079 8073 storage/replicate_queue.go:266  [n3,replicate,s3,r26/3:/Table/53/1/{200-300}] snapshot failed: (n1,s1):4: remote failed to apply snapshot: rpc error: code = Canceled desc = grpc: the client connection is closing
I190819 08:08:13.176130 8082 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] transport racer
E190819 08:08:13.200559 8104 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
I190819 08:08:13.200706 4435 rpc/nodedialer/nodedialer.go:149  [n1] unable to connect to n2: failed to connect to n2 at 127.0.0.1:45937: context canceled
I190819 08:08:13.201551 4435 rpc/nodedialer/nodedialer.go:149  [n1] unable to connect to n3: failed to connect to n3 at 127.0.0.1:41245: context canceled
I190819 08:08:13.222283 8073 storage/replica_command.go:948  [n3,replicate,s3,r26/3:/Table/53/1/{200-300}] could not promote [] to voter, rolling back: change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
E190819 08:08:13.222631 8073 storage/queue.go:1027  [n3,replicate,s3,r26/3:/Table/53/1/{200-300}] change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
I190819 08:08:13.226216 4462 kv/transport_race.go:108  transport race promotion: ran 89 iterations on up to 1681 requests
E190819 08:08:13.242294 7952 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
I190819 08:08:13.242415 8117 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:35253: context canceled
I190819 08:08:13.242688 8118 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:45937: context canceled
E190819 08:08:13.242496 8067 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
I190819 08:08:13.246894 8087 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:35253: context canceled
I190819 08:08:13.247075 8088 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:41245: context canceled
--- FAIL: TestBackupRestorePartitioned (6.53s)
    backup_test.go:246: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[3], 200)': pq: change replicas of r26 failed: descriptor changed: [expected] r26:/Table/53/1/{200-300} [(n3,s3):3, (n2,s2):2, next=4, gen=9] != [actual] r26:/Table/53/1/{200-300} [(n3,s3):3, (n2,s2):2, (n1,s1):4LEARNER, next=5, gen=10]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/01ee0704865391599abef3bbc89f462117f8007a

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1445556&tab=buildLog

I190820 05:50:48.241097 7962 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] storage.replicate: processing replica
1      [async] closedts-rangefeed-subscriber
W190820 05:50:48.241381 6227 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 1: EOF:
W190820 05:50:48.241414 7636 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: EOF:
W190820 05:50:48.242212 7785 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: EOF:
W190820 05:50:48.242747 5829 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: EOF:
I190820 05:50:48.240794 7961 util/stop/stopper.go:542  quiescing; tasks left:
2      node.Node: batch
2      [async] closedts-subscription
1      [async] storage.IntentResolver: cleanup txn records
1      [async] closedts-rangefeed-subscriber
I190820 05:50:48.247104 7961 util/stop/stopper.go:542  quiescing; tasks left:
2      node.Node: batch
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190820 05:50:48.249789 5346 gossip/gossip.go:1517  [n3] no incoming or outgoing connections
W190820 05:50:48.250557 17 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190820 05:50:48.252033 7963 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190820 05:50:48.253158 5698 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: EOF:
I190820 05:50:48.254443 7961 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      node.Node: batch
1      [async] closedts-rangefeed-subscriber
W190820 05:50:48.254724 5739 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:
I190820 05:50:48.258303 7961 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-rangefeed-subscriber
I190820 05:50:48.258419 7963 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190820 05:50:48.258818 7962 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
1      [async] closedts-subscription
I190820 05:50:48.258933 7963 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-rangefeed-subscriber
I190820 05:50:48.259201 7962 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
E190820 05:50:48.269183 7874 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
I190820 05:50:48.269449 7974 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:44255: context canceled
I190820 05:50:48.269468 7973 rpc/nodedialer/nodedialer.go:149  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:36965: context canceled
I190820 05:50:48.278770 7946 storage/replica_command.go:948  [n2,replicate,s2,r28/2:/Table/53/1/{100-200}] could not promote [] to voter, rolling back: change replicas of r28 failed: fetching current range descriptor value: node unavailable; try another peer
E190820 05:50:48.279080 7946 storage/queue.go:1027  [n2,replicate,s2,r28/2:/Table/53/1/{100-200}] change replicas of r28 failed: fetching current range descriptor value: node unavailable; try another peer
--- FAIL: TestBackupRestorePartitioned (5.59s)
    backup_test.go:245: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r28 failed: descriptor changed: [expected] r28:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, next=4, gen=10] != [actual] r28:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, (n1,s1):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7ca0a86b8595c097fd8f27581b1509c47f17e8a3

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1450702&tab=buildLog

I190823 06:13:39.242793 8110 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] storage.IntentResolver: cleanup txn records
2      [async] closedts-subscription
1      [async] transport racer
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
W190823 06:13:39.243134 7998 storage/intentresolver/intent_resolver.go:818  failed to gc transaction record: could not GC completed transaction anchored at /Local/Range/Table/53/RangeDescriptor: node unavailable; try another peer
W190823 06:13:39.244008 5705 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190823 06:13:39.244382 5089 gossip/gossip.go:1517  [n2] no incoming or outgoing connections
W190823 06:13:39.244860 7700 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190823 06:13:39.245728 5356 gossip/gossip.go:1517  [n3] no incoming or outgoing connections
W190823 06:13:39.247261 6016 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190823 06:13:39.249107 7728 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: EOF:
W190823 06:13:39.252823 5710 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190823 06:13:39.257043 5681 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: rpc error: code = Unavailable desc = transport is closing:
I190823 06:13:39.259228 8111 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] storage.IntentResolver: cleanup txn records
2      [async] closedts-subscription
1      [async] storage.replicate: processing replica
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
I190823 06:13:39.259599 8112 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.IntentResolver: cleanup txn records
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190823 06:13:39.260842 8045 storage/replica_command.go:1157  [n2,replicate,s2,r26/3:/Table/53/1/{100-200}] failed to rollback learner n3,s3, abandoning it for the replicate queue: change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
I190823 06:13:39.262922 8077 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:32821: context canceled
I190823 06:13:39.263951 8062 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:45631: context canceled
I190823 06:13:39.264515 8061 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:32821: context canceled
I190823 06:13:39.265071 8111 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
I190823 06:13:39.265199 8078 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:39115: context canceled
I190823 06:13:39.268348 8045 storage/queue.go:518  [n2,replicate,s2,r26/3:/Table/53/1/{100-200}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
E190823 06:13:39.268823 8045 storage/queue.go:1027  [n2,replicate,s2,r26/3:/Table/53/1/{100-200}] change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
--- FAIL: TestBackupRestorePartitioned (7.76s)
    backup_test.go:244: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r26 failed: descriptor changed: [expected] r26:/Table/53/1/{100-200} [(n1,s1):1, (n2,s2):3, next=4, gen=10] != [actual] r26:/Table/53/1/{100-200} [(n1,s1):1, (n2,s2):3, (n3,s3):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/497167b1c596eda2b70bed91c51ebf39b4356c33

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1453126&tab=buildLog

2      [async] closedts-subscription
14     [async] storage.IntentResolver: cleanup txn records
1      [async] storage.RaftTransport: processing snapshot
1      [async] intent_resolver_ir_batcher
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
I190825 05:19:10.260603 7788 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] storage.RaftTransport: processing snapshot
1      [async] closedts-rangefeed-subscriber
I190825 05:19:10.260853 7790 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190825 05:19:10.260937 7790 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
1      [async] closedts-rangefeed-subscriber
I190825 05:19:10.260978 7790 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
I190825 05:19:10.261078 7789 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190825 05:19:10.261124 7789 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
I190825 05:19:10.261292 7788 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.RaftTransport: processing snapshot
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190825 05:19:10.261343 7788 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.RaftTransport: processing snapshot
1      [async] closedts-rangefeed-subscriber
I190825 05:19:10.261398 7788 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.RaftTransport: processing snapshot
W190825 05:19:10.262082 17 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190825 05:19:10.262701 7755 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190825 05:19:10.263818 7755 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190825 05:19:10.264263 7755 storage/replica_raftstorage.go:829  [n1,s1,r27/4:/Table/53/1/{200-300}] applied LEARNER snapshot [total=14ms ingestion=4@14ms id=4ddd8dbf index=21]
W190825 05:19:10.278710 17 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190825 05:19:10.294169 7768 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:42671: context canceled
I190825 05:19:10.299135 7711 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:42741: context canceled
I190825 05:19:10.299320 7712 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:42671: context canceled
I190825 05:19:10.301172 7782 storage/replica_command.go:949  [n3,replicate,s3,r27/3:/Table/53/1/{200-300}] could not promote [] to voter, rolling back: change replicas of r27 failed: fetching current range descriptor value: node unavailable; try another peer
E190825 05:19:10.301229 7782 storage/queue.go:1027  [n3,replicate,s3,r27/3:/Table/53/1/{200-300}] change replicas of r27 failed: fetching current range descriptor value: node unavailable; try another peer
--- FAIL: TestBackupRestorePartitioned (1.47s)
    backup_test.go:246: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[3], 200)': pq: change replicas of r27 failed: descriptor changed: [expected] r27:/Table/53/1/{200-300} [(n3,s3):3, (n2,s2):2, next=4, gen=9] != [actual] r27:/Table/53/1/{200-300} [(n3,s3):3, (n2,s2):2, (n1,s1):4LEARNER, next=5, gen=10]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/497167b1c596eda2b70bed91c51ebf39b4356c33

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1453127&tab=buildLog

W190825 05:33:08.439818 17 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190825 05:33:08.443342 4297 rpc/nodedialer/nodedialer.go:144  [n1] unable to connect to n3: failed to connect to n3 at 127.0.0.1:33267: context canceled
W190825 05:33:08.444325 5393 gossip/gossip.go:1517  [n3] no incoming or outgoing connections
W190825 05:33:08.445882 8092 storage/intentresolver/intent_resolver.go:610  failed to cleanup transaction intents: failed to resolve intents: result is ambiguous (server shutdown)
I190825 05:33:08.448049 8086 storage/replica_command.go:949  [n2,replicate,s2,r26/3:/Table/53/1/{100-200}] could not promote [n3,s3] to voter, rolling back: change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
I190825 05:33:08.451646 8112 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] storage.IntentResolver: cleanup txn records
2      [async] closedts-subscription
1      storage.IntentResolver: processing intents
1      [async] storage.replicate: processing replica
1      [async] closedts-rangefeed-subscriber
W190825 05:33:08.455031 6100 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: EOF:
I190825 05:33:08.455619 4297 rpc/nodedialer/nodedialer.go:144  [n1] unable to connect to n2: failed to connect to n2 at 127.0.0.1:34123: context canceled
W190825 05:33:08.455985 5407 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: EOF:
I190825 05:33:08.457482 8086 storage/replica_command.go:1157  [n2,replicate,s2,r26/3:/Table/53/1/{100-200}] failed to rollback learner n3,s3, abandoning it for the replicate queue: change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
W190825 05:33:08.460788 17 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190825 05:33:08.460915 7488 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190825 05:33:08.461786 8112 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] storage.replicate: processing replica
1      [async] closedts-rangefeed-subscriber
I190825 05:33:08.464220 8111 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190825 05:33:08.467805 5075 gossip/gossip.go:1517  [n2] no incoming or outgoing connections
I190825 05:33:08.470568 8086 storage/queue.go:518  [n2,replicate,s2,r26/3:/Table/53/1/{100-200}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
E190825 05:33:08.471038 8086 storage/queue.go:1027  [n2,replicate,s2,r26/3:/Table/53/1/{100-200}] change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
I190825 05:33:08.472034 8112 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
E190825 05:33:08.472877 8080 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
I190825 05:33:08.513429 5537 rpc/nodedialer/nodedialer.go:144  [n3] unable to connect to n1: failed to connect to n1 at 127.0.0.1:45775: context canceled
I190825 05:33:08.514510 5537 rpc/nodedialer/nodedialer.go:144  [n3] unable to connect to n2: failed to connect to n2 at 127.0.0.1:34123: context canceled
--- FAIL: TestBackupRestorePartitioned (7.81s)
    backup_test.go:245: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r26 failed: descriptor changed: [expected] r26:/Table/53/1/{100-200} [(n1,s1):1, (n2,s2):3, next=4, gen=10] != [actual] r26:/Table/53/1/{100-200} [(n1,s1):1, (n2,s2):3, (n3,s3):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b852184c543e0833837f232eb58ad2858e3fc0bf

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1454915&tab=buildLog

W190827 05:58:13.000704 5842 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190827 05:58:13.001434 5583 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190827 05:58:13.001653 8075 storage/store.go:1735  [n1,s1,r25/4:/Table/53/1/{100-200}] unable to gossip on capacity change: node unavailable; try another peer
I190827 05:58:13.002921 8075 storage/replica_raftstorage.go:814  [n1,s1,r25/4:/Table/53/1/{100-200}] applied LEARNER snapshot [total=29ms ingestion=4@26ms id=7cf605d6 index=22]
I190827 05:58:13.004254 8147 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
1      [async] storage.IntentResolver: cleanup txn records
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190827 05:58:13.004400 8066 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] transport racer
1      [async] storage.RaftTransport: processing snapshot
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190827 05:58:13.004586 8075 rpc/nodedialer/nodedialer.go:144  [n1] unable to connect to n2: failed to connect to n2 at 127.0.0.1:41047: context canceled
I190827 05:58:13.004832 8147 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190827 05:58:13.005119 8066 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] transport racer
1      [async] storage.RaftTransport: processing snapshot
W190827 05:58:13.005926 7432 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: EOF:
I190827 05:58:13.006207 8066 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] transport racer
W190827 05:58:13.009255 5582 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 1: EOF:
I190827 05:58:13.015131 8147 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
1      [async] closedts-rangefeed-subscriber
W190827 05:58:13.015318 7660 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: EOF:
I190827 05:58:13.015758 8147 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
I190827 05:58:13.022549 8060 storage/replica_command.go:982  [n2,replicate,s2,r25/2:/Table/53/1/{100-200}] could not promote [] to voter, rolling back: change replicas of r25 failed: fetching current range descriptor value: node unavailable; try another peer
E190827 05:58:13.022879 8060 storage/queue.go:1027  [n2,replicate,s2,r25/2:/Table/53/1/{100-200}] change replicas of r25 failed: fetching current range descriptor value: node unavailable; try another peer
I190827 05:58:13.023073 5217 rpc/nodedialer/nodedialer.go:144  [n2] unable to connect to n1: failed to connect to n1 at 127.0.0.1:44173: context canceled
I190827 05:58:13.027786 8149 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:44173: context canceled
I190827 05:58:13.030102 8163 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:34143: context canceled
W190827 05:58:13.068712 4352 gossip/gossip.go:1517  [n1] no incoming or outgoing connections
I190827 05:58:13.086358 4455 kv/transport_race.go:108  transport race promotion: ran 92 iterations on up to 1690 requests
--- FAIL: TestBackupRestorePartitioned (7.16s)
    backup_test.go:245: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r25 failed: descriptor changed: [expected] r25:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, next=4, gen=10] != [actual] r25:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, (n1,s1):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7be1e524888cebeafda94858c073cd796b13b429

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1457482&tab=buildLog

W190828 06:05:19.823049 7848 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: EOF:
I190828 06:05:19.827854 7970 storage/replica_command.go:969  [n2,replicate,s2,r26/2:/Table/53/1/{100-200}] could not promote [n1,s1] to voter, rolling back: change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
I190828 06:05:19.829743 7970 storage/replica_command.go:1197  [n2,replicate,s2,r26/2:/Table/53/1/{100-200}] failed to rollback learner n1,s1, abandoning it for the replicate queue: change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
I190828 06:05:19.829898 8091 util/stop/stopper.go:542  quiescing; tasks left:
3      [async] storage.IntentResolver: cleanup txn records
2      [async] closedts-subscription
1      [async] storage.replicate: processing replica
1      [async] intent_resolver_ir_batcher
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
W190828 06:05:19.829885 7653 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: EOF:
I190828 06:05:19.830725 4483 storage/queue.go:518  [n1,s1] rate limited in MaybeAdd (gc): node unavailable; try another peer
I190828 06:05:19.831097 4483 storage/queue.go:518  [n1,s1] rate limited in MaybeAdd (merge): node unavailable; try another peer
I190828 06:05:19.831407 4483 storage/queue.go:518  [n1,s1] rate limited in MaybeAdd (split): node unavailable; try another peer
I190828 06:05:19.831580 4483 storage/queue.go:518  [n1,s1] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I190828 06:05:19.831709 4483 storage/queue.go:518  [n1,s1] rate limited in MaybeAdd (replicaGC): node unavailable; try another peer
I190828 06:05:19.831802 4483 storage/queue.go:518  [n1,s1] rate limited in MaybeAdd (raftlog): node unavailable; try another peer
I190828 06:05:19.831930 4483 storage/queue.go:518  [n1,s1] rate limited in MaybeAdd (raftsnapshot): node unavailable; try another peer
I190828 06:05:19.832062 4483 storage/queue.go:518  [n1,s1] rate limited in MaybeAdd (consistencyChecker): node unavailable; try another peer
I190828 06:05:19.832250 4483 storage/queue.go:518  [n1,s1] rate limited in MaybeAdd (timeSeriesMaintenance): node unavailable; try another peer
I190828 06:05:19.833240 8092 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190828 06:05:19.834159 7434 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Unavailable desc = transport is closing:
W190828 06:05:19.839080 7682 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Unavailable desc = transport is closing:
I190828 06:05:19.839790 7970 storage/queue.go:518  [n2,replicate,s2,r26/2:/Table/53/1/{100-200}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
E190828 06:05:19.840241 7970 storage/queue.go:1027  [n2,replicate,s2,r26/2:/Table/53/1/{100-200}] change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
I190828 06:05:19.842651 8090 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
I190828 06:05:19.843253 8090 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
W190828 06:05:19.844686 5409 gossip/gossip.go:1517  [n3] no incoming or outgoing connections
I190828 06:05:19.844776 8091 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
I190828 06:05:19.846945 8091 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
--- FAIL: TestBackupRestorePartitioned (6.22s)
    backup_test.go:245: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r26 failed: descriptor changed: [expected] r26:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, next=4, gen=10] != [actual] r26:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, (n1,s1):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a4b16706455609075c07ddd382907acc50be3c21

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1459662&tab=buildLog

W190829 05:32:57.853745 7775 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:
I190829 05:32:57.848205 8017 storage/queue.go:518  [n2,replicate,s2,r26/2:/Table/53/1/{100-200}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I190829 05:32:57.854841 8115 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      node.Node: batch
1      [async] transport racer
1      [async] storage.consistencyChecker: processing replica
I190829 05:32:57.857587 8116 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] storage.replicate: processing replica
1      [async] closedts-rangefeed-subscriber
E190829 05:32:57.859534 8017 storage/queue.go:1027  [n2,replicate,s2,r26/2:/Table/53/1/{100-200}] change replicas of r26 failed: fetching current range descriptor value: node unavailable; try another peer
I190829 05:32:57.860469 8116 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
I190829 05:32:57.861638 8115 util/stop/stopper.go:542  quiescing; tasks left:
1      node.Node: batch
1      [async] transport racer
1      [async] storage.consistencyChecker: processing replica
I190829 05:32:57.864713 4469 rpc/nodedialer/nodedialer.go:144  [n1] unable to connect to n2: failed to connect to n2 at 127.0.0.1:32977: context canceled
I190829 05:32:57.865919 4469 rpc/nodedialer/nodedialer.go:144  [n1] unable to connect to n3: failed to connect to n3 at 127.0.0.1:45487: context canceled
E190829 05:32:57.873804 8105 storage/consistency_queue.go:133  [n1,consistencyChecker,s1,r4/1:/System{/tsd-tse}] result is ambiguous (server shutdown)
E190829 05:32:57.874165 8105 storage/queue.go:1027  [n1,consistencyChecker,s1,r4/1:/System{/tsd-tse}] result is ambiguous (server shutdown)
I190829 05:32:57.874456 8115 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] transport racer
E190829 05:32:57.875027 8213 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
I190829 05:32:57.875214 4423 rpc/nodedialer/nodedialer.go:144  [n1] unable to connect to n2: failed to connect to n2 at 127.0.0.1:32977: context canceled
I190829 05:32:57.876051 4423 rpc/nodedialer/nodedialer.go:144  [n1] unable to connect to n3: failed to connect to n3 at 127.0.0.1:45487: context canceled
E190829 05:32:57.880081 8219 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
I190829 05:32:57.892814 8140 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:32977: context canceled
E190829 05:32:57.893571 8221 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
I190829 05:32:57.893767 8141 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:41683: context canceled
I190829 05:32:57.927249 4349 kv/transport_race.go:108  transport race promotion: ran 24 iterations on up to 1734 requests
I190829 05:32:57.933425 8173 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:41683: context canceled
I190829 05:32:57.934179 8174 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:45487: context canceled
--- FAIL: TestBackupRestorePartitioned (7.56s)
    backup_test.go:245: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r26 failed: descriptor changed: [expected] r26:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, next=4, gen=10] != [actual] r26:/Table/53/1/{100-200} [(n3,s3):3, (n2,s2):2, (n1,s1):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/66bd279c9aa682c2b7adcec87ec0c639b8039a33

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1461688&tab=buildLog

I190830 06:03:57.605262 8123 storage/replica_command.go:1213  [n2,replicate,s2,r25/3:/Table/53/1/{100-200}] failed to rollback learner n1,s1, abandoning it for the replicate queue: change replicas of r25 failed: fetching current range descriptor value: node unavailable; try another peer
I190830 06:03:57.605393 8123 storage/queue.go:518  [n2,replicate,s2,r25/3:/Table/53/1/{100-200}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
W190830 06:03:57.605592 7349 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
E190830 06:03:57.605623 8123 storage/queue.go:1027  [n2,replicate,s2,r25/3:/Table/53/1/{100-200}] change replicas of r25 failed: fetching current range descriptor value: node unavailable; try another peer
W190830 06:03:57.606619 5697 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190830 06:03:57.608425 5862 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: EOF:
W190830 06:03:57.609605 5639 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: EOF:
W190830 06:03:57.610608 7784 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 3: EOF:
W190830 06:03:57.611439 8031 storage/intentresolver/intent_resolver.go:818  failed to gc transaction record: could not GC completed transaction anchored at /Local/Range/Table/53/RangeDescriptor: node unavailable; try another peer
I190830 06:03:57.613149 8126 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] storage.IntentResolver: cleanup txn records
1      [async] closedts-rangefeed-subscriber
W190830 06:03:57.615367 5798 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 3: EOF:
W190830 06:03:57.616939 5941 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190830 06:03:57.617107 8125 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
W190830 06:03:57.617446 7695 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W190830 06:03:57.619284 6037 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: EOF:
W190830 06:03:57.621657 5793 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190830 06:03:57.624869 8126 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190830 06:03:57.628066 8127 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190830 06:03:57.628387 8127 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-rangefeed-subscriber
I190830 06:03:57.628601 8125 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
I190830 06:03:57.631766 8126 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
I190830 06:03:57.632112 8126 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] closedts-subscription
--- FAIL: TestBackupRestorePartitioned (6.43s)
    backup_test.go:245: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[2], 100)': pq: change replicas of r25 failed: descriptor changed: [expected] r25:/Table/53/1/{100-200} [(n2,s2):3, (n3,s3):2, next=4, gen=10] != [actual] r25:/Table/53/1/{100-200} [(n2,s2):3, (n3,s3):2, (n1,s1):4LEARNER, next=5, gen=11]

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/bc3342ea2810c88f88b243491f41af581a54ba10

Parameters:

TAGS=
GOFLAGS=-race -parallel=2

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestBackupRestorePartitioned PKG=github.com/cockroachdb/cockroach/pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1462749&tab=buildLog

W190831 05:59:58.992777 5472 gossip/gossip.go:1517  [n3] no incoming or outgoing connections
W190831 05:59:58.994978 5658 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: EOF:
W190831 05:59:58.995436 5881 storage/raft_transport.go:620  [n1] while processing outgoing Raft queue to node 2: EOF:
W190831 05:59:58.995498 5501 storage/raft_transport.go:620  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190831 05:59:58.998640 4483 rpc/nodedialer/nodedialer.go:144  [n1] unable to connect to n3: failed to connect to n3 at 127.0.0.1:40043: node unavailable; try another peer
I190831 05:59:58.999801 8234 storage/replica_command.go:979  [n3,replicate,s3,r24/2:/Table/53/1/{200-300}] could not promote [n2,s2] to voter, rolling back: snapshot failed: (n2,s2):4LEARNER: remote failed to apply snapshot: rpc error: code = Unavailable desc = transport is closing
I190831 05:59:59.001080 8234 storage/replica_command.go:1213  [n3,replicate,s3,r24/2:/Table/53/1/{200-300}] failed to rollback learner n2,s2, abandoning it for the replicate queue: change replicas of r24 failed: fetching current range descriptor value: node unavailable; try another peer
I190831 05:59:59.001540 8234 storage/queue.go:518  [n3,replicate,s3,r24/2:/Table/53/1/{200-300}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I190831 05:59:59.001967 8234 storage/replicate_queue.go:266  [n3,replicate,s3,r24/2:/Table/53/1/{200-300}] snapshot failed: (n2,s2):4LEARNER: remote failed to apply snapshot: rpc error: code = Unavailable desc = transport is closing
W190831 05:59:59.003365 6084 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
I190831 05:59:59.003905 8311 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] storage.replicate: processing replica
1      [async] closedts-rangefeed-subscriber
I190831 05:59:59.004097 4483 rpc/nodedialer/nodedialer.go:144  [n1] unable to connect to n2: failed to connect to n2 at 127.0.0.1:34045: node unavailable; try another peer
W190831 05:59:59.004673 7497 storage/raft_transport.go:620  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Unavailable desc = transport is closing:
I190831 05:59:59.008107 8309 util/stop/stopper.go:542  quiescing; tasks left:
2      [async] closedts-subscription
1      [async] transport racer
I190831 05:59:59.010484 8211 rpc/nodedialer/nodedialer.go:144  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:34045: context canceled
I190831 05:59:59.012127 8309 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] transport racer
I190831 05:59:59.013307 8311 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
1      [async] closedts-subscription
1      [async] closedts-rangefeed-subscriber
I190831 05:59:59.013792 8311 util/stop/stopper.go:542  quiescing; tasks left:
1      [async] storage.replicate: processing replica
I190831 05:59:59.057174 8234 storage/replica_command.go:979  [n3,replicate,s3,r24/2:/Table/53/1/{200-300}] could not promote [] to voter, rolling back: change replicas of r24 failed: fetching current range descriptor value: node unavailable; try another peer
E190831 05:59:59.057643 8234 storage/queue.go:1027  [n3,replicate,s3,r24/2:/Table/53/1/{200-300}] change replicas of r24 failed: fetching current range descriptor value: node unavailable; try another peer
I190831 05:59:59.061705 4524 kv/transport_race.go:108  transport race promotion: ran 74 iterations on up to 1780 requests
--- FAIL: TestBackupRestorePartitioned (8.07s)
    backup_test.go:246: error executing 'ALTER TABLE data.bank EXPERIMENTAL_RELOCATE VALUES (ARRAY[3], 200)': pq: change replicas of r24 failed: descriptor changed: [expected] r24:/Table/53/1/{200-300} [(n1,s1):1, (n3,s3):2, next=4, gen=9] != [actual] r24:/Table/53/1/{200-300} [(n1,s1):1, (n3,s3):2, (n2,s2):4LEARNER, next=5, gen=10]

@craig craig bot closed this as completed in b4caac8 Sep 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants