Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/kvccl/kvfollowerreadsccl: TestFollowerReadsWithStaleDescriptor failed #52681

Closed
cockroach-teamcity opened this issue Aug 12, 2020 · 6 comments · Fixed by #53660
Closed

ccl/kvccl/kvfollowerreadsccl: TestFollowerReadsWithStaleDescriptor failed #52681

cockroach-teamcity opened this issue Aug 12, 2020 · 6 comments · Fixed by #53660
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. skipped-test
Milestone

Comments

@cockroach-teamcity
Copy link
Member

(ccl/kvccl/kvfollowerreadsccl).TestFollowerReadsWithStaleDescriptor failed on master@44c362600dbed9d58be8daf64558543745698736:

I200812 12:16:08.463246 3830 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:32951: context canceled
W200812 12:16:08.463635 1185 gossip/gossip.go:1494  [n2] no incoming or outgoing connections
I200812 12:16:08.472175 3814 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:41127: context canceled
I200812 12:16:08.474299 3815 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n4: failed to connect to n4 at 127.0.0.1:45835: context canceled
I200812 12:16:08.474909 3813 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:32951: context canceled
--- FAIL: TestFollowerReadsWithStaleDescriptor (9.78s)
    followerreads_test.go:291: 
        	Error Trace:	followerreads_test.go:291
        	Error:      	Should be false
        	Test:       	TestFollowerReadsWithStaleDescriptor
        	Messages:   	query was not served through follower reads:  9223372036854.777ms      0.000ms    === operation:SELECT * FROM test AS OF SYSTEM TIME follower_read_timestamp() WHERE k=2 client:127.0.0.1:54818 hostssl: node:4 sb:1 unfinished: user:root
        	            	 9223372036854.777ms      0.398ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:54818,hostssl,user=root] query cache miss
        	            	 9223372036854.777ms      0.375ms    event:sql/catalog/descs/collection.go:478 [n4,client=127.0.0.1:54818,hostssl,user=root] added descriptor 't.public.test' to collection
        	            	 9223372036854.777ms      0.788ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:54818,hostssl,user=root] query cache add
        	            	 9223372036854.777ms      0.197ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54818,hostssl,user=root] planning ends
        	            	 9223372036854.777ms      0.112ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54818,hostssl,user=root] checking distributability
        	            	 9223372036854.777ms      0.172ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54818,hostssl,user=root] will distribute plan: false
        	            	 9223372036854.777ms      0.145ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54818,hostssl,user=root] execution starts: distributed engine
        	            	 9223372036854.777ms      0.076ms    event:sql/distsql_running.go:981 [n4,client=127.0.0.1:54818,hostssl,user=root] creating DistSQL plan with isLocal=true
        	            	 9223372036854.777ms      0.098ms    event:sql/distsql_running.go:381 [n4,client=127.0.0.1:54818,hostssl,user=root] running DistSQL plan
        	            	 9223372036854.777ms      0.058ms        === operation:flow client:127.0.0.1:54818 hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.291ms        event:sql/flowinfra/flow.go:285 [n4,client=127.0.0.1:54818,hostssl,user=root] starting (0 processors, 0 startables)
        	            	 9223372036854.777ms      0.005ms        === operation:table reader client:127.0.0.1:54818 cockroach.flowid:00000000-0000-0000-0000-000000000000 cockroach.processorid:0 cockroach.stat.tablereader.bytes.read:0 B cockroach.stat.tablereader.input.rows:0 cockroach.stat.tablereader.stalltime:16.029ms hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.128ms        event:sql/rowexec/tablereader.go:181 [n4,client=127.0.0.1:54818,hostssl,user=root] starting scan with limitBatches true
        	            	 9223372036854.777ms      0.272ms        event:sql/row/kv_batch_fetcher.go:289 [n4,client=127.0.0.1:54818,hostssl,user=root] Scan /Table/53/1/2{-/#}
        	            	 9223372036854.777ms      0.048ms            === operation:txn coordinator send client:127.0.0.1:54818 hostssl: node:4 txnID:dd7da9d4-4ba1-411c-8142-fc8febf09ecc user:root
        	            	 9223372036854.777ms      0.082ms                === operation:dist sender send client:127.0.0.1:54818 hostssl: node:4 txn:dd7da9d4 user:root
        	            	 9223372036854.777ms      0.348ms                event:kv/kvclient/kvcoord/range_iter.go:159 [n4,client=127.0.0.1:54818,hostssl,user=root,txn=dd7da9d4] querying next range at /Table/53/1/2
        	            	 9223372036854.777ms      0.535ms                event:kv/kvclient/kvcoord/dist_sender.go:1790 [n4,client=127.0.0.1:54818,hostssl,user=root,txn=dd7da9d4] r35: sending batch 1 Scan to (n3,s3):3
        	            	 9223372036854.777ms      0.565ms                event:rpc/nodedialer/nodedialer.go:132 [n4,client=127.0.0.1:54818,hostssl,user=root,txn=dd7da9d4] sending request to 127.0.0.1:41127
        	            	 9223372036854.777ms      0.165ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:4 sb:1 span.kind:client txnID:dd7da9d4-4ba1-411c-8142-fc8febf09ecc
        	            	 9223372036854.777ms      2.373ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:3 sb:1 span.kind:server txnid:dd7da9d4-4ba1-411c-8142-fc8febf09ecc unfinished:
        	            	 9223372036854.777ms      0.309ms                    event:server/node.go:842 [n3] node received request: 1 Scan
        	            	 9223372036854.777ms      0.379ms                    event:kv/kvserver/store_send.go:161 [n3,s3] executing [txn: dd7da9d4], Scan [/Table/53/1/2,/Table/53/1/2/#)
        	            	 9223372036854.777ms      0.439ms                    event:kv/kvserver/replica_send.go:97 [n3,s3,r35/3:/{Table/39-Max}] read-only path
        	            	 9223372036854.777ms      1.071ms                    event:kv/kvserver/replica_follower_read.go:105 [n3,s3,r35/3:/{Table/39-Max}] serving via follower read; query timestamp below closed timestamp by 3.664792525s
        	            	 9223372036854.777ms      0.563ms                    event:kv/kvserver/concurrency/concurrency_manager.go:116 [n3,s3,r35/3:/{Table/39-Max}] sequencing request
        	            	 9223372036854.777ms      0.139ms                    event:kv/kvserver/concurrency/concurrency_manager.go:152 [n3,s3,r35/3:/{Table/39-Max}] acquiring latches
        	            	 9223372036854.777ms      0.241ms                    event:kv/kvserver/concurrency/concurrency_manager.go:164 [n3,s3,r35/3:/{Table/39-Max}] scanning lock table for conflicting locks
        	            	 9223372036854.777ms      0.310ms                    event:kv/kvserver/replica_read.go:124 [n3,s3,r35/3:/{Table/39-Max}] executing read-only batch
        	            	 9223372036854.777ms      2.012ms                    event:kv/kvserver/replica_read.go:109 [n3,s3,r35/3:/{Table/39-Max}] read completed
        	            	 9223372036854.777ms      0.339ms                    event:kv/kvserver/replica_send.go:166 [n3,s3,r35/3:/{Table/39-Max}] client had stale range info; returning an update
        	            	 9223372036854.777ms     12.091ms                event:kv/kvclient/kvcoord/dist_sender.go:1904 [n4,client=127.0.0.1:54818,hostssl,user=root,txn=dd7da9d4] received updated range info: [desc: r35:/{Table/39-Max} [(n1,s1):1, (n3,s3):3, next=4, gen=6] lease: repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1597234559.262289754,0]
        	            	 9223372036854.777ms      1.026ms                event:kv/kvclient/kvcoord/range_cache.go:850 [n4,client=127.0.0.1:54818,hostssl,user=root,txn=dd7da9d4] evict cached descriptor: desc=desc:r35:/{Table/39-Max} [(n1,s1):1, (n2,s2):2LEARNER, (n3,s3):3, next=4, gen=5], lease:repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1597234559.262289754,0
        	            	 9223372036854.777ms      0.128ms                event:kv/kvclient/kvcoord/range_cache.go:481 [n4,client=127.0.0.1:54818,hostssl,user=root,txn=dd7da9d4] evicting cached range descriptor with 1 replacements
        	            	 9223372036854.777ms      0.448ms                event:kv/kvclient/kvcoord/range_cache.go:1026 [n4,client=127.0.0.1:54818,hostssl,user=root,txn=dd7da9d4] clearing entries overlapping r35:/{Table/39-Max} [(n1,s1):1, (n3,s3):3, next=4, gen=6]
        	            	 9223372036854.777ms     15.664ms            event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:425 [n4,client=127.0.0.1:54818,hostssl,user=root,txn=dd7da9d4] recording span to refresh: /Table/53/1/2{-/#}
        	            	 9223372036854.777ms     17.843ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54818,hostssl,user=root] execution ends
        	            	 9223372036854.777ms      0.100ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54818,hostssl,user=root] rows affected: 0
        	            	 9223372036854.777ms      0.460ms    event:sql/conn_executor_exec.go:1392 [n4,client=127.0.0.1:54818,hostssl,user=root] AutoCommit. err: <nil>

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestFollowerReadsWithStaleDescriptor PKG=./pkg/ccl/kvccl/kvfollowerreadsccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Aug 12, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.2 milestone Aug 12, 2020
@cockroach-teamcity
Copy link
Member Author

(ccl/kvccl/kvfollowerreadsccl).TestFollowerReadsWithStaleDescriptor failed on master@ed56390d4f28f7d7b26ce047bcb799afb337b537:

W200812 15:22:04.466196 2241 sql/sqlliveness/slinstance/slinstance.go:185  [n4] exiting heartbeat loop
I200812 15:22:04.478067 3713 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:39913: context canceled
I200812 15:22:04.478237 3778 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n4: failed to connect to n4 at 127.0.0.1:39689: context canceled
I200812 15:22:04.478371 3712 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:36145: context canceled
I200812 15:22:04.516279 3769 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:39641: context canceled
I200812 15:22:04.518415 3768 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:36145: context canceled
--- FAIL: TestFollowerReadsWithStaleDescriptor (8.96s)
    followerreads_test.go:291: 
        	Error Trace:	followerreads_test.go:291
        	Error:      	Should be false
        	Test:       	TestFollowerReadsWithStaleDescriptor
        	Messages:   	query was not served through follower reads:  9223372036854.777ms      0.000ms    === operation:SELECT * FROM test AS OF SYSTEM TIME follower_read_timestamp() WHERE k=2 client:127.0.0.1:39336 hostssl: node:4 sb:1 unfinished: user:root
        	            	 9223372036854.777ms      0.434ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:39336,hostssl,user=root] query cache miss
        	            	 9223372036854.777ms      0.510ms    event:sql/catalog/descs/collection.go:478 [n4,client=127.0.0.1:39336,hostssl,user=root] added descriptor 't.public.test' to collection
        	            	 9223372036854.777ms      1.074ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:39336,hostssl,user=root] query cache add
        	            	 9223372036854.777ms      0.148ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:39336,hostssl,user=root] planning ends
        	            	 9223372036854.777ms      0.153ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:39336,hostssl,user=root] checking distributability
        	            	 9223372036854.777ms      0.081ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:39336,hostssl,user=root] will distribute plan: false
        	            	 9223372036854.777ms      0.076ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:39336,hostssl,user=root] execution starts: distributed engine
        	            	 9223372036854.777ms      0.076ms    event:sql/distsql_running.go:981 [n4,client=127.0.0.1:39336,hostssl,user=root] creating DistSQL plan with isLocal=true
        	            	 9223372036854.777ms      0.136ms    event:sql/distsql_running.go:381 [n4,client=127.0.0.1:39336,hostssl,user=root] running DistSQL plan
        	            	 9223372036854.777ms      0.065ms        === operation:flow client:127.0.0.1:39336 hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.805ms        event:sql/flowinfra/flow.go:285 [n4,client=127.0.0.1:39336,hostssl,user=root] starting (0 processors, 0 startables)
        	            	 9223372036854.777ms      0.006ms        === operation:table reader client:127.0.0.1:39336 cockroach.flowid:00000000-0000-0000-0000-000000000000 cockroach.processorid:0 cockroach.stat.tablereader.bytes.read:0 B cockroach.stat.tablereader.input.rows:0 cockroach.stat.tablereader.stalltime:23.517ms hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.072ms        event:sql/rowexec/tablereader.go:181 [n4,client=127.0.0.1:39336,hostssl,user=root] starting scan with limitBatches true
        	            	 9223372036854.777ms      0.080ms        event:sql/row/kv_batch_fetcher.go:289 [n4,client=127.0.0.1:39336,hostssl,user=root] Scan /Table/53/1/2{-/#}
        	            	 9223372036854.777ms      0.034ms            === operation:txn coordinator send client:127.0.0.1:39336 hostssl: node:4 txnID:2b23efc2-3779-4d7d-8990-b62c224f6235 user:root
        	            	 9223372036854.777ms      0.191ms                === operation:dist sender send client:127.0.0.1:39336 hostssl: node:4 txn:2b23efc2 user:root
        	            	 9223372036854.777ms      0.261ms                event:kv/kvclient/kvcoord/range_iter.go:159 [n4,client=127.0.0.1:39336,hostssl,user=root,txn=2b23efc2] querying next range at /Table/53/1/2
        	            	 9223372036854.777ms      0.565ms                event:kv/kvclient/kvcoord/dist_sender.go:1790 [n4,client=127.0.0.1:39336,hostssl,user=root,txn=2b23efc2] r35: sending batch 1 Scan to (n3,s3):3
        	            	 9223372036854.777ms      0.214ms                event:rpc/nodedialer/nodedialer.go:132 [n4,client=127.0.0.1:39336,hostssl,user=root,txn=2b23efc2] sending request to 127.0.0.1:39641
        	            	 9223372036854.777ms      0.187ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:4 sb:1 span.kind:client txnID:2b23efc2-3779-4d7d-8990-b62c224f6235
        	            	 9223372036854.777ms     10.571ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:3 sb:1 span.kind:server txnid:2b23efc2-3779-4d7d-8990-b62c224f6235 unfinished:
        	            	 9223372036854.777ms      0.352ms                    event:server/node.go:842 [n3] node received request: 1 Scan
        	            	 9223372036854.777ms      0.297ms                    event:kv/kvserver/store_send.go:161 [n3,s3] executing [txn: 2b23efc2], Scan [/Table/53/1/2,/Table/53/1/2/#)
        	            	 9223372036854.777ms      0.459ms                    event:kv/kvserver/replica_send.go:97 [n3,s3,r35/3:/{Table/39-Max}] read-only path
        	            	 9223372036854.777ms      0.888ms                    event:kv/kvserver/replica_follower_read.go:105 [n3,s3,r35/3:/{Table/39-Max}] serving via follower read; query timestamp below closed timestamp by 3.778523391s
        	            	 9223372036854.777ms      0.203ms                    event:kv/kvserver/concurrency/concurrency_manager.go:116 [n3,s3,r35/3:/{Table/39-Max}] sequencing request
        	            	 9223372036854.777ms      0.119ms                    event:kv/kvserver/concurrency/concurrency_manager.go:152 [n3,s3,r35/3:/{Table/39-Max}] acquiring latches
        	            	 9223372036854.777ms      0.172ms                    event:kv/kvserver/concurrency/concurrency_manager.go:164 [n3,s3,r35/3:/{Table/39-Max}] scanning lock table for conflicting locks
        	            	 9223372036854.777ms      0.258ms                    event:kv/kvserver/replica_read.go:124 [n3,s3,r35/3:/{Table/39-Max}] executing read-only batch
        	            	 9223372036854.777ms      2.022ms                    event:kv/kvserver/replica_read.go:109 [n3,s3,r35/3:/{Table/39-Max}] read completed
        	            	 9223372036854.777ms      0.251ms                    event:kv/kvserver/replica_send.go:166 [n3,s3,r35/3:/{Table/39-Max}] client had stale range info; returning an update
        	            	 9223372036854.777ms     20.746ms                event:kv/kvclient/kvcoord/dist_sender.go:1904 [n4,client=127.0.0.1:39336,hostssl,user=root,txn=2b23efc2] received updated range info: [desc: r35:/{Table/39-Max} [(n1,s1):1, (n3,s3):3, next=4, gen=6] lease: repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1597245715.932408912,0]
        	            	 9223372036854.777ms      0.124ms                event:kv/kvclient/kvcoord/range_cache.go:481 [n4,client=127.0.0.1:39336,hostssl,user=root,txn=2b23efc2] evicting cached range descriptor with 1 replacements
        	            	 9223372036854.777ms      0.573ms                event:kv/kvclient/kvcoord/range_cache.go:1026 [n4,client=127.0.0.1:39336,hostssl,user=root,txn=2b23efc2] clearing entries overlapping r35:/{Table/39-Max} [(n1,s1):1, (n3,s3):3, next=4, gen=6]
        	            	 9223372036854.777ms     23.230ms            event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:425 [n4,client=127.0.0.1:39336,hostssl,user=root,txn=2b23efc2] recording span to refresh: /Table/53/1/2{-/#}
        	            	 9223372036854.777ms     25.830ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:39336,hostssl,user=root] execution ends
        	            	 9223372036854.777ms      0.142ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:39336,hostssl,user=root] rows affected: 0
        	            	 9223372036854.777ms      0.452ms    event:sql/conn_executor_exec.go:1392 [n4,client=127.0.0.1:39336,hostssl,user=root] AutoCommit. err: <nil>

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestFollowerReadsWithStaleDescriptor PKG=./pkg/ccl/kvccl/kvfollowerreadsccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/kvccl/kvfollowerreadsccl).TestFollowerReadsWithStaleDescriptor failed on master@09bb6460c0ed1a882db95472e2d962dd57a80aaf:

W200812 15:15:20.503623 2788 kv/kvserver/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200812 15:15:20.504870 2988 kv/kvserver/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 2: EOF:
W200812 15:15:20.506215 223 sql/sqlliveness/slinstance/slinstance.go:185  [n1] exiting heartbeat loop
W200812 15:15:20.498246 2265 sql/sqlliveness/slinstance/slinstance.go:185  [n4] exiting heartbeat loop
I200812 15:15:20.508544 3788 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:45487: context canceled
W200812 15:15:20.510327 1138 gossip/gossip.go:1494  [n2] no incoming or outgoing connections
W200812 15:15:20.511596 1983 gossip/gossip.go:1494  [n4] no incoming or outgoing connections
I200812 15:15:20.513211 3789 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n4: failed to connect to n4 at 127.0.0.1:39127: context canceled
I200812 15:15:20.517761 3790 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:44241: context canceled
W200812 15:15:20.555898 1814 jobs/registry.go:669  canceling all adopted jobs due to stopper quiescing
--- FAIL: TestFollowerReadsWithStaleDescriptor (9.69s)
    followerreads_test.go:291: 
        	Error Trace:	followerreads_test.go:291
        	Error:      	Should be false
        	Test:       	TestFollowerReadsWithStaleDescriptor
        	Messages:   	query was not served through follower reads:  9223372036854.777ms      0.000ms    === operation:SELECT * FROM test AS OF SYSTEM TIME follower_read_timestamp() WHERE k=2 client:127.0.0.1:49154 hostssl: node:4 sb:1 unfinished: user:root
        	            	 9223372036854.777ms      1.299ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:49154,hostssl,user=root] query cache miss
        	            	 9223372036854.777ms      0.602ms    event:sql/catalog/descs/collection.go:478 [n4,client=127.0.0.1:49154,hostssl,user=root] added descriptor 't.public.test' to collection
        	            	 9223372036854.777ms      6.055ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:49154,hostssl,user=root] query cache add
        	            	 9223372036854.777ms      0.313ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:49154,hostssl,user=root] planning ends
        	            	 9223372036854.777ms      0.283ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:49154,hostssl,user=root] checking distributability
        	            	 9223372036854.777ms      0.138ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:49154,hostssl,user=root] will distribute plan: false
        	            	 9223372036854.777ms      0.114ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:49154,hostssl,user=root] execution starts: distributed engine
        	            	 9223372036854.777ms      0.323ms    event:sql/distsql_running.go:981 [n4,client=127.0.0.1:49154,hostssl,user=root] creating DistSQL plan with isLocal=true
        	            	 9223372036854.777ms      0.305ms    event:sql/distsql_running.go:381 [n4,client=127.0.0.1:49154,hostssl,user=root] running DistSQL plan
        	            	 9223372036854.777ms      0.108ms        === operation:flow client:127.0.0.1:49154 hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.310ms        event:sql/flowinfra/flow.go:285 [n4,client=127.0.0.1:49154,hostssl,user=root] starting (0 processors, 0 startables)
        	            	 9223372036854.777ms      0.008ms        === operation:table reader client:127.0.0.1:49154 cockroach.flowid:00000000-0000-0000-0000-000000000000 cockroach.processorid:0 cockroach.stat.tablereader.bytes.read:0 B cockroach.stat.tablereader.input.rows:0 cockroach.stat.tablereader.stalltime:23.015ms hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.161ms        event:sql/rowexec/tablereader.go:181 [n4,client=127.0.0.1:49154,hostssl,user=root] starting scan with limitBatches true
        	            	 9223372036854.777ms      0.161ms        event:sql/row/kv_batch_fetcher.go:289 [n4,client=127.0.0.1:49154,hostssl,user=root] Scan /Table/53/1/2{-/#}
        	            	 9223372036854.777ms      0.053ms            === operation:txn coordinator send client:127.0.0.1:49154 hostssl: node:4 txnID:0f83894b-90e2-4862-a8bd-0209f0d82919 user:root
        	            	 9223372036854.777ms      0.132ms                === operation:dist sender send client:127.0.0.1:49154 hostssl: node:4 txn:0f83894b user:root
        	            	 9223372036854.777ms      0.357ms                event:kv/kvclient/kvcoord/range_iter.go:159 [n4,client=127.0.0.1:49154,hostssl,user=root,txn=0f83894b] querying next range at /Table/53/1/2
        	            	 9223372036854.777ms      0.726ms                event:kv/kvclient/kvcoord/dist_sender.go:1790 [n4,client=127.0.0.1:49154,hostssl,user=root,txn=0f83894b] r35: sending batch 1 Scan to (n3,s3):3
        	            	 9223372036854.777ms      0.272ms                event:rpc/nodedialer/nodedialer.go:132 [n4,client=127.0.0.1:49154,hostssl,user=root,txn=0f83894b] sending request to 127.0.0.1:38155
        	            	 9223372036854.777ms      0.154ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:4 sb:1 span.kind:client txnID:0f83894b-90e2-4862-a8bd-0209f0d82919
        	            	 9223372036854.777ms      4.623ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:3 sb:1 span.kind:server txnid:0f83894b-90e2-4862-a8bd-0209f0d82919 unfinished:
        	            	 9223372036854.777ms      0.297ms                    event:server/node.go:842 [n3] node received request: 1 Scan
        	            	 9223372036854.777ms      0.286ms                    event:kv/kvserver/store_send.go:161 [n3,s3] executing [txn: 0f83894b], Scan [/Table/53/1/2,/Table/53/1/2/#)
        	            	 9223372036854.777ms      0.421ms                    event:kv/kvserver/replica_send.go:97 [n3,s3,r35/3:/{Table/39-Max}] read-only path
        	            	 9223372036854.777ms      0.742ms                    event:kv/kvserver/replica_follower_read.go:105 [n3,s3,r35/3:/{Table/39-Max}] serving via follower read; query timestamp below closed timestamp by 3.813945392s
        	            	 9223372036854.777ms      0.311ms                    event:kv/kvserver/concurrency/concurrency_manager.go:116 [n3,s3,r35/3:/{Table/39-Max}] sequencing request
        	            	 9223372036854.777ms      0.170ms                    event:kv/kvserver/concurrency/concurrency_manager.go:152 [n3,s3,r35/3:/{Table/39-Max}] acquiring latches
        	            	 9223372036854.777ms      5.417ms                    event:kv/kvserver/concurrency/concurrency_manager.go:164 [n3,s3,r35/3:/{Table/39-Max}] scanning lock table for conflicting locks
        	            	 9223372036854.777ms      4.046ms                    event:kv/kvserver/replica_read.go:124 [n3,s3,r35/3:/{Table/39-Max}] executing read-only batch
        	            	 9223372036854.777ms      1.771ms                    event:kv/kvserver/replica_read.go:109 [n3,s3,r35/3:/{Table/39-Max}] read completed
        	            	 9223372036854.777ms     22.696ms            event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:425 [n4,client=127.0.0.1:49154,hostssl,user=root,txn=0f83894b] recording span to refresh: /Table/53/1/2{-/#}
        	            	 9223372036854.777ms     24.675ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:49154,hostssl,user=root] execution ends
        	            	 9223372036854.777ms      0.115ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:49154,hostssl,user=root] rows affected: 0
        	            	 9223372036854.777ms      0.509ms    event:sql/conn_executor_exec.go:1392 [n4,client=127.0.0.1:49154,hostssl,user=root] AutoCommit. err: <nil>

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestFollowerReadsWithStaleDescriptor PKG=./pkg/ccl/kvccl/kvfollowerreadsccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/kvccl/kvfollowerreadsccl).TestFollowerReadsWithStaleDescriptor failed on master@239805ade7d2e2e2628266fefc818ac60dcf74dc:

I200812 16:38:31.829810 3679 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:37145: context canceled
I200812 16:38:31.831704 3678 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:36073: context canceled
I200812 16:38:31.834332 294 kv/kvclient/kvcoord/transport_race.go:108  transport race promotion: ran 62 iterations on up to 1543 requests
I200812 16:38:31.847712 3751 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n4: failed to connect to n4 at 127.0.0.1:35755: context canceled
I200812 16:38:31.852511 3752 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:36073: context canceled
--- FAIL: TestFollowerReadsWithStaleDescriptor (9.52s)
    followerreads_test.go:291: 
        	Error Trace:	followerreads_test.go:291
        	Error:      	Should be false
        	Test:       	TestFollowerReadsWithStaleDescriptor
        	Messages:   	query was not served through follower reads:  9223372036854.777ms      0.000ms    === operation:SELECT * FROM test AS OF SYSTEM TIME follower_read_timestamp() WHERE k=2 client:127.0.0.1:54156 hostssl: node:4 sb:1 unfinished: user:root
        	            	 9223372036854.777ms      0.668ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:54156,hostssl,user=root] query cache miss
        	            	 9223372036854.777ms      0.589ms    event:sql/catalog/descs/collection.go:478 [n4,client=127.0.0.1:54156,hostssl,user=root] added descriptor 't.public.test' to collection
        	            	 9223372036854.777ms      1.294ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:54156,hostssl,user=root] query cache add
        	            	 9223372036854.777ms      0.313ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54156,hostssl,user=root] planning ends
        	            	 9223372036854.777ms      0.155ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54156,hostssl,user=root] checking distributability
        	            	 9223372036854.777ms      0.132ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54156,hostssl,user=root] will distribute plan: false
        	            	 9223372036854.777ms      0.127ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54156,hostssl,user=root] execution starts: distributed engine
        	            	 9223372036854.777ms      0.121ms    event:sql/distsql_running.go:981 [n4,client=127.0.0.1:54156,hostssl,user=root] creating DistSQL plan with isLocal=true
        	            	 9223372036854.777ms      0.211ms    event:sql/distsql_running.go:381 [n4,client=127.0.0.1:54156,hostssl,user=root] running DistSQL plan
        	            	 9223372036854.777ms      0.071ms        === operation:flow client:127.0.0.1:54156 hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.520ms        event:sql/flowinfra/flow.go:285 [n4,client=127.0.0.1:54156,hostssl,user=root] starting (0 processors, 0 startables)
        	            	 9223372036854.777ms      0.008ms        === operation:table reader client:127.0.0.1:54156 cockroach.flowid:00000000-0000-0000-0000-000000000000 cockroach.processorid:0 cockroach.stat.tablereader.bytes.read:0 B cockroach.stat.tablereader.input.rows:0 cockroach.stat.tablereader.stalltime:16.729ms hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.358ms        event:sql/rowexec/tablereader.go:181 [n4,client=127.0.0.1:54156,hostssl,user=root] starting scan with limitBatches true
        	            	 9223372036854.777ms      0.233ms        event:sql/row/kv_batch_fetcher.go:289 [n4,client=127.0.0.1:54156,hostssl,user=root] Scan /Table/53/1/2{-/#}
        	            	 9223372036854.777ms      0.044ms            === operation:txn coordinator send client:127.0.0.1:54156 hostssl: node:4 txnID:9bfd101f-8f43-4838-a2d0-3ef700dd4240 user:root
        	            	 9223372036854.777ms      0.249ms                === operation:dist sender send client:127.0.0.1:54156 hostssl: node:4 txn:9bfd101f user:root
        	            	 9223372036854.777ms      0.412ms                event:kv/kvclient/kvcoord/range_iter.go:159 [n4,client=127.0.0.1:54156,hostssl,user=root,txn=9bfd101f] querying next range at /Table/53/1/2
        	            	 9223372036854.777ms      0.792ms                event:kv/kvclient/kvcoord/dist_sender.go:1790 [n4,client=127.0.0.1:54156,hostssl,user=root,txn=9bfd101f] r35: sending batch 1 Scan to (n3,s3):3
        	            	 9223372036854.777ms      0.362ms                event:rpc/nodedialer/nodedialer.go:132 [n4,client=127.0.0.1:54156,hostssl,user=root,txn=9bfd101f] sending request to 127.0.0.1:37145
        	            	 9223372036854.777ms      0.214ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:4 sb:1 span.kind:client txnID:9bfd101f-8f43-4838-a2d0-3ef700dd4240
        	            	 9223372036854.777ms      2.527ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:3 sb:1 span.kind:server txnid:9bfd101f-8f43-4838-a2d0-3ef700dd4240 unfinished:
        	            	 9223372036854.777ms      0.312ms                    event:server/node.go:842 [n3] node received request: 1 Scan
        	            	 9223372036854.777ms      0.319ms                    event:kv/kvserver/store_send.go:161 [n3,s3] executing [txn: 9bfd101f], Scan [/Table/53/1/2,/Table/53/1/2/#)
        	            	 9223372036854.777ms      0.354ms                    event:kv/kvserver/replica_send.go:97 [n3,s3,r35/3:/{Table/39-Max}] read-only path
        	            	 9223372036854.777ms      0.898ms                    event:kv/kvserver/replica_follower_read.go:105 [n3,s3,r35/3:/{Table/39-Max}] serving via follower read; query timestamp below closed timestamp by 3.658611108s
        	            	 9223372036854.777ms      0.227ms                    event:kv/kvserver/concurrency/concurrency_manager.go:116 [n3,s3,r35/3:/{Table/39-Max}] sequencing request
        	            	 9223372036854.777ms      0.109ms                    event:kv/kvserver/concurrency/concurrency_manager.go:152 [n3,s3,r35/3:/{Table/39-Max}] acquiring latches
        	            	 9223372036854.777ms      0.199ms                    event:kv/kvserver/concurrency/concurrency_manager.go:164 [n3,s3,r35/3:/{Table/39-Max}] scanning lock table for conflicting locks
        	            	 9223372036854.777ms      0.396ms                    event:kv/kvserver/replica_read.go:124 [n3,s3,r35/3:/{Table/39-Max}] executing read-only batch
        	            	 9223372036854.777ms      3.726ms                    event:kv/kvserver/replica_read.go:109 [n3,s3,r35/3:/{Table/39-Max}] read completed
        	            	 9223372036854.777ms      0.204ms                    event:kv/kvserver/replica_send.go:166 [n3,s3,r35/3:/{Table/39-Max}] client had stale range info; returning an update
        	            	 9223372036854.777ms     12.708ms                event:kv/kvclient/kvcoord/dist_sender.go:1904 [n4,client=127.0.0.1:54156,hostssl,user=root,txn=9bfd101f] received updated range info: [desc: r35:/{Table/39-Max} [(n1,s1):1, (n3,s3):3, next=4, gen=6] lease: repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1597250302.857898035,0]
        	            	 9223372036854.777ms      0.820ms                event:kv/kvclient/kvcoord/range_cache.go:850 [n4,client=127.0.0.1:54156,hostssl,user=root,txn=9bfd101f] evict cached descriptor: desc=desc:r35:/{Table/39-Max} [(n1,s1):1, (n2,s2):2LEARNER, (n3,s3):3, next=4, gen=5], lease:repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1597250302.857898035,0
        	            	 9223372036854.777ms      0.135ms                event:kv/kvclient/kvcoord/range_cache.go:481 [n4,client=127.0.0.1:54156,hostssl,user=root,txn=9bfd101f] evicting cached range descriptor with 1 replacements
        	            	 9223372036854.777ms      0.482ms                event:kv/kvclient/kvcoord/range_cache.go:1026 [n4,client=127.0.0.1:54156,hostssl,user=root,txn=9bfd101f] clearing entries overlapping r35:/{Table/39-Max} [(n1,s1):1, (n3,s3):3, next=4, gen=6]
        	            	 9223372036854.777ms     16.346ms            event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:425 [n4,client=127.0.0.1:54156,hostssl,user=root,txn=9bfd101f] recording span to refresh: /Table/53/1/2{-/#}
        	            	 9223372036854.777ms     18.899ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54156,hostssl,user=root] execution ends
        	            	 9223372036854.777ms      0.116ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:54156,hostssl,user=root] rows affected: 0
        	            	 9223372036854.777ms      0.341ms    event:sql/conn_executor_exec.go:1392 [n4,client=127.0.0.1:54156,hostssl,user=root] AutoCommit. err: <nil>

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestFollowerReadsWithStaleDescriptor PKG=./pkg/ccl/kvccl/kvfollowerreadsccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/kvccl/kvfollowerreadsccl).TestFollowerReadsWithStaleDescriptor failed on master@be8c1a17b940ecc56dc3521ab8264f662c88072c:

W200812 17:12:26.537163 2905 kv/kvserver/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200812 17:12:26.543403 1169 gossip/gossip.go:1494  [n2] no incoming or outgoing connections
I200812 17:12:26.544245 3741 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:46821: context canceled
W200812 17:12:26.545224 2741 kv/kvserver/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 1: EOF:
I200812 17:12:26.547217 3743 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n4: failed to connect to n4 at 127.0.0.1:43181: context canceled
W200812 17:12:26.548577 1327 sql/sqlliveness/slinstance/slinstance.go:185  [n2] exiting heartbeat loop
W200812 17:12:26.548534 1555 gossip/gossip.go:1494  [n3] no incoming or outgoing connections
I200812 17:12:26.549082 3742 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:45753: context canceled
W200812 17:12:26.548817 3565 kv/kvserver/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200812 17:12:26.551731 2730 kv/kvserver/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 1: EOF:
--- FAIL: TestFollowerReadsWithStaleDescriptor (9.07s)
    followerreads_test.go:291: 
        	Error Trace:	followerreads_test.go:291
        	Error:      	Should be false
        	Test:       	TestFollowerReadsWithStaleDescriptor
        	Messages:   	query was not served through follower reads:  9223372036854.777ms      0.000ms    === operation:SELECT * FROM test AS OF SYSTEM TIME follower_read_timestamp() WHERE k=2 client:127.0.0.1:59134 hostssl: node:4 sb:1 unfinished: user:root
        	            	 9223372036854.777ms      0.685ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:59134,hostssl,user=root] query cache miss
        	            	 9223372036854.777ms      2.464ms    event:sql/catalog/descs/collection.go:478 [n4,client=127.0.0.1:59134,hostssl,user=root] added descriptor 't.public.test' to collection
        	            	 9223372036854.777ms      3.041ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:59134,hostssl,user=root] query cache add
        	            	 9223372036854.777ms      0.285ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:59134,hostssl,user=root] planning ends
        	            	 9223372036854.777ms      0.134ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:59134,hostssl,user=root] checking distributability
        	            	 9223372036854.777ms      0.137ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:59134,hostssl,user=root] will distribute plan: false
        	            	 9223372036854.777ms      0.125ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:59134,hostssl,user=root] execution starts: distributed engine
        	            	 9223372036854.777ms      0.170ms    event:sql/distsql_running.go:981 [n4,client=127.0.0.1:59134,hostssl,user=root] creating DistSQL plan with isLocal=true
        	            	 9223372036854.777ms      0.164ms    event:sql/distsql_running.go:381 [n4,client=127.0.0.1:59134,hostssl,user=root] running DistSQL plan
        	            	 9223372036854.777ms      0.083ms        === operation:flow client:127.0.0.1:59134 hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.535ms        event:sql/flowinfra/flow.go:285 [n4,client=127.0.0.1:59134,hostssl,user=root] starting (0 processors, 0 startables)
        	            	 9223372036854.777ms      0.013ms        === operation:table reader client:127.0.0.1:59134 cockroach.flowid:00000000-0000-0000-0000-000000000000 cockroach.processorid:0 cockroach.stat.tablereader.bytes.read:0 B cockroach.stat.tablereader.input.rows:0 cockroach.stat.tablereader.stalltime:22.985ms hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.144ms        event:sql/rowexec/tablereader.go:181 [n4,client=127.0.0.1:59134,hostssl,user=root] starting scan with limitBatches true
        	            	 9223372036854.777ms      0.302ms        event:sql/row/kv_batch_fetcher.go:289 [n4,client=127.0.0.1:59134,hostssl,user=root] Scan /Table/53/1/2{-/#}
        	            	 9223372036854.777ms      0.055ms            === operation:txn coordinator send client:127.0.0.1:59134 hostssl: node:4 txnID:188759d1-5cc8-461d-9a68-2833939bf5eb user:root
        	            	 9223372036854.777ms      0.110ms                === operation:dist sender send client:127.0.0.1:59134 hostssl: node:4 txn:188759d1 user:root
        	            	 9223372036854.777ms      0.308ms                event:kv/kvclient/kvcoord/range_iter.go:159 [n4,client=127.0.0.1:59134,hostssl,user=root,txn=188759d1] querying next range at /Table/53/1/2
        	            	 9223372036854.777ms      0.712ms                event:kv/kvclient/kvcoord/dist_sender.go:1790 [n4,client=127.0.0.1:59134,hostssl,user=root,txn=188759d1] r35: sending batch 1 Scan to (n3,s3):3
        	            	 9223372036854.777ms      0.177ms                event:rpc/nodedialer/nodedialer.go:132 [n4,client=127.0.0.1:59134,hostssl,user=root,txn=188759d1] sending request to 127.0.0.1:43349
        	            	 9223372036854.777ms      0.130ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:4 sb:1 span.kind:client txnID:188759d1-5cc8-461d-9a68-2833939bf5eb
        	            	 9223372036854.777ms      6.459ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:3 sb:1 span.kind:server txnid:188759d1-5cc8-461d-9a68-2833939bf5eb unfinished:
        	            	 9223372036854.777ms      0.300ms                    event:server/node.go:842 [n3] node received request: 1 Scan
        	            	 9223372036854.777ms      0.452ms                    event:kv/kvserver/store_send.go:161 [n3,s3] executing [txn: 188759d1], Scan [/Table/53/1/2,/Table/53/1/2/#)
        	            	 9223372036854.777ms      0.324ms                    event:kv/kvserver/replica_send.go:97 [n3,s3,r35/3:/{Table/39-Max}] read-only path
        	            	 9223372036854.777ms      1.036ms                    event:kv/kvserver/replica_follower_read.go:105 [n3,s3,r35/3:/{Table/39-Max}] serving via follower read; query timestamp below closed timestamp by 3.691310477s
        	            	 9223372036854.777ms      0.435ms                    event:kv/kvserver/concurrency/concurrency_manager.go:116 [n3,s3,r35/3:/{Table/39-Max}] sequencing request
        	            	 9223372036854.777ms      0.298ms                    event:kv/kvserver/concurrency/concurrency_manager.go:152 [n3,s3,r35/3:/{Table/39-Max}] acquiring latches
        	            	 9223372036854.777ms      0.279ms                    event:kv/kvserver/concurrency/concurrency_manager.go:164 [n3,s3,r35/3:/{Table/39-Max}] scanning lock table for conflicting locks
        	            	 9223372036854.777ms      0.290ms                    event:kv/kvserver/replica_read.go:124 [n3,s3,r35/3:/{Table/39-Max}] executing read-only batch
        	            	 9223372036854.777ms      3.188ms                    event:kv/kvserver/replica_read.go:109 [n3,s3,r35/3:/{Table/39-Max}] read completed
        	            	 9223372036854.777ms     22.491ms            event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:425 [n4,client=127.0.0.1:59134,hostssl,user=root,txn=188759d1] recording span to refresh: /Table/53/1/2{-/#}
        	            	 9223372036854.777ms     25.084ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:59134,hostssl,user=root] execution ends
        	            	 9223372036854.777ms      0.098ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:59134,hostssl,user=root] rows affected: 0
        	            	 9223372036854.777ms      0.479ms    event:sql/conn_executor_exec.go:1392 [n4,client=127.0.0.1:59134,hostssl,user=root] AutoCommit. err: <nil>

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestFollowerReadsWithStaleDescriptor PKG=./pkg/ccl/kvccl/kvfollowerreadsccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

craig bot pushed a commit that referenced this issue Aug 12, 2020
52688: *: skip a couple of flakey tests r=irfansharif a=irfansharif

- TestFollowerReadsWithStaleDescriptor (#52681)
- TestDistSQLRangeCachesIntegrationTest (#52682)
- acceptance/gossip/restart (#48423)

Release note: None

---

+cc @andreimatei, @asubiotto who are the current assignees on those tests.

Co-authored-by: irfan sharif <[email protected]>
@cockroach-teamcity
Copy link
Member Author

(ccl/kvccl/kvfollowerreadsccl).TestFollowerReadsWithStaleDescriptor failed on master@89208ec8b5350fd34036b559a258403e820deda6:

I200812 17:53:31.092529 1364 kv/kvserver/queue.go:582  [n2,s2] rate limited in MaybeAdd (consistencyChecker): node unavailable; try another peer
I200812 17:53:31.092671 1364 kv/kvserver/queue.go:582  [n2,s2] rate limited in MaybeAdd (timeSeriesMaintenance): node unavailable; try another peer
I200812 17:53:31.105720 30 kv/kvclient/kvcoord/transport_race.go:108  transport race promotion: ran 46 iterations on up to 1520 requests
W200812 17:53:31.106825 235 kv/kvserver/store.go:1665  [n1,s1,r1/1:/{Min-System/NodeL…}] could not gossip first range descriptor: node unavailable; try another peer
I200812 17:53:31.131617 3745 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:35903: context canceled
--- FAIL: TestFollowerReadsWithStaleDescriptor (9.48s)
    followerreads_test.go:291: 
        	Error Trace:	followerreads_test.go:291
        	Error:      	Should be false
        	Test:       	TestFollowerReadsWithStaleDescriptor
        	Messages:   	query was not served through follower reads:  9223372036854.777ms      0.000ms    === operation:SELECT * FROM test AS OF SYSTEM TIME follower_read_timestamp() WHERE k=2 client:127.0.0.1:47904 hostssl: node:4 sb:1 unfinished: user:root
        	            	 9223372036854.777ms      0.552ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:47904,hostssl,user=root] query cache miss
        	            	 9223372036854.777ms      0.697ms    event:sql/catalog/descs/collection.go:478 [n4,client=127.0.0.1:47904,hostssl,user=root] added descriptor 't.public.test' to collection
        	            	 9223372036854.777ms      1.055ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:47904,hostssl,user=root] query cache add
        	            	 9223372036854.777ms      0.210ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:47904,hostssl,user=root] planning ends
        	            	 9223372036854.777ms      0.162ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:47904,hostssl,user=root] checking distributability
        	            	 9223372036854.777ms      0.099ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:47904,hostssl,user=root] will distribute plan: false
        	            	 9223372036854.777ms      0.136ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:47904,hostssl,user=root] execution starts: distributed engine
        	            	 9223372036854.777ms      0.121ms    event:sql/distsql_running.go:981 [n4,client=127.0.0.1:47904,hostssl,user=root] creating DistSQL plan with isLocal=true
        	            	 9223372036854.777ms      0.128ms    event:sql/distsql_running.go:381 [n4,client=127.0.0.1:47904,hostssl,user=root] running DistSQL plan
        	            	 9223372036854.777ms      0.085ms        === operation:flow client:127.0.0.1:47904 hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.233ms        event:sql/flowinfra/flow.go:285 [n4,client=127.0.0.1:47904,hostssl,user=root] starting (0 processors, 0 startables)
        	            	 9223372036854.777ms      0.007ms        === operation:table reader client:127.0.0.1:47904 cockroach.flowid:00000000-0000-0000-0000-000000000000 cockroach.processorid:0 cockroach.stat.tablereader.bytes.read:0 B cockroach.stat.tablereader.input.rows:0 cockroach.stat.tablereader.stalltime:15.473ms hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.285ms        event:sql/rowexec/tablereader.go:181 [n4,client=127.0.0.1:47904,hostssl,user=root] starting scan with limitBatches true
        	            	 9223372036854.777ms      0.152ms        event:sql/row/kv_batch_fetcher.go:289 [n4,client=127.0.0.1:47904,hostssl,user=root] Scan /Table/53/1/2{-/#}
        	            	 9223372036854.777ms      0.052ms            === operation:txn coordinator send client:127.0.0.1:47904 hostssl: node:4 txnID:e2f60e50-a2bb-4c51-a6f2-b7031f894e09 user:root
        	            	 9223372036854.777ms      0.223ms                === operation:dist sender send client:127.0.0.1:47904 hostssl: node:4 txn:e2f60e50 user:root
        	            	 9223372036854.777ms      0.333ms                event:kv/kvclient/kvcoord/range_iter.go:159 [n4,client=127.0.0.1:47904,hostssl,user=root,txn=e2f60e50] querying next range at /Table/53/1/2
        	            	 9223372036854.777ms      0.593ms                event:kv/kvclient/kvcoord/dist_sender.go:1790 [n4,client=127.0.0.1:47904,hostssl,user=root,txn=e2f60e50] r35: sending batch 1 Scan to (n3,s3):3
        	            	 9223372036854.777ms      0.168ms                event:rpc/nodedialer/nodedialer.go:132 [n4,client=127.0.0.1:47904,hostssl,user=root,txn=e2f60e50] sending request to 127.0.0.1:44685
        	            	 9223372036854.777ms      0.137ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:4 sb:1 span.kind:client txnID:e2f60e50-a2bb-4c51-a6f2-b7031f894e09
        	            	 9223372036854.777ms      2.617ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:3 sb:1 span.kind:server txnid:e2f60e50-a2bb-4c51-a6f2-b7031f894e09 unfinished:
        	            	 9223372036854.777ms      0.367ms                    event:server/node.go:842 [n3] node received request: 1 Scan
        	            	 9223372036854.777ms      0.261ms                    event:kv/kvserver/store_send.go:161 [n3,s3] executing [txn: e2f60e50], Scan [/Table/53/1/2,/Table/53/1/2/#)
        	            	 9223372036854.777ms      0.524ms                    event:kv/kvserver/replica_send.go:97 [n3,s3,r35/3:/{Table/39-Max}] read-only path
        	            	 9223372036854.777ms      0.887ms                    event:kv/kvserver/replica_follower_read.go:105 [n3,s3,r35/3:/{Table/39-Max}] serving via follower read; query timestamp below closed timestamp by 3.658067176s
        	            	 9223372036854.777ms      0.209ms                    event:kv/kvserver/concurrency/concurrency_manager.go:116 [n3,s3,r35/3:/{Table/39-Max}] sequencing request
        	            	 9223372036854.777ms      0.266ms                    event:kv/kvserver/concurrency/concurrency_manager.go:152 [n3,s3,r35/3:/{Table/39-Max}] acquiring latches
        	            	 9223372036854.777ms      0.169ms                    event:kv/kvserver/concurrency/concurrency_manager.go:164 [n3,s3,r35/3:/{Table/39-Max}] scanning lock table for conflicting locks
        	            	 9223372036854.777ms      0.492ms                    event:kv/kvserver/replica_read.go:124 [n3,s3,r35/3:/{Table/39-Max}] executing read-only batch
        	            	 9223372036854.777ms      1.962ms                    event:kv/kvserver/replica_read.go:109 [n3,s3,r35/3:/{Table/39-Max}] read completed
        	            	 9223372036854.777ms      0.339ms                    event:kv/kvserver/replica_send.go:166 [n3,s3,r35/3:/{Table/39-Max}] client had stale range info; returning an update
        	            	 9223372036854.777ms     12.210ms                event:kv/kvclient/kvcoord/dist_sender.go:1904 [n4,client=127.0.0.1:47904,hostssl,user=root,txn=e2f60e50] received updated range info: [desc: r35:/{Table/39-Max} [(n1,s1):1, (n3,s3):3, next=4, gen=6] lease: repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1597254802.198838457,0]
        	            	 9223372036854.777ms      0.675ms                event:kv/kvclient/kvcoord/range_cache.go:850 [n4,client=127.0.0.1:47904,hostssl,user=root,txn=e2f60e50] evict cached descriptor: desc=desc:r35:/{Table/39-Max} [(n1,s1):1, (n2,s2):2LEARNER, (n3,s3):3, next=4, gen=5], lease:repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1597254802.198838457,0
        	            	 9223372036854.777ms      0.109ms                event:kv/kvclient/kvcoord/range_cache.go:481 [n4,client=127.0.0.1:47904,hostssl,user=root,txn=e2f60e50] evicting cached range descriptor with 1 replacements
        	            	 9223372036854.777ms      0.498ms                event:kv/kvclient/kvcoord/range_cache.go:1026 [n4,client=127.0.0.1:47904,hostssl,user=root,txn=e2f60e50] clearing entries overlapping r35:/{Table/39-Max} [(n1,s1):1, (n3,s3):3, next=4, gen=6]
        	            	 9223372036854.777ms     15.175ms            event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:425 [n4,client=127.0.0.1:47904,hostssl,user=root,txn=e2f60e50] recording span to refresh: /Table/53/1/2{-/#}
        	            	 9223372036854.777ms     18.417ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:47904,hostssl,user=root] execution ends
        	            	 9223372036854.777ms      0.130ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:47904,hostssl,user=root] rows affected: 0
        	            	 9223372036854.777ms      0.479ms    event:sql/conn_executor_exec.go:1392 [n4,client=127.0.0.1:47904,hostssl,user=root] AutoCommit. err: <nil>

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestFollowerReadsWithStaleDescriptor PKG=./pkg/ccl/kvccl/kvfollowerreadsccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/kvccl/kvfollowerreadsccl).TestFollowerReadsWithStaleDescriptor failed on master@52b8eb1b1c90d1ff9c8faae8c80d45b7de56d350:

W200812 19:00:48.590550 3724 kv/kvserver/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200812 19:00:48.591766 1987 gossip/gossip.go:1494  [n4] no incoming or outgoing connections
I200812 19:00:48.592826 139 kv/kvclient/kvcoord/transport_race.go:108  transport race promotion: ran 66 iterations on up to 1533 requests
W200812 19:00:48.590716 1563 gossip/gossip.go:1494  [n3] no incoming or outgoing connections
W200812 19:00:48.593794 3696 kv/kvserver/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200812 19:00:48.599693 3831 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:36677: context canceled
I200812 19:00:48.599735 3830 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:43647: context canceled
I200812 19:00:48.616235 3813 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:43327: context canceled
I200812 19:00:48.616998 3814 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n4: failed to connect to n4 at 127.0.0.1:43753: context canceled
I200812 19:00:48.617858 3812 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:43647: context canceled
--- FAIL: TestFollowerReadsWithStaleDescriptor (9.42s)
    followerreads_test.go:291: 
        	Error Trace:	followerreads_test.go:291
        	Error:      	Should be false
        	Test:       	TestFollowerReadsWithStaleDescriptor
        	Messages:   	query was not served through follower reads:  9223372036854.777ms      0.000ms    === operation:SELECT * FROM test AS OF SYSTEM TIME follower_read_timestamp() WHERE k=2 client:127.0.0.1:33130 hostssl: node:4 sb:1 unfinished: user:root
        	            	 9223372036854.777ms      0.412ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:33130,hostssl,user=root] query cache miss
        	            	 9223372036854.777ms      0.290ms    event:sql/catalog/descs/collection.go:478 [n4,client=127.0.0.1:33130,hostssl,user=root] added descriptor 't.public.test' to collection
        	            	 9223372036854.777ms      0.860ms    event:sql/plan_opt.go:361 [n4,client=127.0.0.1:33130,hostssl,user=root] query cache add
        	            	 9223372036854.777ms      0.237ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:33130,hostssl,user=root] planning ends
        	            	 9223372036854.777ms      0.134ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:33130,hostssl,user=root] checking distributability
        	            	 9223372036854.777ms      0.112ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:33130,hostssl,user=root] will distribute plan: false
        	            	 9223372036854.777ms      0.177ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:33130,hostssl,user=root] execution starts: distributed engine
        	            	 9223372036854.777ms      0.314ms    event:sql/distsql_running.go:981 [n4,client=127.0.0.1:33130,hostssl,user=root] creating DistSQL plan with isLocal=true
        	            	 9223372036854.777ms      0.192ms    event:sql/distsql_running.go:381 [n4,client=127.0.0.1:33130,hostssl,user=root] running DistSQL plan
        	            	 9223372036854.777ms      0.082ms        === operation:flow client:127.0.0.1:33130 hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.269ms        event:sql/flowinfra/flow.go:285 [n4,client=127.0.0.1:33130,hostssl,user=root] starting (0 processors, 0 startables)
        	            	 9223372036854.777ms      0.008ms        === operation:table reader client:127.0.0.1:33130 cockroach.flowid:00000000-0000-0000-0000-000000000000 cockroach.processorid:0 cockroach.stat.tablereader.bytes.read:0 B cockroach.stat.tablereader.input.rows:0 cockroach.stat.tablereader.stalltime:18.722ms hostssl: node:4 user:root
        	            	 9223372036854.777ms      0.248ms        event:sql/rowexec/tablereader.go:181 [n4,client=127.0.0.1:33130,hostssl,user=root] starting scan with limitBatches true
        	            	 9223372036854.777ms      0.254ms        event:sql/row/kv_batch_fetcher.go:289 [n4,client=127.0.0.1:33130,hostssl,user=root] Scan /Table/53/1/2{-/#}
        	            	 9223372036854.777ms      0.039ms            === operation:txn coordinator send client:127.0.0.1:33130 hostssl: node:4 txnID:e24d1431-7d1c-4c21-b88f-af0b59227d8f user:root
        	            	 9223372036854.777ms      0.133ms                === operation:dist sender send client:127.0.0.1:33130 hostssl: node:4 txn:e24d1431 user:root
        	            	 9223372036854.777ms      0.399ms                event:kv/kvclient/kvcoord/range_iter.go:159 [n4,client=127.0.0.1:33130,hostssl,user=root,txn=e24d1431] querying next range at /Table/53/1/2
        	            	 9223372036854.777ms      0.689ms                event:kv/kvclient/kvcoord/dist_sender.go:1790 [n4,client=127.0.0.1:33130,hostssl,user=root,txn=e24d1431] r35: sending batch 1 Scan to (n3,s3):3
        	            	 9223372036854.777ms      0.163ms                event:rpc/nodedialer/nodedialer.go:132 [n4,client=127.0.0.1:33130,hostssl,user=root,txn=e24d1431] sending request to 127.0.0.1:43327
        	            	 9223372036854.777ms      0.162ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:4 sb:1 span.kind:client txnID:e24d1431-7d1c-4c21-b88f-af0b59227d8f
        	            	 9223372036854.777ms      2.745ms                    === operation:/cockroach.roachpb.Internal/Batch component:gRPC node:3 sb:1 span.kind:server txnid:e24d1431-7d1c-4c21-b88f-af0b59227d8f unfinished:
        	            	 9223372036854.777ms      0.300ms                    event:server/node.go:842 [n3] node received request: 1 Scan
        	            	 9223372036854.777ms      0.283ms                    event:kv/kvserver/store_send.go:161 [n3,s3] executing [txn: e24d1431], Scan [/Table/53/1/2,/Table/53/1/2/#)
        	            	 9223372036854.777ms      0.507ms                    event:kv/kvserver/replica_send.go:97 [n3,s3,r35/3:/{Table/39-Max}] read-only path
        	            	 9223372036854.777ms      1.006ms                    event:kv/kvserver/replica_follower_read.go:105 [n3,s3,r35/3:/{Table/39-Max}] serving via follower read; query timestamp below closed timestamp by 3.738222972s
        	            	 9223372036854.777ms      0.325ms                    event:kv/kvserver/concurrency/concurrency_manager.go:116 [n3,s3,r35/3:/{Table/39-Max}] sequencing request
        	            	 9223372036854.777ms      0.142ms                    event:kv/kvserver/concurrency/concurrency_manager.go:152 [n3,s3,r35/3:/{Table/39-Max}] acquiring latches
        	            	 9223372036854.777ms      0.364ms                    event:kv/kvserver/concurrency/concurrency_manager.go:164 [n3,s3,r35/3:/{Table/39-Max}] scanning lock table for conflicting locks
        	            	 9223372036854.777ms      1.335ms                    event:kv/kvserver/replica_read.go:124 [n3,s3,r35/3:/{Table/39-Max}] executing read-only batch
        	            	 9223372036854.777ms      1.607ms                    event:kv/kvserver/replica_read.go:109 [n3,s3,r35/3:/{Table/39-Max}] read completed
        	            	 9223372036854.777ms     18.347ms            event:kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:425 [n4,client=127.0.0.1:33130,hostssl,user=root,txn=e24d1431] recording span to refresh: /Table/53/1/2{-/#}
        	            	 9223372036854.777ms     21.468ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:33130,hostssl,user=root] execution ends
        	            	 9223372036854.777ms      0.295ms    event:sql/conn_executor_exec.go:571 [n4,client=127.0.0.1:33130,hostssl,user=root] rows affected: 0
        	            	 9223372036854.777ms      0.501ms    event:sql/conn_executor_exec.go:1392 [n4,client=127.0.0.1:33130,hostssl,user=root] AutoCommit. err: <nil>

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestFollowerReadsWithStaleDescriptor PKG=./pkg/ccl/kvccl/kvfollowerreadsccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. skipped-test
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants