Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlstats: only include local region in statement_statistics #102192

Merged
merged 1 commit into from
Apr 26, 2023
Merged

sqlstats: only include local region in statement_statistics #102192

merged 1 commit into from
Apr 26, 2023

Conversation

matthewtodd
Copy link
Contributor

@matthewtodd matthewtodd commented Apr 24, 2023

Part of #89949.

Addresses #98020.
Addresses #99563.

Related to cockroachdb/roachperf#129.
Related to #102170.

Previously, we attempted to record all the regions hit in a single statement execution in the sqlstats tables, leaning on the sqlAddressResolver to map traced nodeIDs to localities at execution time.

While the sqlAddressResolver is generally non-blocking, the introduction of this code did cause some of the multiregion "this query shouldn't span regions" tests to start flaking and it's more recently been implicated in a 2.5% performance regression.

Given that the probabilistic nature of the tracing meant that we generally weren't capturing all the relevant nodeIDs anyway, it seems like the most prudent thing to do here is take a step back and regroup.

In the short term, let's stop even trying to gather all these regions. In the medium/long term, let's see if we can find a better approach.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@matthewtodd
Copy link
Contributor Author

I've re-enabled TestMultiRegionDataDriven/secondary_region here after confirming it passes on a gceworker under ./dev test --stress --stress-args '-p 4' pkg/ccl/multiregionccl/ -f=TestMultiRegionDataDriven/secondary_region --timeout=10m.

The other two skipped tests, regional_by_row and regional_by_table, are still failing under stress, but for reasons no longer related to #95449. (This commit reverts all the production code of that PR.)

./dev test --stress pkg/ccl/multiregionccl/ -f=TestMultiRegionDataDriven/regional_by_row --timeout=10m
--- FAIL: TestMultiRegionDataDriven (165.93s)
    test\_log\_scope.go:161: test logs captured to: /tmp/cockroach/\_tmp/e558fc8050776f4c54ea39ba371b49da/logTestMultiRegionDataDriven2067871072
    test\_log\_scope.go:79: use -show-logs to present logs inline
    --- FAIL: TestMultiRegionDataDriven/regional\_by\_row (165.91s)
        datadriven\_test.go:124: 
            /home/matthewtodd/.cache/bazel/\_bazel\_matthewtodd/d6c54e59b8d3a581c8d8f8fa25f4b98e/sandbox/linux-sandbox/4002/execroot/com\_github\_cockroachdb\_cockroach/bazel-out/k8-fastbuild/bin/pkg/ccl/multiregionccl/multiregionccl\_test\_/multiregionccl\_test.runfiles/com\_github\_cockroachdb\_cockroach/pkg/ccl/multiregionccl/testdata/regional\_by\_row:22:
             
            expected:
            
            found:
            expected node %!s(int=4) to have replica type voter but was not found
    datadriven\_test.go:428: -- test log scope end --
./dev test --stress pkg/ccl/multiregionccl/ -f=TestMultiRegionDataDriven/regional_by_table --timeout=10m
--- FAIL: TestMultiRegionDataDriven (64.97s)
    test\_log\_scope.go:161: test logs captured to: /tmp/cockroach/\_tmp/e558fc8050776f4c54ea39ba371b49da/logTestMultiRegionDataDriven2849108481
    test\_log\_scope.go:79: use -show-logs to present logs inline
    --- FAIL: TestMultiRegionDataDriven/regional\_by\_table (64.92s)
        datadriven\_test.go:124: 
            /home/matthewtodd/.cache/bazel/\_bazel\_matthewtodd/d6c54e59b8d3a581c8d8f8fa25f4b98e/sandbox/linux-sandbox/4003/execroot/com\_github\_cockroachdb\_cockroach/bazel-out/k8-fastbuild/bin/pkg/ccl/multiregionccl/multiregionccl\_test\_/multiregionccl\_test.runfiles/com\_github\_cockroachdb\_cockroach/pkg/ccl/multiregionccl/testdata/regional\_by\_table:226:
             SELECT * FROM db.rbt AS OF SYSTEM TIME follower\_read\_timestamp() WHERE k = 1
            expected:
            served locally: true
            served via follower read: true
            
            found:
            not a match, trace:
                 0.000ms      0.000ms    === operation:traced statement \_verbose:1 node:7 client:127.0.0.1:41838 user:root
                 0.000ms      0.000ms    [flow: {count: 1, duration 1ms}]
                 0.000ms      0.000ms    [colbatchscan: {count: 1, duration 1ms, unfinished}]
                 0.000ms      0.000ms    [batch flow coordinator: {count: 1, duration 1ms}]
                 0.000ms      0.000ms    [/cockroach.roachpb.Internal/Batch: {count: 2, duration 1ms}]
                 0.000ms      0.000ms    [txn coordinator send: {count: 1, duration 896µs}]
                 0.000ms      0.000ms    [dist sender send: {count: 1, duration 849µs}]
                 0.000ms      0.000ms    [optimizer: {count: 1, duration 462µs}]
                 0.000ms      0.000ms    [commit sql txn: {count: 1, duration 9µs}]
                 0.022ms      0.022ms        === operation:optimizer \_verbose:1 node:7 client:127.0.0.1:41838 user:root
                 0.064ms      0.042ms        event:sql/catalog/lease/descriptor\_version\_state.go:124 [n7,client=127.0.0.1:41838,user=root] descriptorVersionState.incRefCount: 100("defaultdb") ver=1:1682435625.902079054,0, refcount=1
                 0.084ms      0.020ms        event:sql/catalog/descs/descriptor.go:137 [n7,client=127.0.0.1:41838,user=root] looking up descriptors for ids [100]
                 0.106ms      0.022ms        event:sql/catalog/lease/descriptor\_version\_state.go:124 [n7,client=127.0.0.1:41838,user=root] descriptorVersionState.incRefCount: 104("db") ver=7:1682435628.343728034,0, refcount=1
                 0.118ms      0.012ms        event:sql/catalog/descs/descriptor.go:137 [n7,client=127.0.0.1:41838,user=root] looking up descriptors for ids [104]
                 0.130ms      0.012ms        event:sql/catalog/descs/descriptor.go:137 [n7,client=127.0.0.1:41838,user=root] looking up descriptors for ids [105]
                 0.146ms      0.016ms        event:sql/catalog/lease/descriptor\_version\_state.go:124 [n7,client=127.0.0.1:41838,user=root] descriptorVersionState.incRefCount: 105("public") ver=1:1682435606.362057433,0, refcount=1
                 0.166ms      0.020ms        event:sql/catalog/lease/descriptor\_version\_state.go:124 [n7,client=127.0.0.1:41838,user=root] descriptorVersionState.incRefCount: 108("rbt") ver=3:1682435628.385268669,0, refcount=1
                 0.177ms      0.012ms        event:sql/catalog/descs/descriptor.go:137 [n7,client=127.0.0.1:41838,user=root] looking up descriptors for ids [108]
                 0.204ms      0.027ms        event:sql/plan\_opt.go:390 [n7,client=127.0.0.1:41838,user=root] query cache hit but needed update
                 0.215ms      0.011ms        event:sql/opt/optbuilder/builder.go:204 [n7,client=127.0.0.1:41838,user=root] optbuilder start
                 0.241ms      0.026ms        event:sql/catalog/descs/descriptor.go:137 [n7,client=127.0.0.1:41838,user=root] looking up descriptors for ids [100]
                 0.254ms      0.013ms        event:sql/catalog/descs/descriptor.go:137 [n7,client=127.0.0.1:41838,user=root] looking up descriptors for ids [104]
                 0.265ms      0.011ms        event:sql/catalog/descs/descriptor.go:137 [n7,client=127.0.0.1:41838,user=root] looking up descriptors for ids [105]
                 0.276ms      0.012ms        event:sql/catalog/descs/descriptor.go:137 [n7,client=127.0.0.1:41838,user=root] looking up descriptors for ids [108]
                 0.383ms      0.106ms        event:sql/opt/optbuilder/builder.go:250 [n7,client=127.0.0.1:41838,user=root] optbuilder finish
                 0.395ms      0.012ms        event:sql/plan\_opt.go:390 [n7,client=127.0.0.1:41838,user=root] optimizing (no placeholders)
                 0.404ms      0.009ms        event:sql/opt/xform/optimizer.go:241 [n7,client=127.0.0.1:41838,user=root] optimize start
                 0.454ms      0.051ms        event:sql/opt/xform/optimizer.go:290 [n7,client=127.0.0.1:41838,user=root] optimize finish
                 0.511ms      0.511ms    event:sql/conn\_executor\_exec.go:964 [n7,client=127.0.0.1:41838,user=root] planning ends
                 0.528ms      0.018ms    event:sql/conn\_executor\_exec.go:964 [n7,client=127.0.0.1:41838,user=root] checking distributability
                 0.541ms      0.013ms    event:sql/conn\_executor\_exec.go:964 [n7,client=127.0.0.1:41838,user=root] will distribute plan: false
                 0.557ms      0.016ms    event:sql/conn\_executor\_exec.go:964 [n7,client=127.0.0.1:41838,user=root] executing after 0 retries, last retry reason: <nil>
                 0.572ms      0.015ms    event:sql/conn\_executor\_exec.go:964 [n7,client=127.0.0.1:41838,user=root] execution starts: distributed engine
                 0.588ms      0.016ms    event:sql/distsql\_running.go:1903 [n7,client=127.0.0.1:41838,user=root] creating DistSQL plan with isLocal=true
                 0.607ms      0.019ms    event:sql/distsql\_running.go:815 [n7,client=127.0.0.1:41838,user=root] running DistSQL plan
                 0.611ms      0.005ms        === operation:flow \_verbose:1 node:7 client:127.0.0.1:41838 user:root
                 0.611ms      0.000ms        [colbatchscan: {count: 1, duration 1ms, unfinished}]
                 0.611ms      0.000ms        [batch flow coordinator: {count: 1, duration 1ms}]
                 0.611ms      0.000ms        [/cockroach.roachpb.Internal/Batch: {count: 2, duration 1ms}]
                 0.611ms      0.000ms        [txn coordinator send: {count: 1, duration 896µs}]
                 0.611ms      0.000ms        [dist sender send: {count: 1, duration 849µs}]
                 0.633ms      0.021ms        event:sql/colflow/vectorized\_flow.go:232 [n7,client=127.0.0.1:41838,user=root] setting up vectorized flow
                 0.673ms      0.040ms        event:sql/colflow/vectorized\_flow.go:273 [n7,client=127.0.0.1:41838,user=root] vectorized flow setup succeeded
                 0.688ms      0.015ms        event:sql/flowinfra/flow.go:462 [n7,client=127.0.0.1:41838,user=root] starting (0 processors, 0 startables) asynchronously
                 0.699ms      0.012ms        event:sql/colflow/vectorized\_flow.go:316 [n7,client=127.0.0.1:41838,user=root] running the batch flow coordinator in the flow's goroutine
                 0.701ms      0.001ms            === operation:batch flow coordinator \_verbose:1 node:7 client:127.0.0.1:41838 user:root cockroach.flowid:4e3a0c41-fdb7-4b18-8adc-2b111a389269 cockroach.processorid:0
                 0.701ms      0.000ms            [colbatchscan: {count: 1, duration 1ms, unfinished}]
                 0.701ms      0.000ms            [/cockroach.roachpb.Internal/Batch: {count: 2, duration 1ms}]
                 0.701ms      0.000ms            [txn coordinator send: {count: 1, duration 896µs}]
                 0.701ms      0.000ms            [dist sender send: {count: 1, duration 849µs}]
                 0.705ms      0.004ms                === operation:colbatchscan \_unfinished:1 \_verbose:1 node:7 client:127.0.0.1:41838 user:root cockroach.flowid:4e3a0c41-fdb7-4b18-8adc-2b111a389269 cockroach.processorid:0
                 0.705ms      0.000ms                [/cockroach.roachpb.Internal/Batch: {count: 2, duration 1ms}]
                 0.705ms      0.000ms                [txn coordinator send: {count: 1, duration 896µs}]
                 0.705ms      0.000ms                [dist sender send: {count: 1, duration 849µs}]
                 0.750ms      0.045ms                event:sql/row/kv\_batch\_fetcher.go:475 [n7,client=127.0.0.1:41838,user=root] Scan /Table/108/1/1/0
                 0.753ms      0.003ms                    === operation:txn coordinator send \_verbose:1 node:7 client:127.0.0.1:41838 user:root txnID:1d88d1bd-03cc-4669-9bbf-b2add49c540a
                 0.753ms      0.000ms                    [/cockroach.roachpb.Internal/Batch: {count: 2, duration 1ms}]
                 0.753ms      0.000ms                    [dist sender send: {count: 1, duration 849µs}]
                 0.762ms      0.008ms                        === operation:dist sender send \_verbose:1 node:7 client:127.0.0.1:41838 user:root txn:1d88d1bd
                 0.762ms      0.000ms                        [/cockroach.roachpb.Internal/Batch: {count: 2, duration 1ms}]
                 0.787ms      0.026ms                        event:kv/kvclient/kvcoord/range\_iter.go:188 [n7,client=127.0.0.1:41838,user=root,txn=1d88d1bd] querying next range at /Table/108/1/1/0
                 0.840ms      0.053ms                        event:kv/kvclient/kvcoord/range\_iter.go:225 [n7,client=127.0.0.1:41838,user=root,txn=1d88d1bd] key: /Table/108/1/1/0, desc: r64:/{Table/108-Max} [(n5,s5):22, (n8,s8):10, (n2,s2):14, next=23, gen=81]
                 0.859ms      0.019ms                        event:kv/kvclient/kvcoord/dist\_sender.go:2078 [n7,client=127.0.0.1:41838,user=root,txn=1d88d1bd] routing to nearest replica; leaseholder not required
                 0.888ms      0.030ms                        event:kv/kvclient/kvcoord/dist\_sender.go:2136 [n7,client=127.0.0.1:41838,user=root,txn=1d88d1bd] r64: sending batch 1 Get to (n2,s2):14
                 0.900ms      0.012ms                        event:rpc/nodedialer/nodedialer.go:166 [n7,client=127.0.0.1:41838,user=root,txn=1d88d1bd] sending request to 127.0.0.1:37265
                 0.906ms      0.006ms                            === operation:/cockroach.roachpb.Internal/Batch \_verbose:1 node:7 client:127.0.0.1:41838 user:root txn:1d88d1bd span.kind:client
                 1.039ms      0.133ms                            === operation:/cockroach.roachpb.Internal/Batch \_verbose:1 node:2 span.kind:server request:Get [/Table/108/1/1/0,/Min), [txn: 1d88d1bd]
                 1.056ms      0.016ms                            event:server/node.go:1171 [n2] node received request: 1 Get
                 1.096ms      0.040ms                            event:kv/kvserver/store\_send.go:167 [n2,s2] executing Get [/Table/108/1/1/0,/Min), [txn: 1d88d1bd]
                 1.112ms      0.017ms                            event:kv/kvserver/replica\_send.go:179 [n2,s2,r64/14:/{Table/108-Max}] read-only path
                 1.128ms      0.015ms                            event:kv/kvserver/concurrency/concurrency\_manager.go:194 [n2,s2,r64/14:/{Table/108-Max}] sequencing request
                 1.136ms      0.009ms                            event:kv/kvserver/concurrency/concurrency\_manager.go:275 [n2,s2,r64/14:/{Table/108-Max}] acquiring latches
                 1.150ms      0.013ms                            event:kv/kvserver/concurrency/concurrency\_manager.go:319 [n2,s2,r64/14:/{Table/108-Max}] scanning lock table for conflicting locks
                 1.188ms      0.039ms                            event:kv/kvserver/replica\_read.go:296 [n2,s2,r64/14:/{Table/108-Max}] can drop latches early for batch (Get [/Table/108/1/1/0,/Min), [txn: 1d88d1bd]); scanning lock table first to detect conflicts
                 1.221ms      0.033ms                            event:kv/kvserver/replica\_read.go:113 [n2,s2,r64/14:/{Table/108-Max}] lock table scan complete without conflicts; dropping latches early
                 1.235ms      0.015ms                            event:kv/kvserver/replica\_read.go:408 [n2,s2,r64/14:/{Table/108-Max}] executing read-only batch
                 1.349ms      0.114ms                            event:kv/kvserver/replica\_evaluate.go:550 [n2,s2,r64/14:/{Table/108-Max}] evaluated Get command header:<key:"\\364\\211\\211\\210" > , txn="sql txn" meta={id=1d88d1bd key=/Min iso=Serializable pri=0.05782181 epo=0 ts=1682435328.155423000,0 min=1682435328.155423000,0 seq=0} lock=false stat=PENDING rts=1682435328.155423000,0 wto=false gul=1682435328.155423000,0 : resp=header:<num\_keys:1 num\_bytes:7 > value:<raw\_bytes:"\\375\\311[\\010\\n#\\002" timestamp:<wall\_time:1682435305313061732 > > , err=<nil>
                 1.392ms      0.043ms                            event:scan stats: stepped 0 times (0 internal); seeked 1 times (1 internal); block-bytes: (total 8.2 KiB, cached 8.2 KiB); points: (count 1 , key-bytes 14 B, value-bytes 14 B, tombstoned: 0 ) ranges: (count 0 ), (contained-points 0 , skipped-points 0 ) evaluated requests: 1 gets, 0 scans, 0 reverse scans
                 1.405ms      0.013ms                            event:kv/kvserver/replica\_read.go:221 [n2,s2,r64/14:/{Table/108-Max}] read completed
                 1.353ms     -0.052ms                            structured:{"@type":"type.googleapis.com/cockroach.roachpb.ScanStats","numInterfaceSeeks":"1","numInternalSeeks":"1","numInterfaceSteps":"0","numInternalSteps":"0","blockBytes":"8399","blockBytesInCache":"8399","keyBytes":"14","valueBytes":"14","pointCount":"1","pointsCoveredByRangeTombstones":"0","rangeKeyCount":"0","rangeKeyContainedPoints":"0","rangeKeySkippedPoints":"0","separatedPointCount":"0","separatedPointValueBytes":"0","separatedPointValueBytesFetched":"0","numGets":"1","numScans":"0","numReverseScans":"0"}
                 1.647ms      0.893ms                    event:kv/kvclient/kvcoord/txn\_interceptor\_span\_refresher.go:583 [n7,client=127.0.0.1:41838,user=root,txn=1d88d1bd] recording span to refresh: /Table/108/1/1/0
                 1.800ms      1.100ms            event:component:<flow\_id:<4e3a0c41-fdb7-4b18-8adc-2b111a389269> type:PROCESSOR id:0 sql\_instance\_id:7 > net\_rx:<latency:<> wait\_time:<> deserialization\_time:<> tuples\_received:<> bytes\_received:<> messages\_received:<> > net\_tx:<tuples\_sent:<> bytes\_sent:<> messages\_sent:<> > kv:<bytes\_read:<value\_plus\_one:12 > tuples\_read:<value\_plus\_one:2 > batch\_requests\_issued:<value\_plus\_one:2 > kv\_time:<value\_plus\_one:<nanos:973897 > > contention\_time:<value\_plus\_one:<nanos:1 > > num\_interface\_steps:<value\_plus\_one:1 > num\_internal\_steps:<value\_plus\_one:1 > num\_interface\_seeks:<value\_plus\_one:2 > num\_internal\_seeks:<value\_plus\_one:2 > block\_bytes:<value\_plus\_one:8400 > block\_bytes\_in\_cache:<value\_plus\_one:8400 > key\_bytes:<value\_plus\_one:15 > value\_bytes:<value\_plus\_one:15 > point\_count:<value\_plus\_one:2 > points\_covered\_by\_range\_tombstones:<value\_plus\_one:1 > range\_key\_count:<value\_plus\_one:1 > range\_key\_contained\_points:<value\_plus\_one:1 > range\_key\_skipped\_points:<value\_plus\_one:1 > kv\_cpu\_time:<> num\_gets:<value\_plus\_one:2 > num\_scans:<value\_plus\_one:1 > num\_reverse\_scans:<value\_plus\_one:1 > > exec:<exec\_time:<> max\_allocated\_mem:<value\_plus\_one:20481 > max\_allocated\_disk:<> consumed\_ru:<value\_plus\_one:1 > cpu\_time:<value\_plus\_one:<nanos:20582 > > > output:<num\_batches:<value\_plus\_one:2 > num\_tuples:<value\_plus\_one:2 > > flow\_stats:<max\_mem\_usage:<> max\_disk\_usage:<> consumed\_ru:<> >
                 1.877ms      0.077ms            event:component:<flow\_id:<00000000-0000-0000-0000-000000000000> type:UNSET id:0 sql\_instance\_id:0 > net\_rx:<latency:<> wait\_time:<> deserialization\_time:<> tuples\_received:<> bytes\_received:<> messages\_received:<> > net\_tx:<tuples\_sent:<> bytes\_sent:<> messages\_sent:<> > kv:<bytes\_read:<> tuples\_read:<> batch\_requests\_issued:<> kv\_time:<> contention\_time:<> num\_interface\_steps:<> num\_internal\_steps:<> num\_interface\_seeks:<> num\_internal\_seeks:<> block\_bytes:<> block\_bytes\_in\_cache:<> key\_bytes:<> value\_bytes:<> point\_count:<> points\_covered\_by\_range\_tombstones:<> range\_key\_count:<> range\_key\_contained\_points:<> range\_key\_skipped\_points:<> kv\_cpu\_time:<> num\_gets:<> num\_scans:<> num\_reverse\_scans:<> > exec:<exec\_time:<> max\_allocated\_mem:<> max\_allocated\_disk:<> consumed\_ru:<> cpu\_time:<> > output:<num\_batches:<> num\_tuples:<> > flow\_stats:<max\_mem\_usage:<> max\_disk\_usage:<> consumed\_ru:<> >
                 1.690ms     -0.187ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"4e3a0c41-fdb7-4b18-8adc-2b111a389269","type":"PROCESSOR","id":0,"sqlInstanceId":7},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"12"},"tuplesRead":{"valuePlusOne":"2"},"batchRequestsIssued":{"valuePlusOne":"2"},"kvTime":{"valuePlusOne":"0.000973897s"},"contentionTime":{"valuePlusOne":"0.000000001s"},"numInterfaceSteps":{"valuePlusOne":"1"},"numInternalSteps":{"valuePlusOne":"1"},"numInterfaceSeeks":{"valuePlusOne":"2"},"numInternalSeeks":{"valuePlusOne":"2"},"blockBytes":{"valuePlusOne":"8400"},"blockBytesInCache":{"valuePlusOne":"8400"},"keyBytes":{"valuePlusOne":"15"},"valueBytes":{"valuePlusOne":"15"},"pointCount":{"valuePlusOne":"2"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"1"},"rangeKeyCount":{"valuePlusOne":"1"},"rangeKeyContainedPoints":{"valuePlusOne":"1"},"rangeKeySkippedPoints":{"valuePlusOne":"1"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"2"},"numScans":{"valuePlusOne":"1"},"numReverseScans":{"valuePlusOne":"1"}},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"20481"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"1"},"cpuTime":{"valuePlusOne":"0.000020582s"}},"output":{"numBatches":{"valuePlusOne":"2"},"numTuples":{"valuePlusOne":"2"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"0"},"maxDiskUsage":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"}}}
                 1.804ms      0.114ms            structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"00000000-0000-0000-0000-000000000000","type":"UNSET","id":0,"sqlInstanceId":0},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"0"},"tuplesRead":{"valuePlusOne":"0"},"batchRequestsIssued":{"valuePlusOne":"0"},"kvTime":{"valuePlusOne":"0s"},"contentionTime":{"valuePlusOne":"0s"},"numInterfaceSteps":{"valuePlusOne":"0"},"numInternalSteps":{"valuePlusOne":"0"},"numInterfaceSeeks":{"valuePlusOne":"0"},"numInternalSeeks":{"valuePlusOne":"0"},"blockBytes":{"valuePlusOne":"0"},"blockBytesInCache":{"valuePlusOne":"0"},"keyBytes":{"valuePlusOne":"0"},"valueBytes":{"valuePlusOne":"0"},"pointCount":{"valuePlusOne":"0"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"0"},"rangeKeyCount":{"valuePlusOne":"0"},"rangeKeyContainedPoints":{"valuePlusOne":"0"},"rangeKeySkippedPoints":{"valuePlusOne":"0"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"0"},"numScans":{"valuePlusOne":"0"},"numReverseScans":{"valuePlusOne":"0"}},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"0"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"},"cpuTime":{"valuePlusOne":"0s"}},"output":{"numBatches":{"valuePlusOne":"0"},"numTuples":{"valuePlusOne":"0"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"0"},"maxDiskUsage":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"}}}
                 2.005ms      1.306ms        event:component:<flow\_id:<4e3a0c41-fdb7-4b18-8adc-2b111a389269> type:FLOW id:0 sql\_instance\_id:7 > net\_rx:<latency:<> wait\_time:<> deserialization\_time:<> tuples\_received:<> bytes\_received:<> messages\_received:<> > net\_tx:<tuples\_sent:<> bytes\_sent:<> messages\_sent:<> > kv:<bytes\_read:<> tuples\_read:<> batch\_requests\_issued:<> kv\_time:<> contention\_time:<> num\_interface\_steps:<> num\_internal\_steps:<> num\_interface\_seeks:<> num\_internal\_seeks:<> block\_bytes:<> block\_bytes\_in\_cache:<> key\_bytes:<> value\_bytes:<> point\_count:<> points\_covered\_by\_range\_tombstones:<> range\_key\_count:<> range\_key\_contained\_points:<> range\_key\_skipped\_points:<> kv\_cpu\_time:<> num\_gets:<> num\_scans:<> num\_reverse\_scans:<> > exec:<exec\_time:<> max\_allocated\_mem:<> max\_allocated\_disk:<> consumed\_ru:<> cpu\_time:<> > output:<num\_batches:<> num\_tuples:<> > flow\_stats:<max\_mem\_usage:<value\_plus\_one:20481 > max\_disk\_usage:<value\_plus\_one:1 > consumed\_ru:<> >
                 1.909ms     -0.096ms        structured:{"@type":"type.googleapis.com/cockroach.sql.distsqlrun.ComponentStats","component":{"flowId":"4e3a0c41-fdb7-4b18-8adc-2b111a389269","type":"FLOW","id":0,"sqlInstanceId":7},"netRx":{"latency":{"valuePlusOne":"0s"},"waitTime":{"valuePlusOne":"0s"},"deserializationTime":{"valuePlusOne":"0s"},"tuplesReceived":{"valuePlusOne":"0"},"bytesReceived":{"valuePlusOne":"0"},"messagesReceived":{"valuePlusOne":"0"}},"netTx":{"tuplesSent":{"valuePlusOne":"0"},"bytesSent":{"valuePlusOne":"0"},"messagesSent":{"valuePlusOne":"0"}},"kv":{"bytesRead":{"valuePlusOne":"0"},"tuplesRead":{"valuePlusOne":"0"},"batchRequestsIssued":{"valuePlusOne":"0"},"kvTime":{"valuePlusOne":"0s"},"contentionTime":{"valuePlusOne":"0s"},"numInterfaceSteps":{"valuePlusOne":"0"},"numInternalSteps":{"valuePlusOne":"0"},"numInterfaceSeeks":{"valuePlusOne":"0"},"numInternalSeeks":{"valuePlusOne":"0"},"blockBytes":{"valuePlusOne":"0"},"blockBytesInCache":{"valuePlusOne":"0"},"keyBytes":{"valuePlusOne":"0"},"valueBytes":{"valuePlusOne":"0"},"pointCount":{"valuePlusOne":"0"},"pointsCoveredByRangeTombstones":{"valuePlusOne":"0"},"rangeKeyCount":{"valuePlusOne":"0"},"rangeKeyContainedPoints":{"valuePlusOne":"0"},"rangeKeySkippedPoints":{"valuePlusOne":"0"},"kvCpuTime":{"valuePlusOne":"0s"},"numGets":{"valuePlusOne":"0"},"numScans":{"valuePlusOne":"0"},"numReverseScans":{"valuePlusOne":"0"}},"exec":{"execTime":{"valuePlusOne":"0s"},"maxAllocatedMem":{"valuePlusOne":"0"},"maxAllocatedDisk":{"valuePlusOne":"0"},"consumedRu":{"valuePlusOne":"0"},"cpuTime":{"valuePlusOne":"0s"}},"output":{"numBatches":{"valuePlusOne":"0"},"numTuples":{"valuePlusOne":"0"}},"inputs":[],"flowStats":{"maxMemUsage":{"valuePlusOne":"20481"},"maxDiskUsage":{"valuePlusOne":"1"},"consumedRu":{"valuePlusOne":"0"}}}
                 2.043ms      1.436ms    event:sql/conn\_executor\_exec.go:964 [n7,client=127.0.0.1:41838,user=root] execution ends
                 2.057ms      0.014ms    event:sql/conn\_executor\_exec.go:964 [n7,client=127.0.0.1:41838,user=root] rows affected: 1
                 2.174ms      0.117ms        === operation:commit sql txn \_verbose:1 node:7 client:127.0.0.1:41838 user:root
                 2.206ms      0.149ms    event:sql/conn\_executor\_exec.go:2720 [n7,client=127.0.0.1:41838,user=root] AutoCommit. err: <nil>
    datadriven\_test.go:428: -- test log scope end --
FAIL
test logs left over in: /tmp/cockroach/\_tmp/e558fc8050776f4c54ea39ba371b49da/logTestMultiRegionDataDriven2849108481
I230425 15:08:49.739293 1 (gostd) testmain.go:136  [-] 1  Test //pkg/ccl/multiregionccl:multiregionccl\_test exited with error code 1


ERROR: exit status 1

15 runs completed, 1 failures, over 1m6s
context canceled

@matthewtodd matthewtodd marked this pull request as ready for review April 25, 2023 20:15
@matthewtodd matthewtodd requested a review from a team April 25, 2023 20:15
@matthewtodd matthewtodd requested a review from a team as a code owner April 25, 2023 20:15
@matthewtodd matthewtodd requested review from michae2 and a team April 25, 2023 20:15
@matthewtodd matthewtodd added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.1.0 labels Apr 25, 2023
Part of #89949.

Addresses #98020.
Addresses #99563.

Related to cockroachdb/roachperf#129.
Related to #102170.

Previously, we attempted to record all the regions hit in a single
statement execution in the sqlstats tables, leaning on the
sqlAddressResolver to map traced nodeIDs to localities at execution
time.

While the sqlAddressResolver is generally non-blocking, the introduction
of this code did cause some of the multiregion "this query shouldn't
span regions" tests to start [flaking][] and it's more recently been
[implicated][] in a 2.5% performance regression.

Given that the probabilistic nature of the tracing meant that we
generally weren't capturing all the relevant nodeIDs anyway, it seems
like the most prudent thing to do here is take a step back and regroup.

In the short term, let's stop even trying to gather all these regions.
In the medium/long term, let's see if we can find a better approach.

[flaking]: #98020
[implicated]: https://github.com/cockroachdb/roachperf/pull/129

Release note: None
Copy link
Contributor

@j82w j82w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2)

@matthewtodd
Copy link
Contributor Author

bors r=j82w

@craig
Copy link
Contributor

craig bot commented Apr 26, 2023

Build failed:

@matthewtodd
Copy link
Contributor Author

bors r=j82w

@craig
Copy link
Contributor

craig bot commented Apr 26, 2023

Build succeeded:

@craig craig bot merged commit e22055e into cockroachdb:master Apr 26, 2023
@blathers-crl
Copy link

blathers-crl bot commented Apr 26, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 0e19563 to blathers/backport-release-23.1-102192: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


error creating merge commit from 0e19563 to blathers/backport-release-23.1.0-102192: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.0 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@@ -189,7 +186,11 @@ func (ex *connExecutor) recordStatementSummary(
}

nodes := util.CombineUnique(getNodesFromPlanner(planner), []int64{nodeID})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally: ExtractNodesFromSpan returns an unordered set already (intsets.Fast). Then ForEach converts it into a sequence. Then the caller uses util.CombineUnique to transform the sequence back into a set. This is just silly and these steps should be combined.

(cc @j82w )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants