-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlstats: "get regions for node" logic is extremely inefficient #102170
Labels
A-cluster-observability
Related to cluster observability
branch-master
Failures and bugs on the master branch.
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Comments
knz
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
branch-master
Failures and bugs on the master branch.
GA-blocker
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
A-cluster-observability
Related to cluster observability
branch-release-23.1.0
labels
Apr 24, 2023
craig bot
pushed a commit
that referenced
this issue
Apr 26, 2023
102192: sqlstats: only include local region in statement_statistics r=j82w a=matthewtodd Part of #89949. Addresses #98020. Addresses #99563. Related to cockroachdb/roachperf#129. Related to #102170. Previously, we attempted to record all the regions hit in a single statement execution in the sqlstats tables, leaning on the sqlAddressResolver to map traced nodeIDs to localities at execution time. While the sqlAddressResolver is generally non-blocking, the introduction of this code did cause some of the multiregion "this query shouldn't span regions" tests to start [flaking][] and it's more recently been [implicated][] in a 2.5% performance regression. Given that the probabilistic nature of the tracing meant that we generally weren't capturing all the relevant nodeIDs anyway, it seems like the most prudent thing to do here is take a step back and regroup. In the short term, let's stop even trying to gather all these regions. In the medium/long term, let's see if we can find a better approach. [flaking]: #98020 [implicated]: https://github.com/cockroachdb/roachperf/pull/129 Release note: None Co-authored-by: Matthew Todd <[email protected]>
This was referenced Apr 26, 2023
craig bot
pushed a commit
that referenced
this issue
Jul 12, 2023
106587: sql: fix StatementStatistics.Nodes list r=j82w a=j82w 1. Fixes the nodes list to show all the nodes in StatementStatistics.Nodes. 2. The node list is now empty if tracing is disabled. Previously it would always include the current gateway node id, but it would be missing all the other nodes. This causes confusion because it's uncertain whether the node list is complete or not. 3. Fixes regions on `EXPLAIN ANALYSE (DISTSQL)` to show regions information on secondary tenenants. It was not shown before because only system tenants have acces to gossip which is used under the covers to get the node descriptors. 4. Fixes the performance issues previously listed in #102170. 5. Fixes the test to actually validate the nodes list. The fix was done by adding the region name to the Flow ComponentID. This means the region name is now part of the traces for the Flow ComponentID, so it no longer needs figure out the region. It gets the region information from the same trace the SQL Instance ID is obtained. Moving the collection to the QueryLevelStats avoids iterating the traces multiple times. Fixes: #102170, fixes: #96647, fixes: #91219; Epic: none Release note (sql change): Fixes the StatementStatistics.Nodes to contain all the nodes involved in the query. Adds region info to `EXPLAIN ANALYSE (DISTSQL)` for seconary tenants. Co-authored-by: j82w <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-cluster-observability
Related to cluster observability
branch-master
Failures and bugs on the master branch.
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Split from #101299
Describe the problem
This PR #95449 has implemented "storing the regions in which a statement was executed" using some logic that is abusively expensive:
the computation in
getNodesFromPlanner
is way too expensiveExtractNodesFromSpan
which callsprotoutil.Unmarshal
, which incurs a heap allocation and a relatively expensive decodeExtractNodesFromSpan
returns an unordered set already (intsets.Fast
). ThenForEach
converts it into a sequence. Then the caller usesutil.CombineUnique
to transform the sequence back into a set. This is just silly and these steps should be combined.the computation in
getRegionsForNode
is way too expensiveresolver.GetAllInstances
) even though the specific query under consideration might use just 1 of themExpected behavior
The code should be refactored overall to remove the excess CPU and memory overheads.
Jira issue: CRDB-27322
The text was updated successfully, but these errors were encountered: