-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/ui: surface network bytes sent in statements page #55969
Conversation
This is what the statement page looks like when statement diagnostics is enabled: @dhartunian I'd appreciate if you could take a look at the Another question I'd like to figure out is whether we want to think more about returning these stats in @RaduBerinde are you willing to be the primary reviewer on this? I think we'll be working pretty closely on observability stuff so it makes the most sense to me. |
cc @yuzefovich / @cathymw on the first commit and fixing outbox bytes sent non-determinism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I have a few nits.
Also, I'm curious why we're showing a decimal point in the number of "rows read" in the image, that looks not pretty, cc @dhartunian
Reviewed 13 of 13 files at r1, 5 of 5 files at r2, 6 of 6 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @cathymw, @dhartunian, and @RaduBerinde)
pkg/sql/distsql_physical_planner.go, line 608 at r1 (raw file):
noEvalSubqueries bool // If set, the flows for this physical plan will be passed to this function.
super nit: s/this physical/the physical/
since it's not clear what "this" refers to.
pkg/sql/plan.go, line 338 at r2 (raw file):
// If we are collecting query diagnostics, flow diagrams are saved here. distSQLDiagrams []annotatedFlowDiagram // If we are collection query diagnostics, physical plans are saved here.
nit: s/collection/collecting/
.
pkg/sql/traceanalyzer.go, line 22 at r1 (raw file):
"github.com/cockroachdb/cockroach/pkg/util/tracing" "github.com/gogo/protobuf/types" "github.com/pkg/errors"
nit: we should use our cockroachdb/errors
:)
pkg/sql/traceanalyzer.go, line 27 at r1 (raw file):
const nodeTagKey = "node" type TraceAnalyzer struct {
Quick comment about the purpose of the struct would be helpful.
pkg/sql/traceanalyzer.go, line 43 at r1 (raw file):
func (a *TraceAnalyzer) populateIDMaps(flows map[roachpb.NodeID]*execinfrapb.FlowSpec) { for k := range a.processorIDMap {
nit: these two for
loops could be unified into one similar to what you have in extractStatsByID
.
Also, would do you think about actually merging populateIDMaps
and extractStatsByID
methods into one and define a separate helper method that deletes all entries from all 4 maps and calling this new helper method in the beginning of the merged method?
pkg/sql/traceanalyzer_test.go, line 54 at r1 (raw file):
return nil } fmt.Println(stmt)
Is this a leftover?
pkg/sql/traceanalyzer_test.go, line 143 at r1 (raw file):
{ analyzer: colexecTraceAnalyzer, expectedBytes: 1872,
Just curious: do you know why there is such a big difference in two numbers? Maybe add a quick comment if so.
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 59 at r3 (raw file):
SELECT url FROM [EXPLAIN ANALYZE SELECT * FROM kv JOIN kw ON kv.k = kw.k] ---- https://cockroachdb.github.io/distsqlplan/decode.html#eJzMWNFu2zYUfd9XEPepXelKpKQ0FlDA2ZYBLlK7i_OwrfCDYnG2YEfSKCpuEOTfB1kLEjs2RZmipbdYJnXOuT73hLyPkP27Ah8ml1eXv96gnK_Q79fjr-j75Z_fri6GI3Qxurj66-9L9O634eRm8sfVe_T_0p_Lhct79GU8HKHlGo1HaHn_cYk-o-X643IKGOIkZKPgjmXgfwcCGChgcACDCxg8mGJIeTJjWZbwYsnjZsMw_AG-jSGK01wUj6cYZgln4D-CiMSKgQ83we2KXbMgZNyyAUPIRBCtNjDL-0HKo7uAPwCGSRrEmY96VgF8G4jZgmUoyUWaCx8VG0Werl49KjgOx0hEd8xHdlZsehAsQ5wFoY9oH_0CGHiyfn5CYPqEodxdKnhmePuAFkG22OY2KNZPMWQimDPwyRM-TvLZjuT1W8kW6ahoelD0y3sSHjLOwt33fCiAlVbtqd9XxufsSxLFjFtkxzMr9o94NyAf3n_m0XxR_gkYxkV1BgQPKB44eOCql5P9YLNcREn8UtXtmr3Uw1GoRx7v07pX5ijpJalFvG2BCrS9erTdLdpE3bukul0t2rOcLnqXHCv6k0LDdlU0PSj6hA1LutqwVN0EVMH5Ts-qwbot59cQfa7g_K6KpgdFn9D5tKvOd9RN4Cg43-1ZXhdNQI4V3VdwfldF04OiT-h8p6vOd9VN4Co43-t10QLkWMlete97FkFBHCKCErFgvIvy6UH5J-wAt6sdUHFXvWZZmsQZ27mu7H-zXZSBhXNWFjdLcj5j33gy28CUH8ebfZvTZ8gyUX5Lyw_D-PmrTARC9bpjH5Rfh8KZLoXXhq6J3dfBrmgm-3xvN9WhR7R_Ho3aEM8oOKnhS2rGl3IKZ7oUNOT3dbCb8KWcHtH-eTRqs-PLpsHpLrj9GtzZ7ondzY50M9m2lG3G1a7ZwJcWT46tnfT68rUCX1O-4TSvANdK8yYixTMb-NLayLG1k17Fl3IKWoGvKd9wmleAa6V5E748qxP4TdJzqK1A71OLcS7HNntwl2Nr57hKy8op6MV5E86t4Gf2_H7eYpzLsc0e3OXY2jmu4ks5Bb04b8KXFfzMnt_73Y5zYngeI79YGZzENHLrNDit0S2O_uldaSTR7kzG8FBGB1wr1Rsxp8GRjW5x9I_wSuZsdTBDak1mWkj2N8Of5pK9Ef_K-Zk9zVeAG052ObjhuUwVunakK7VunalgG-FaZ3LYeLLUmRmeGNz0qL3W1NDU_5U3I6KWk3369NN_AQAA__-DJN1Z
nit: I see that we're not showing the number of bytes sent if that number is 0 although we do show some other fields, what do you think about showing zero bytes too? I'm not sure which way I'd prefer, but I wanted to point it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cathymw, @dhartunian, @RaduBerinde, and @yuzefovich)
pkg/sql/distsql_physical_planner.go, line 608 at r1 (raw file):
Previously, yuzefovich wrote…
super nit:
s/this physical/the physical/
since it's not clear what "this" refers to.
Done.
pkg/sql/traceanalyzer.go, line 22 at r1 (raw file):
Previously, yuzefovich wrote…
nit: we should use our
cockroachdb/errors
:)
Done. Thanks. Didn't understand why go.mod
was changing.
pkg/sql/traceanalyzer.go, line 27 at r1 (raw file):
Previously, yuzefovich wrote…
Quick comment about the purpose of the struct would be helpful.
Done.
pkg/sql/traceanalyzer.go, line 43 at r1 (raw file):
these two for loops could be unified into one similar to what you have in extractStatsByID.
I can't because the map elements are of different types (roachpb.NodeID
vs [2]roachpb.NodeID
)
Also, would do you think about actually merging populateIDMaps and extractStatsByID methods into one and define a separate helper method that deletes all entries from all 4 maps and calling this new helper method in the beginning of the merged method?
populateIDMaps
and extractStatsByID
feel like logically distinct operations so it seems nicer to separate them (I could have kept all that code in this Reset
method). Is there a reason you'd like to see all of this in a single code block minus the map-resetting?
pkg/sql/traceanalyzer_test.go, line 54 at r1 (raw file):
Previously, yuzefovich wrote…
Is this a leftover?
Done.
pkg/sql/traceanalyzer_test.go, line 143 at r1 (raw file):
Previously, yuzefovich wrote…
Just curious: do you know why there is such a big difference in two numbers? Maybe add a quick comment if so.
I think it's just the overhead from arrow serialization as well as the fact that we try to encode ints in the least number of bytes possible in the rowexec case. Added a comment.
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 59 at r3 (raw file):
Previously, yuzefovich wrote…
nit: I see that we're not showing the number of bytes sent if that number is 0 although we do show some other fields, what do you think about showing zero bytes too? I'm not sure which way I'd prefer, but I wanted to point it out.
I'm not sure. @cathymw what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we have always-on tracing, we could collect this any time we collect the plan (see planTop.savePlanForStats
)
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @cathymw, @dhartunian, @RaduBerinde, and @yuzefovich)
pkg/sql/conn_executor_exec.go, line 971 at r4 (raw file):
planCtx.saveFlows = fn } } else {
[nit] can do else if planner.collectBundle
here and remove the outer if.
I assume we don't care about support bundles when TestingSaveFlows is set.
pkg/sql/conn_executor_exec.go, line 399 at r6 (raw file):
if err != nil && log.V(1) { log.Infof(ctx, "error calculating network bytes sent for stmt %s: %v", stmt, err) continue
We should continue
regardless of log.V
(maybe we can do log.VInfof() instead of checking, that has the benefit that it will always show up in traces)
pkg/sql/exec_util.go, line 853 at r4 (raw file):
// TestingSaveFlows, if set, will be called with the given stmt. The resulting // function will be called with the physical plan of that statement's main // query (i.e. no subqueries).
Mention that nil
can be returned (though maybe returning a no-op func would be cleaner).
pkg/sql/plan.go, line 270 at r5 (raw file):
var _ planNodeSpooled = &spoolNode{} type queryType int
The name is very general, how about "planComponentType"? (these things are grouped in the planComponents
struct). We could also move the defs next to that struct.
pkg/sql/plan.go, line 288 at r5 (raw file):
return "postquery" default: return "<unknown>"
[nit] maybe remove the brackets, since this makes it into a filename
pkg/sql/traceanalyzer.go, line 33 at r4 (raw file):
// analyzer.Reset(flows, trace) // bytesGroupedByNode, err := analyzer.GetNetworkBytesSent() type TraceAnalyzer struct {
This feels generic enough that it doesn't need to live in sql
. Is there an existing package we can use? Or we could make a new sql/execstats
package.
pkg/sql/traceanalyzer.go, line 35 at r4 (raw file):
type TraceAnalyzer struct { // processorIDMap maps a processor ID to a node it was planned on. processorIDMap map[int]roachpb.NodeID
We should make separate types for the processor ID and stream ID (or use the ones from the specs), to make things more evident
pkg/sql/traceanalyzer.go, line 38 at r4 (raw file):
// streamIDMap maps a stream ID to a given pair of nodes. The first node in // the array is the origin node, the second is the destination node. streamIDMap map[int][2]roachpb.NodeID
I know it's a bit of a pain, but making a struct with field names makes the code easier to read
pkg/sql/traceanalyzer.go, line 42 at r4 (raw file):
// processorStats maps a processor ID to the stats associated with this // processor extracted from a trace. processorStats map[int]execinfrapb.DistSQLSpanStats
It feels a little strange that we have two maps with the same key, can we just put the NodeID and the stats in a struct and use a single map?
pkg/sql/traceanalyzer.go, line 45 at r4 (raw file):
// streamStats maps a stream ID to the stats associated with this stream // extracted from a trace. streamStats map[int]execinfrapb.DistSQLSpanStats
ditto
pkg/sql/traceanalyzer_test.go, line 34 at r4 (raw file):
) func TestTraceAnalyzer(t *testing.T) {
[nit] It would be nice to give a high level overview of what the test does
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 54 at r4 (raw file):
SELECT url FROM [EXPLAIN ANALYZE SELECT count(*) FROM kv] ---- https://cockroachdb.github.io/distsqlplan/decode.html#eJzMld1um0AQhe_7FKu5squ1MD92bK7ipq6E5EBqHPUnQhGBkYuCWbq7pIksv3sFpAq2HKCqGnPJsIc5h29gtiB-xmCCO1_ML1Yk4zH5tHQuyc3869ViZtlkZs8W377PSe-j5a7cz4s-eT4asCyRvff98vz9gwcUEhai7W9QgHkDKlDQgIIOFAygMAKPQspZgEIwnh_ZFgIrfARzSCFK0kzmZY9CwDiCuQUZyRjBhJV_F-MS_RC5MgQKIUo_ios29w_nKY82Pn8CCm7qJ8IkAyVv7GTSJDZLECjc-TL4gYKwTKZ5OX-IzNK4Usr9Wg6R0QZNMhS56EmiIBz90CTalHwACpz9-lNRwdtRKNXPtoX01wimuqPto83Wa45rXzKujPaTXTjX9up26Xxxe_32EfARg0xGLHlJ8ppP7VWfL_ayhPEQOYZ73rxdfRL1AJJ7fXlr2aveufp_kuh7SdT2w6Q2D5OiDRS9O-PUEK4CYXzScdLaQ9BaQNAHitEdCA3hKhDOTgpBbw9BbwHBGBT_qI5AaAhXgTA5KQSjPQSjBYTRoDsIGqJVEEw7s9uO-FyiSFki8GDHHX_yMN99GK6xXJSCZTzAK86Cok156RS6ohCikOVdtbywkvJWbrAqVmvF2p5YPRRr9Z0bWuu1aqNebPyN7-ItFi_038dWHWlH53bf3qjW3rg-27jb2c5q7U3qs026nW1aP9HDho-p_lN8-3Te7t3vAAAA__8nPFQM
What caused this change? Why weren't we showing the bytes read before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @cathymw, @dhartunian, @RaduBerinde, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 59 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I'm not sure. @cathymw what do you think?
We also don't show rows read if it's 0. I don't have a strong preference either. It might be nice to be consistent. I'm not sure if it would be confusing for people to not see all the stats, especially if other stats still get displayed when they have zero values. I don't know if it would look too crowded if the diagrams get very large and complicated though.
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 54 at r4 (raw file):
Previously, RaduBerinde wrote…
What caused this change? Why weren't we showing the bytes read before?
I think it's because when bytesRead == 0, they aren't displayed. And then, prior to these changes, for tests we would set bytes read to zero due to non-determinism. But now, Alfonso fixed the non-determinism, so the value no longer gets zeroed out and shows up on the diagram now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR. Addressed all comments but I'm still working on a test flake so this is not ready to merge yet.
Until we have always-on tracing, we could collect this any time we collect the plan (see planTop.savePlanForStats)
Do we enable tracing in those cases? It doesn't seem like it but I may be missing something.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cathymw, @dhartunian, @jordanlewis, @RaduBerinde, and @yuzefovich)
pkg/sql/conn_executor_exec.go, line 399 at r6 (raw file):
Previously, RaduBerinde wrote…
We should
continue
regardless oflog.V
(maybe we can do log.VInfof() instead of checking, that has the benefit that it will always show up in traces)
Oops, good catch.
pkg/sql/plan.go, line 270 at r5 (raw file):
Previously, RaduBerinde wrote…
The name is very general, how about "planComponentType"? (these things are grouped in the
planComponents
struct). We could also move the defs next to that struct.
Done.
pkg/sql/traceanalyzer.go, line 43 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
these two for loops could be unified into one similar to what you have in extractStatsByID.
I can't because the map elements are of different types (
roachpb.NodeID
vs[2]roachpb.NodeID
)Also, would do you think about actually merging populateIDMaps and extractStatsByID methods into one and define a separate helper method that deletes all entries from all 4 maps and calling this new helper method in the beginning of the merged method?
populateIDMaps
andextractStatsByID
feel like logically distinct operations so it seems nicer to separate them (I could have kept all that code in thisReset
method). Is there a reason you'd like to see all of this in a single code block minus the map-resetting?
I ended up putting everything together in Reset
.
pkg/sql/traceanalyzer.go, line 33 at r4 (raw file):
Previously, RaduBerinde wrote…
This feels generic enough that it doesn't need to live in
sql
. Is there an existing package we can use? Or we could make a newsql/execstats
package.
That was my initial approach, but an import cycle occurs in the test because serverutils
depends on sql
which in turn depends on execstats
. It's unfortunate that a test causes this. Do you see any way to break the cycle? Another option is to keep the test in sql
but that seems unfortunate.
pkg/sql/traceanalyzer.go, line 42 at r4 (raw file):
Previously, RaduBerinde wrote…
It feels a little strange that we have two maps with the same key, can we just put the NodeID and the stats in a struct and use a single map?
Done. Good call.
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 59 at r3 (raw file):
Previously, cathymw wrote…
We also don't show rows read if it's 0. I don't have a strong preference either. It might be nice to be consistent. I'm not sure if it would be confusing for people to not see all the stats, especially if other stats still get displayed when they have zero values. I don't know if it would look too crowded if the diagrams get very large and complicated though.
@jordanlewis was surprised by this behavior as well. I'm happy to change it but will leave it for another PR.
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 54 at r4 (raw file):
Previously, cathymw wrote…
I think it's because when bytesRead == 0, they aren't displayed. And then, prior to these changes, for tests we would set bytes read to zero due to non-determinism. But now, Alfonso fixed the non-determinism, so the value no longer gets zeroed out and shows up on the diagram now
That's right. The non-determinism was due to including metadata size in bytes sent. The commit changes the bytes sent calculation to only include the row bytes in tests (i.e. when the DeterministicStats
flag is set) for determinism.
I think I figured the test issue out. It seems |
Oh yeah, we do pool those objects. I guess we'll have to release the flow specs back to the pool later, after the trace has been analyzed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm curious why we're showing a decimal point in the number of "rows read" in the image, that looks not pretty, cc @dhartunian
@yuzefovich unfortunately the stats table component applies a uniform number formatting to all rows. Once the statements page is embedded into CC, we'll devote more time and attention to these visual nits.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @cathymw, @dhartunian, @jordanlewis, @RaduBerinde, and @yuzefovich)
pkg/ui/src/views/statements/statementDetails.tsx, line 508 at r9 (raw file):
}, ].filter(function (r) { console.log("filtering", r.value);
nit: assume this will be removed before merge.
pkg/ui/src/views/statements/statementDetails.tsx, line 511 at r9 (raw file):
if (r.name === "Network Bytes Sent") { if (r.value.mean === 0) { // Omit if empty.
Curious if there's any difference in how the rows_read
and bytes_read
fields are generated and this one, why should we omit showing this when the number is zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we enable tracing in those cases? It doesn't seem like it but I may be missing something.
No, we would need to do that.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @jordanlewis, and @yuzefovich)
pkg/sql/traceanalyzer.go, line 33 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
That was my initial approach, but an import cycle occurs in the test because
serverutils
depends onsql
which in turn depends onexecstats
. It's unfortunate that a test causes this. Do you see any way to break the cycle? Another option is to keep the test insql
but that seems unfortunate.
serverutils
does not depend on sql
, it exists exactly to allow tests to create test servers and clusters without directly depending on server
and sql
. Plus since you are only testing external interfaces, the test would be in a something_test
package where it can depend on anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @jordanlewis, and @yuzefovich)
pkg/sql/traceanalyzer_test.go, line 14 at r7 (raw file):
import ( context "context"
[nit] redundant alias
pkg/sql/traceanalyzer_test.go, line 152 at r7 (raw file):
{ analyzer: colexecTraceAnalyzer, // Note that the expectedBytes in the colexec case is larger than the
It's strange that we'll need to update these tests whenever some over the wire format changes. Maybe we should just check that the value is in a reasonable range (e.g. 32 to 4096)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we would need to do that.
I like the idea but isn't the overhead concerning? Will leave this for later in any case
I changed thesaveFlow
semantics to disallow saving flows past the function's lifetime since we would otherwise need to defer releases up in conn executor space, which seems like a big no-no. The result is that we create analyzers in saveFlow
, have them transform the flow into the maps they can use, and attach those analyzers to a plan that are then reused by the conn executor.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian, @jordanlewis, @RaduBerinde, and @yuzefovich)
pkg/sql/traceanalyzer.go, line 33 at r4 (raw file):
Previously, RaduBerinde wrote…
serverutils
does not depend onsql
, it exists exactly to allow tests to create test servers and clusters without directly depending onserver
andsql
. Plus since you are only testing external interfaces, the test would be in asomething_test
package where it can depend on anything.
Not directly, but to use the testcluster you need to initialize the factory in TestMain
:
func TestMain(m *testing.M) {
serverutils.InitTestClusterFactory(testcluster.TestClusterFactory)
}
This in turn depends on server
and sql
:
package github.com/cockroachdb/cockroach/pkg/sql/execstats (test)
imports github.com/cockroachdb/cockroach/pkg/testutils/testcluster
imports github.com/cockroachdb/cockroach/pkg/server
imports github.com/cockroachdb/cockroach/pkg/sql
imports github.com/cockroachdb/cockroach/pkg/sql/execstats: import cycle not allowed in test
Am I missing something? This is with the test already in the *_test
package.
pkg/ui/src/views/statements/statementDetails.tsx, line 511 at r9 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Curious if there's any difference in how the
rows_read
andbytes_read
fields are generated and this one, why should we omit showing this when the number is zero?
There is a difference. The network bytes sent is only collected when a statement diagnostics bundle is requested because of the current overhead of collecting that information. The idea is that by the end of this release, we will get to a point where we can display this stat unconditionally. Until then, it would be misleading to display "0" for this stat if it hasn't been collected.
I'm still on the fence re adding the total network bytes sent for the query. Is that data useful? Would it be better to send back some 2D slice where each cell is the bytes sent from node i -> node j? We could possibly enhance such a data structure in the future with network latency and number of messages sent. We could potentially just deprecate the network bytes sent but it's probably worth thinking through what the best way forward here is. One problem with the per-node approach is that physical plans can change and nodes might not be part of a physical plan any longer. Is simply overwriting the previous data acceptable? I'm not sure anything else makes sense. In the context of this PR. Maybe I can merge this without the last commit and figure the admin-UI related plans later. |
pkg/sql/traceanalyzer.go, line 33 at r4 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
The trick is that TestMain should be in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from admin ui perspective, thanks for answering my question
I'm still on the fence re adding the total network bytes sent for the query. Is that data useful? Would it be better to send back some 2D slice where each cell is the bytes sent from node i -> node j? We could possibly enhance such a data structure in the future with network latency and number of messages sent. We could potentially just deprecate the network bytes sent but it's probably worth thinking through what the best way forward here is.
My take is to keep the data simpler in the short run. We already have more data in the Admin UI than we can properly visualize and make actionable so I hesitate to add extra detail unless we have a clear way to signal when it matters to someone.
Reviewed 3 of 13 files at r1, 1 of 18 files at r7, 19 of 19 files at r10, 6 of 6 files at r11.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @dhartunian, @jordanlewis, and @yuzefovich)
Thanks @RaduBerinde, that worked. Put the |
I think I've convinced myself that it's useful to have this statement-level statistic in any case. It keeps things simple as @dhartunian mentions, and is not incompatible with statistics in the context of a physical plan which we can return in addition to statement-level stats. It probably makes sense attach these to an EXPLAIN ANALYZE representation as extra metadata. Will bors on green. |
This commit adds TraceAnalyzer, which calculates general stats from a physical plan and an accompanying statement trace. This commit adds an example calculation of the number of netowrk bytes sent by each node in the plan. Additionally, this commit fixes non-determinism in outbox network statistics caused by variable metadata in order to make testing more useful. Release note: None
…l plans This commit factors out some repeated code and adds a query type to flow diagrams, which was a TODO when collecting statement bundles in order to be able to differentiate between main queries, postqueries, and subqueries. Release note: None
This commit uses a TraceAnalyzer when statement diagnostics are enabled to populate BytesSentOverNetwork on a per-statement basis. This data is displayed in the corresponding statement's page if collected. This work is a precursor to always-on tracing. Once always-on tracing is enabled, these types of statement statistics calculated from the trace will always be displayed in the admin UI. Release note (admin ui change): If statement diagnostics are enabled for a statement, the bytes sent over the network will be shown in the statement's page.
Latest CI failure seems like an infra issue. TFTRs bors r=RaduBerinde,dhartunian |
Build succeeded: |
Closes #55331
This PR implements the conditional statistics collection and display described in phase 1 of the always-on EXPLAIN ANALYZE issue (#54556). The goal of this work is to provide a framework to surface more top-level statistics while always-on tracing is being worked on. The next steps here are to surface more top-level statistics we find interesting to the admin UI (e.g. query memory usage by @cathymw in #54340).
Once always-on tracing is completed, the code added to the conn executor to analyze trace data will be moved to
recordStatement
.cc @jordanlewis