Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/ui: surface network bytes sent in statements page #55969

Merged
merged 3 commits into from
Nov 2, 2020
Merged

sql/ui: surface network bytes sent in statements page #55969

merged 3 commits into from
Nov 2, 2020

Conversation

asubiotto
Copy link
Contributor

Closes #55331

This PR implements the conditional statistics collection and display described in phase 1 of the always-on EXPLAIN ANALYZE issue (#54556). The goal of this work is to provide a framework to surface more top-level statistics while always-on tracing is being worked on. The next steps here are to surface more top-level statistics we find interesting to the admin UI (e.g. query memory usage by @cathymw in #54340).

Once always-on tracing is completed, the code added to the conn executor to analyze trace data will be moved to recordStatement.

cc @jordanlewis

@asubiotto asubiotto requested review from RaduBerinde, cathymw and a team October 26, 2020 13:39
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@asubiotto asubiotto requested a review from dhartunian October 26, 2020 13:41
@asubiotto
Copy link
Contributor Author

This is what the statement page looks like when statement diagnostics is enabled:
image

@dhartunian I'd appreciate if you could take a look at the .tsx change to see if I'm following best practices with the filtering of this information.

Another question I'd like to figure out is whether we want to think more about returning these stats in StatementStatistics grouped by node. What gives me pause here is how to handle when the number of nodes in the physical plan changes. What should we do then?

@RaduBerinde are you willing to be the primary reviewer on this? I think we'll be working pretty closely on observability stuff so it makes the most sense to me.

@asubiotto
Copy link
Contributor Author

cc @yuzefovich / @cathymw on the first commit and fixing outbox bytes sent non-determinism.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I have a few nits.

Also, I'm curious why we're showing a decimal point in the number of "rows read" in the image, that looks not pretty, cc @dhartunian

Reviewed 13 of 13 files at r1, 5 of 5 files at r2, 6 of 6 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @cathymw, @dhartunian, and @RaduBerinde)


pkg/sql/distsql_physical_planner.go, line 608 at r1 (raw file):

	noEvalSubqueries bool

	// If set, the flows for this physical plan will be passed to this function.

super nit: s/this physical/the physical/ since it's not clear what "this" refers to.


pkg/sql/plan.go, line 338 at r2 (raw file):

	// If we are collecting query diagnostics, flow diagrams are saved here.
	distSQLDiagrams []annotatedFlowDiagram
	// If we are collection query diagnostics, physical plans are saved here.

nit: s/collection/collecting/.


pkg/sql/traceanalyzer.go, line 22 at r1 (raw file):

	"github.com/cockroachdb/cockroach/pkg/util/tracing"
	"github.com/gogo/protobuf/types"
	"github.com/pkg/errors"

nit: we should use our cockroachdb/errors :)


pkg/sql/traceanalyzer.go, line 27 at r1 (raw file):

const nodeTagKey = "node"

type TraceAnalyzer struct {

Quick comment about the purpose of the struct would be helpful.


pkg/sql/traceanalyzer.go, line 43 at r1 (raw file):

func (a *TraceAnalyzer) populateIDMaps(flows map[roachpb.NodeID]*execinfrapb.FlowSpec) {
	for k := range a.processorIDMap {

nit: these two for loops could be unified into one similar to what you have in extractStatsByID.

Also, would do you think about actually merging populateIDMaps and extractStatsByID methods into one and define a separate helper method that deletes all entries from all 4 maps and calling this new helper method in the beginning of the merged method?


pkg/sql/traceanalyzer_test.go, line 54 at r1 (raw file):

							return nil
						}
						fmt.Println(stmt)

Is this a leftover?


pkg/sql/traceanalyzer_test.go, line 143 at r1 (raw file):

			{
				analyzer:      colexecTraceAnalyzer,
				expectedBytes: 1872,

Just curious: do you know why there is such a big difference in two numbers? Maybe add a quick comment if so.


pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 59 at r3 (raw file):

SELECT url FROM [EXPLAIN ANALYZE SELECT * FROM kv JOIN kw ON kv.k = kw.k]
----
https://cockroachdb.github.io/distsqlplan/decode.html#eJzMWNFu2zYUfd9XEPepXelKpKQ0FlDA2ZYBLlK7i_OwrfCDYnG2YEfSKCpuEOTfB1kLEjs2RZmipbdYJnXOuT73hLyPkP27Ah8ml1eXv96gnK_Q79fjr-j75Z_fri6GI3Qxurj66-9L9O634eRm8sfVe_T_0p_Lhct79GU8HKHlGo1HaHn_cYk-o-X643IKGOIkZKPgjmXgfwcCGChgcACDCxg8mGJIeTJjWZbwYsnjZsMw_AG-jSGK01wUj6cYZgln4D-CiMSKgQ83we2KXbMgZNyyAUPIRBCtNjDL-0HKo7uAPwCGSRrEmY96VgF8G4jZgmUoyUWaCx8VG0Werl49KjgOx0hEd8xHdlZsehAsQ5wFoY9oH_0CGHiyfn5CYPqEodxdKnhmePuAFkG22OY2KNZPMWQimDPwyRM-TvLZjuT1W8kW6ahoelD0y3sSHjLOwt33fCiAlVbtqd9XxufsSxLFjFtkxzMr9o94NyAf3n_m0XxR_gkYxkV1BgQPKB44eOCql5P9YLNcREn8UtXtmr3Uw1GoRx7v07pX5ijpJalFvG2BCrS9erTdLdpE3bukul0t2rOcLnqXHCv6k0LDdlU0PSj6hA1LutqwVN0EVMH5Ts-qwbot59cQfa7g_K6KpgdFn9D5tKvOd9RN4Cg43-1ZXhdNQI4V3VdwfldF04OiT-h8p6vOd9VN4Co43-t10QLkWMlete97FkFBHCKCErFgvIvy6UH5J-wAt6sdUHFXvWZZmsQZ27mu7H-zXZSBhXNWFjdLcj5j33gy28CUH8ebfZvTZ8gyUX5Lyw_D-PmrTARC9bpjH5Rfh8KZLoXXhq6J3dfBrmgm-3xvN9WhR7R_Ho3aEM8oOKnhS2rGl3IKZ7oUNOT3dbCb8KWcHtH-eTRqs-PLpsHpLrj9GtzZ7ondzY50M9m2lG3G1a7ZwJcWT46tnfT68rUCX1O-4TSvANdK8yYixTMb-NLayLG1k17Fl3IKWoGvKd9wmleAa6V5E748qxP4TdJzqK1A71OLcS7HNntwl2Nr57hKy8op6MV5E86t4Gf2_H7eYpzLsc0e3OXY2jmu4ks5Bb04b8KXFfzMnt_73Y5zYngeI79YGZzENHLrNDit0S2O_uldaSTR7kzG8FBGB1wr1Rsxp8GRjW5x9I_wSuZsdTBDak1mWkj2N8Of5pK9Ef_K-Zk9zVeAG052ObjhuUwVunakK7VunalgG-FaZ3LYeLLUmRmeGNz0qL3W1NDU_5U3I6KWk3369NN_AQAA__-DJN1Z

nit: I see that we're not showing the number of bytes sent if that number is 0 although we do show some other fields, what do you think about showing zero bytes too? I'm not sure which way I'd prefer, but I wanted to point it out.

Copy link
Contributor Author

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cathymw, @dhartunian, @RaduBerinde, and @yuzefovich)


pkg/sql/distsql_physical_planner.go, line 608 at r1 (raw file):

Previously, yuzefovich wrote…

super nit: s/this physical/the physical/ since it's not clear what "this" refers to.

Done.


pkg/sql/traceanalyzer.go, line 22 at r1 (raw file):

Previously, yuzefovich wrote…

nit: we should use our cockroachdb/errors :)

Done. Thanks. Didn't understand why go.mod was changing.


pkg/sql/traceanalyzer.go, line 27 at r1 (raw file):

Previously, yuzefovich wrote…

Quick comment about the purpose of the struct would be helpful.

Done.


pkg/sql/traceanalyzer.go, line 43 at r1 (raw file):

these two for loops could be unified into one similar to what you have in extractStatsByID.

I can't because the map elements are of different types (roachpb.NodeID vs [2]roachpb.NodeID)

Also, would do you think about actually merging populateIDMaps and extractStatsByID methods into one and define a separate helper method that deletes all entries from all 4 maps and calling this new helper method in the beginning of the merged method?

populateIDMaps and extractStatsByID feel like logically distinct operations so it seems nicer to separate them (I could have kept all that code in this Reset method). Is there a reason you'd like to see all of this in a single code block minus the map-resetting?


pkg/sql/traceanalyzer_test.go, line 54 at r1 (raw file):

Previously, yuzefovich wrote…

Is this a leftover?

Done.


pkg/sql/traceanalyzer_test.go, line 143 at r1 (raw file):

Previously, yuzefovich wrote…

Just curious: do you know why there is such a big difference in two numbers? Maybe add a quick comment if so.

I think it's just the overhead from arrow serialization as well as the fact that we try to encode ints in the least number of bytes possible in the rowexec case. Added a comment.


pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 59 at r3 (raw file):

Previously, yuzefovich wrote…

nit: I see that we're not showing the number of bytes sent if that number is 0 although we do show some other fields, what do you think about showing zero bytes too? I'm not sure which way I'd prefer, but I wanted to point it out.

I'm not sure. @cathymw what do you think?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Great stuff!

Until we have always-on tracing, we could collect this any time we collect the plan (see planTop.savePlanForStats)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @cathymw, @dhartunian, @RaduBerinde, and @yuzefovich)


pkg/sql/conn_executor_exec.go, line 971 at r4 (raw file):

				planCtx.saveFlows = fn
			}
		} else {

[nit] can do else if planner.collectBundle here and remove the outer if.

I assume we don't care about support bundles when TestingSaveFlows is set.


pkg/sql/conn_executor_exec.go, line 399 at r6 (raw file):

				if err != nil && log.V(1) {
					log.Infof(ctx, "error calculating network bytes sent for stmt %s: %v", stmt, err)
					continue

We should continue regardless of log.V (maybe we can do log.VInfof() instead of checking, that has the benefit that it will always show up in traces)


pkg/sql/exec_util.go, line 853 at r4 (raw file):

	// TestingSaveFlows, if set, will be called with the given stmt. The resulting
	// function will be called with the physical plan of that statement's main
	// query (i.e. no subqueries).

Mention that nil can be returned (though maybe returning a no-op func would be cleaner).


pkg/sql/plan.go, line 270 at r5 (raw file):

var _ planNodeSpooled = &spoolNode{}

type queryType int

The name is very general, how about "planComponentType"? (these things are grouped in the planComponents struct). We could also move the defs next to that struct.


pkg/sql/plan.go, line 288 at r5 (raw file):

		return "postquery"
	default:
		return "<unknown>"

[nit] maybe remove the brackets, since this makes it into a filename


pkg/sql/traceanalyzer.go, line 33 at r4 (raw file):

//     analyzer.Reset(flows, trace)
//     bytesGroupedByNode, err := analyzer.GetNetworkBytesSent()
type TraceAnalyzer struct {

This feels generic enough that it doesn't need to live in sql. Is there an existing package we can use? Or we could make a new sql/execstats package.


pkg/sql/traceanalyzer.go, line 35 at r4 (raw file):

type TraceAnalyzer struct {
	// processorIDMap maps a processor ID to a node it was planned on.
	processorIDMap map[int]roachpb.NodeID

We should make separate types for the processor ID and stream ID (or use the ones from the specs), to make things more evident


pkg/sql/traceanalyzer.go, line 38 at r4 (raw file):

	// streamIDMap maps a stream ID to a given pair of nodes. The first node in
	// the array is the origin node, the second is the destination node.
	streamIDMap map[int][2]roachpb.NodeID

I know it's a bit of a pain, but making a struct with field names makes the code easier to read


pkg/sql/traceanalyzer.go, line 42 at r4 (raw file):

	// processorStats maps a processor ID to the stats associated with this
	// processor extracted from a trace.
	processorStats map[int]execinfrapb.DistSQLSpanStats

It feels a little strange that we have two maps with the same key, can we just put the NodeID and the stats in a struct and use a single map?


pkg/sql/traceanalyzer.go, line 45 at r4 (raw file):

	// streamStats maps a stream ID to the stats associated with this stream
	// extracted from a trace.
	streamStats map[int]execinfrapb.DistSQLSpanStats

ditto


pkg/sql/traceanalyzer_test.go, line 34 at r4 (raw file):

)

func TestTraceAnalyzer(t *testing.T) {

[nit] It would be nice to give a high level overview of what the test does


pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 54 at r4 (raw file):

SELECT url FROM [EXPLAIN ANALYZE SELECT count(*) FROM kv]
----
https://cockroachdb.github.io/distsqlplan/decode.html#eJzMld1um0AQhe_7FKu5squ1MD92bK7ipq6E5EBqHPUnQhGBkYuCWbq7pIksv3sFpAq2HKCqGnPJsIc5h29gtiB-xmCCO1_ML1Yk4zH5tHQuyc3869ViZtlkZs8W377PSe-j5a7cz4s-eT4asCyRvff98vz9gwcUEhai7W9QgHkDKlDQgIIOFAygMAKPQspZgEIwnh_ZFgIrfARzSCFK0kzmZY9CwDiCuQUZyRjBhJV_F-MS_RC5MgQKIUo_ios29w_nKY82Pn8CCm7qJ8IkAyVv7GTSJDZLECjc-TL4gYKwTKZ5OX-IzNK4Usr9Wg6R0QZNMhS56EmiIBz90CTalHwACpz9-lNRwdtRKNXPtoX01wimuqPto83Wa45rXzKujPaTXTjX9up26Xxxe_32EfARg0xGLHlJ8ppP7VWfL_ayhPEQOYZ73rxdfRL1AJJ7fXlr2aveufp_kuh7SdT2w6Q2D5OiDRS9O-PUEK4CYXzScdLaQ9BaQNAHitEdCA3hKhDOTgpBbw9BbwHBGBT_qI5AaAhXgTA5KQSjPQSjBYTRoDsIGqJVEEw7s9uO-FyiSFki8GDHHX_yMN99GK6xXJSCZTzAK86Cok156RS6ohCikOVdtbywkvJWbrAqVmvF2p5YPRRr9Z0bWuu1aqNebPyN7-ItFi_038dWHWlH53bf3qjW3rg-27jb2c5q7U3qs026nW1aP9HDho-p_lN8-3Te7t3vAAAA__8nPFQM

What caused this change? Why weren't we showing the bytes read before?

Copy link
Contributor

@cathymw cathymw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @cathymw, @dhartunian, @RaduBerinde, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 59 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I'm not sure. @cathymw what do you think?

We also don't show rows read if it's 0. I don't have a strong preference either. It might be nice to be consistent. I'm not sure if it would be confusing for people to not see all the stats, especially if other stats still get displayed when they have zero values. I don't know if it would look too crowded if the diagrams get very large and complicated though.


pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 54 at r4 (raw file):

Previously, RaduBerinde wrote…

What caused this change? Why weren't we showing the bytes read before?

I think it's because when bytesRead == 0, they aren't displayed. And then, prior to these changes, for tests we would set bytes read to zero due to non-determinism. But now, Alfonso fixed the non-determinism, so the value no longer gets zeroed out and shows up on the diagram now

Copy link
Contributor Author

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR. Addressed all comments but I'm still working on a test flake so this is not ready to merge yet.

Until we have always-on tracing, we could collect this any time we collect the plan (see planTop.savePlanForStats)

Do we enable tracing in those cases? It doesn't seem like it but I may be missing something.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cathymw, @dhartunian, @jordanlewis, @RaduBerinde, and @yuzefovich)


pkg/sql/conn_executor_exec.go, line 399 at r6 (raw file):

Previously, RaduBerinde wrote…

We should continue regardless of log.V (maybe we can do log.VInfof() instead of checking, that has the benefit that it will always show up in traces)

Oops, good catch.


pkg/sql/plan.go, line 270 at r5 (raw file):

Previously, RaduBerinde wrote…

The name is very general, how about "planComponentType"? (these things are grouped in the planComponents struct). We could also move the defs next to that struct.

Done.


pkg/sql/traceanalyzer.go, line 43 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

these two for loops could be unified into one similar to what you have in extractStatsByID.

I can't because the map elements are of different types (roachpb.NodeID vs [2]roachpb.NodeID)

Also, would do you think about actually merging populateIDMaps and extractStatsByID methods into one and define a separate helper method that deletes all entries from all 4 maps and calling this new helper method in the beginning of the merged method?

populateIDMaps and extractStatsByID feel like logically distinct operations so it seems nicer to separate them (I could have kept all that code in this Reset method). Is there a reason you'd like to see all of this in a single code block minus the map-resetting?

I ended up putting everything together in Reset.


pkg/sql/traceanalyzer.go, line 33 at r4 (raw file):

Previously, RaduBerinde wrote…

This feels generic enough that it doesn't need to live in sql. Is there an existing package we can use? Or we could make a new sql/execstats package.

That was my initial approach, but an import cycle occurs in the test because serverutils depends on sql which in turn depends on execstats. It's unfortunate that a test causes this. Do you see any way to break the cycle? Another option is to keep the test in sql but that seems unfortunate.


pkg/sql/traceanalyzer.go, line 42 at r4 (raw file):

Previously, RaduBerinde wrote…

It feels a little strange that we have two maps with the same key, can we just put the NodeID and the stats in a struct and use a single map?

Done. Good call.


pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 59 at r3 (raw file):

Previously, cathymw wrote…

We also don't show rows read if it's 0. I don't have a strong preference either. It might be nice to be consistent. I'm not sure if it would be confusing for people to not see all the stats, especially if other stats still get displayed when they have zero values. I don't know if it would look too crowded if the diagrams get very large and complicated though.

@jordanlewis was surprised by this behavior as well. I'm happy to change it but will leave it for another PR.


pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 54 at r4 (raw file):

Previously, cathymw wrote…

I think it's because when bytesRead == 0, they aren't displayed. And then, prior to these changes, for tests we would set bytes read to zero due to non-determinism. But now, Alfonso fixed the non-determinism, so the value no longer gets zeroed out and shows up on the diagram now

That's right. The non-determinism was due to including metadata size in bytes sent. The commit changes the bytes sent calculation to only include the row bytes in tests (i.e. when the DeterministicStats flag is set) for determinism.

@asubiotto
Copy link
Contributor Author

I think I figured the test issue out. It seems FlowSpecs are not safe for use after saveFlows is called as we currently do in the test since they're released to the sync pool. I have to step back from this PR for today but I'll have to think about what the best way forward should be. cc @yuzefovich if you've got any suggestions

@yuzefovich
Copy link
Member

Oh yeah, we do pool those objects. I guess we'll have to release the flow specs back to the pool later, after the trace has been analyzed.

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm curious why we're showing a decimal point in the number of "rows read" in the image, that looks not pretty, cc @dhartunian

@yuzefovich unfortunately the stats table component applies a uniform number formatting to all rows. Once the statements page is embedded into CC, we'll devote more time and attention to these visual nits.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @cathymw, @dhartunian, @jordanlewis, @RaduBerinde, and @yuzefovich)


pkg/ui/src/views/statements/statementDetails.tsx, line 508 at r9 (raw file):

                },
              ].filter(function (r) {
                console.log("filtering", r.value);

nit: assume this will be removed before merge.


pkg/ui/src/views/statements/statementDetails.tsx, line 511 at r9 (raw file):

                if (r.name === "Network Bytes Sent") {
                  if (r.value.mean === 0) {
                    // Omit if empty.

Curious if there's any difference in how the rows_read and bytes_read fields are generated and this one, why should we omit showing this when the number is zero?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we enable tracing in those cases? It doesn't seem like it but I may be missing something.

No, we would need to do that.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @jordanlewis, and @yuzefovich)


pkg/sql/traceanalyzer.go, line 33 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

That was my initial approach, but an import cycle occurs in the test because serverutils depends on sql which in turn depends on execstats. It's unfortunate that a test causes this. Do you see any way to break the cycle? Another option is to keep the test in sql but that seems unfortunate.

serverutils does not depend on sql, it exists exactly to allow tests to create test servers and clusters without directly depending on server and sql. Plus since you are only testing external interfaces, the test would be in a something_test package where it can depend on anything.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @jordanlewis, and @yuzefovich)


pkg/sql/traceanalyzer_test.go, line 14 at r7 (raw file):

import (
	context "context"

[nit] redundant alias


pkg/sql/traceanalyzer_test.go, line 152 at r7 (raw file):

			{
				analyzer: colexecTraceAnalyzer,
				// Note that the expectedBytes in the colexec case is larger than the

It's strange that we'll need to update these tests whenever some over the wire format changes. Maybe we should just check that the value is in a reasonable range (e.g. 32 to 4096)

Copy link
Contributor Author

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we would need to do that.

I like the idea but isn't the overhead concerning? Will leave this for later in any case

I changed thesaveFlow semantics to disallow saving flows past the function's lifetime since we would otherwise need to defer releases up in conn executor space, which seems like a big no-no. The result is that we create analyzers in saveFlow, have them transform the flow into the maps they can use, and attach those analyzers to a plan that are then reused by the conn executor.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian, @jordanlewis, @RaduBerinde, and @yuzefovich)


pkg/sql/traceanalyzer.go, line 33 at r4 (raw file):

Previously, RaduBerinde wrote…

serverutils does not depend on sql, it exists exactly to allow tests to create test servers and clusters without directly depending on server and sql. Plus since you are only testing external interfaces, the test would be in a something_test package where it can depend on anything.

Not directly, but to use the testcluster you need to initialize the factory in TestMain:

func TestMain(m *testing.M) {
	serverutils.InitTestClusterFactory(testcluster.TestClusterFactory)
}

This in turn depends on server and sql:

package github.com/cockroachdb/cockroach/pkg/sql/execstats (test)
	imports github.com/cockroachdb/cockroach/pkg/testutils/testcluster
	imports github.com/cockroachdb/cockroach/pkg/server
	imports github.com/cockroachdb/cockroach/pkg/sql
	imports github.com/cockroachdb/cockroach/pkg/sql/execstats: import cycle not allowed in test

Am I missing something? This is with the test already in the *_test package.


pkg/ui/src/views/statements/statementDetails.tsx, line 511 at r9 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Curious if there's any difference in how the rows_read and bytes_read fields are generated and this one, why should we omit showing this when the number is zero?

There is a difference. The network bytes sent is only collected when a statement diagnostics bundle is requested because of the current overhead of collecting that information. The idea is that by the end of this release, we will get to a point where we can display this stat unconditionally. Until then, it would be misleading to display "0" for this stat if it hasn't been collected.

@asubiotto
Copy link
Contributor Author

asubiotto commented Oct 29, 2020

I'm still on the fence re adding the total network bytes sent for the query. Is that data useful? Would it be better to send back some 2D slice where each cell is the bytes sent from node i -> node j? We could possibly enhance such a data structure in the future with network latency and number of messages sent. We could potentially just deprecate the network bytes sent but it's probably worth thinking through what the best way forward here is.

One problem with the per-node approach is that physical plans can change and nodes might not be part of a physical plan any longer. Is simply overwriting the previous data acceptable? I'm not sure anything else makes sense.

In the context of this PR. Maybe I can merge this without the last commit and figure the admin-UI related plans later.

@RaduBerinde
Copy link
Member


pkg/sql/traceanalyzer.go, line 33 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Not directly, but to use the testcluster you need to initialize the factory in TestMain:

func TestMain(m *testing.M) {
	serverutils.InitTestClusterFactory(testcluster.TestClusterFactory)
}

This in turn depends on server and sql:

package github.com/cockroachdb/cockroach/pkg/sql/execstats (test)
	imports github.com/cockroachdb/cockroach/pkg/testutils/testcluster
	imports github.com/cockroachdb/cockroach/pkg/server
	imports github.com/cockroachdb/cockroach/pkg/sql
	imports github.com/cockroachdb/cockroach/pkg/sql/execstats: import cycle not allowed in test

Am I missing something? This is with the test already in the *_test package.

The trick is that TestMain should be in _test package (which can depend on anything).

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: from admin ui perspective, thanks for answering my question

I'm still on the fence re adding the total network bytes sent for the query. Is that data useful? Would it be better to send back some 2D slice where each cell is the bytes sent from node i -> node j? We could possibly enhance such a data structure in the future with network latency and number of messages sent. We could potentially just deprecate the network bytes sent but it's probably worth thinking through what the best way forward here is.

My take is to keep the data simpler in the short run. We already have more data in the Admin UI than we can properly visualize and make actionable so I hesitate to add extra detail unless we have a clear way to signal when it matters to someone.

Reviewed 3 of 13 files at r1, 1 of 18 files at r7, 19 of 19 files at r10, 6 of 6 files at r11.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @dhartunian, @jordanlewis, and @yuzefovich)

@asubiotto asubiotto requested a review from a team as a code owner October 30, 2020 14:16
@asubiotto
Copy link
Contributor Author

Thanks @RaduBerinde, that worked. Put the TraceAnalyzer in its own execstats package.

@asubiotto
Copy link
Contributor Author

I think I've convinced myself that it's useful to have this statement-level statistic in any case. It keeps things simple as @dhartunian mentions, and is not incompatible with statistics in the context of a physical plan which we can return in addition to statement-level stats. It probably makes sense attach these to an EXPLAIN ANALYZE representation as extra metadata. Will bors on green.

This commit adds TraceAnalyzer, which calculates general stats from a physical
plan and an accompanying statement trace. This commit adds an example
calculation of the number of netowrk bytes sent by each node in the plan.

Additionally, this commit fixes non-determinism in outbox network statistics
caused by variable metadata in order to make testing more useful.

Release note: None
…l plans

This commit factors out some repeated code and adds a query type to flow
diagrams, which was a TODO when collecting statement bundles in order to be
able to differentiate between main queries, postqueries, and subqueries.

Release note: None
This commit uses a TraceAnalyzer when statement diagnostics are enabled to
populate BytesSentOverNetwork on a per-statement basis. This data is displayed
in the corresponding statement's page if collected.

This work is a precursor to always-on tracing. Once always-on tracing is
enabled, these types of statement statistics calculated from the trace will
always be displayed in the admin UI.

Release note (admin ui change): If statement diagnostics are enabled for a
statement, the bytes sent over the network will be shown in the statement's
page.
@asubiotto
Copy link
Contributor Author

Latest CI failure seems like an infra issue. TFTRs

bors r=RaduBerinde,dhartunian

@craig
Copy link
Contributor

craig bot commented Nov 2, 2020

Build succeeded:

@craig craig bot merged commit 2df27d8 into cockroachdb:master Nov 2, 2020
@asubiotto asubiotto deleted the expl branch November 2, 2020 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: surface EXPLAIN ANALYZE stats when a statement diagnostics bundle is requested
6 participants