sql: move parallelize scans control in the execbuilder #51121

RaduBerinde · 2020-07-08T05:07:15Z

Parallel scans refers to disabling scan batch limits, which allows the
distsender to issue requests to multiple ranges in parallel. This is
only safe to use when there is a known upper bound for the number of
results.

Currently we plumb maxResults to the scanNode and TableReader, and
each execution component runs similar logic to decide whether to
parallelize.

This change cleans this up by centralizing this decision inside the
execbuilder. In the future, we may want the coster to be aware of this
parallelization as well.

For simplicity, we drop the cluster setting that controls this (it was
added for fear of problems but it has been on by default for a long
time).

Release note: None

cockroach-teamcity · 2020-07-08T05:07:23Z

This change is

yuzefovich

Reviewed 20 of 20 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)

pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

	}

	// TODO(yuzefovich): scanNode adds "parallel" attribute in walk.go when

Do you think this TODO is not necessary because you'll be refactoring EXPLAIN (PLAN) soon?

pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):

		rf:          &fetcher,
		limitHint:   limitHint,
		parallelize: spec.Parallelize && limitHint == 0,

Hm, is limitHint == 0 part is necessary? I'd expected the execbuilder to know about the limit hint and set spec.Parallelize accordingly. Or is it for backwards-compatibility?

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)

pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

Previously, yuzefovich wrote…

Do you think this TODO is not necessary because you'll be refactoring EXPLAIN (PLAN) soon?

I may be misunderstanding the TODO, but now this information is plumbed in the parallelize field. In any case, I am indeed working on EXPLAIN (this change is fallout from that work), but it seems like it's a pretty big project :)

pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):

Previously, yuzefovich wrote…

Hm, is limitHint == 0 part is necessary? I'd expected the execbuilder to know about the limit hint and set spec.Parallelize accordingly. Or is it for backwards-compatibility?

It shouldn't be necessary.. I put it there just in case the upper layers messed up. Maybe I should remove it.. In principle maybe the upper layers could decide it's better to parallelized even if there's a limit (e.g. there are 100 single-key spans and the limit it 90).

yuzefovich

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)

pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

Previously, RaduBerinde wrote…

I may be misunderstanding the TODO, but now this information is plumbed in the parallelize field. In any case, I am indeed working on EXPLAIN (this change is fallout from that work), but it seems like it's a pretty big project :)

This TODO comes from the fact that with the new factory we don't have a planNode tree, and "parallel" attribute could added in EXPLAIN (PLAN) because there is a scanNode with parallelize==true which is not the case in this factory. I'd keep this TODO as a reminder that the new factory needs to somehow propagate information about parallize flag to EXPLAIN (PLAN) (refactoring of each is, indeed, quite a big lift).

pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):

Previously, RaduBerinde wrote…

It shouldn't be necessary.. I put it there just in case the upper layers messed up. Maybe I should remove it.. In principle maybe the upper layers could decide it's better to parallelized even if there's a limit (e.g. there are 100 single-key spans and the limit it 90).

I see, it might be worth keeping it but also please add a comment for why we still check the limit hint.

RaduBerinde

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)

pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

Previously, yuzefovich wrote…

This TODO comes from the fact that with the new factory we don't have a planNode tree, and "parallel" attribute could added in EXPLAIN (PLAN) because there is a scanNode with parallelize==true which is not the case in this factory. I'd keep this TODO as a reminder that the new factory needs to somehow propagate information about parallize flag to EXPLAIN (PLAN) (refactoring of each is, indeed, quite a big lift).

But (after my change) how is parallelize different than any other property passed to ConstructScan? In principle, we could check the field from the TableReader specs if we wanted to extract this information.

pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):

Previously, yuzefovich wrote…

I see, it might be worth keeping it but also please add a comment for why we still check the limit hint.

Done.

yuzefovich

Reviewed 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @RaduBerinde)

pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

Previously, RaduBerinde wrote…

But (after my change) how is parallelize different than any other property passed to ConstructScan? In principle, we could check the field from the TableReader specs if we wanted to extract this information.

Good point, let's remove it then.

Parallel scans refers to disabling scan batch limits, which allows the distsender to issue requests to multiple ranges in parallel. This is only safe to use when there is a known upper bound for the number of results. Currently we plumb maxResults to the scanNode and TableReader, and each execution component runs similar logic to decide whether to parallelize. This change cleans this up by centralizing this decision inside the execbuilder. In the future, we may want the coster to be aware of this parallelization as well. For simplicity, we drop the cluster setting that controls this (it was added for fear of problems but it has been on by default for a long time). Release note: None

RaduBerinde · 2020-07-09T16:59:16Z

bors r+

craig · 2020-07-09T17:47:20Z

Build succeeded

GitHub CI (Cockroach)

RaduBerinde requested review from jordanlewis and yuzefovich July 8, 2020 05:07

RaduBerinde requested a review from a team as a code owner July 8, 2020 05:07

RaduBerinde force-pushed the plumb-parallel branch from f45bbc0 to f859e9c Compare July 8, 2020 16:05

yuzefovich reviewed Jul 8, 2020

View reviewed changes

RaduBerinde commented Jul 8, 2020

View reviewed changes

yuzefovich approved these changes Jul 8, 2020

View reviewed changes

RaduBerinde force-pushed the plumb-parallel branch from f859e9c to 518455e Compare July 8, 2020 22:18

RaduBerinde commented Jul 8, 2020

View reviewed changes

yuzefovich reviewed Jul 8, 2020

View reviewed changes

RaduBerinde force-pushed the plumb-parallel branch from 518455e to 093aecc Compare July 9, 2020 15:26

RaduBerinde force-pushed the plumb-parallel branch from 093aecc to 256ba77 Compare July 9, 2020 16:59

craig bot merged commit ba2deac into cockroachdb:master Jul 9, 2020

RaduBerinde deleted the plumb-parallel branch July 9, 2020 17:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: move parallelize scans control in the execbuilder #51121

sql: move parallelize scans control in the execbuilder #51121

RaduBerinde commented Jul 8, 2020

cockroach-teamcity commented Jul 8, 2020

yuzefovich left a comment

RaduBerinde left a comment

yuzefovich left a comment

RaduBerinde left a comment

yuzefovich left a comment

RaduBerinde commented Jul 9, 2020

craig bot commented Jul 9, 2020

sql: move parallelize scans control in the execbuilder #51121

sql: move parallelize scans control in the execbuilder #51121

Conversation

RaduBerinde commented Jul 8, 2020

cockroach-teamcity commented Jul 8, 2020

yuzefovich left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

RaduBerinde commented Jul 9, 2020

craig bot commented Jul 9, 2020

Build succeeded