Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: move parallelize scans control in the execbuilder #51121

Merged
merged 1 commit into from
Jul 9, 2020

Conversation

RaduBerinde
Copy link
Member

Parallel scans refers to disabling scan batch limits, which allows the
distsender to issue requests to multiple ranges in parallel. This is
only safe to use when there is a known upper bound for the number of
results.

Currently we plumb maxResults to the scanNode and TableReader, and
each execution component runs similar logic to decide whether to
parallelize.

This change cleans this up by centralizing this decision inside the
execbuilder. In the future, we may want the coster to be aware of this
parallelization as well.

For simplicity, we drop the cluster setting that controls this (it was
added for fear of problems but it has been on by default for a long
time).

Release note: None

@RaduBerinde RaduBerinde requested a review from a team as a code owner July 8, 2020 05:07
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 20 of 20 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

	}

	// TODO(yuzefovich): scanNode adds "parallel" attribute in walk.go when

Do you think this TODO is not necessary because you'll be refactoring EXPLAIN (PLAN) soon?


pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):

		rf:          &fetcher,
		limitHint:   limitHint,
		parallelize: spec.Parallelize && limitHint == 0,

Hm, is limitHint == 0 part is necessary? I'd expected the execbuilder to know about the limit hint and set spec.Parallelize accordingly. Or is it for backwards-compatibility?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

Previously, yuzefovich wrote…

Do you think this TODO is not necessary because you'll be refactoring EXPLAIN (PLAN) soon?

I may be misunderstanding the TODO, but now this information is plumbed in the parallelize field. In any case, I am indeed working on EXPLAIN (this change is fallout from that work), but it seems like it's a pretty big project :)


pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):

Previously, yuzefovich wrote…

Hm, is limitHint == 0 part is necessary? I'd expected the execbuilder to know about the limit hint and set spec.Parallelize accordingly. Or is it for backwards-compatibility?

It shouldn't be necessary.. I put it there just in case the upper layers messed up. Maybe I should remove it.. In principle maybe the upper layers could decide it's better to parallelized even if there's a limit (e.g. there are 100 single-key spans and the limit it 90).

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

Previously, RaduBerinde wrote…

I may be misunderstanding the TODO, but now this information is plumbed in the parallelize field. In any case, I am indeed working on EXPLAIN (this change is fallout from that work), but it seems like it's a pretty big project :)

This TODO comes from the fact that with the new factory we don't have a planNode tree, and "parallel" attribute could added in EXPLAIN (PLAN) because there is a scanNode with parallelize==true which is not the case in this factory. I'd keep this TODO as a reminder that the new factory needs to somehow propagate information about parallize flag to EXPLAIN (PLAN) (refactoring of each is, indeed, quite a big lift).


pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):

Previously, RaduBerinde wrote…

It shouldn't be necessary.. I put it there just in case the upper layers messed up. Maybe I should remove it.. In principle maybe the upper layers could decide it's better to parallelized even if there's a limit (e.g. there are 100 single-key spans and the limit it 90).

I see, it might be worth keeping it but also please add a comment for why we still check the limit hint.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

Previously, yuzefovich wrote…

This TODO comes from the fact that with the new factory we don't have a planNode tree, and "parallel" attribute could added in EXPLAIN (PLAN) because there is a scanNode with parallelize==true which is not the case in this factory. I'd keep this TODO as a reminder that the new factory needs to somehow propagate information about parallize flag to EXPLAIN (PLAN) (refactoring of each is, indeed, quite a big lift).

But (after my change) how is parallelize different than any other property passed to ConstructScan? In principle, we could check the field from the TableReader specs if we wanted to extract this information.


pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):

Previously, yuzefovich wrote…

I see, it might be worth keeping it but also please add a comment for why we still check the limit hint.

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):

Previously, RaduBerinde wrote…

But (after my change) how is parallelize different than any other property passed to ConstructScan? In principle, we could check the field from the TableReader specs if we wanted to extract this information.

Good point, let's remove it then.

Parallel scans refers to disabling scan batch limits, which allows the
distsender to issue requests to multiple ranges in parallel. This is
only safe to use when there is a known upper bound for the number of
results.

Currently we plumb maxResults to the scanNode and TableReader, and
each execution component runs similar logic to decide whether to
parallelize.

This change cleans this up by centralizing this decision inside the
execbuilder. In the future, we may want the coster to be aware of this
parallelization as well.

For simplicity, we drop the cluster setting that controls this (it was
added for fear of problems but it has been on by default for a long
time).

Release note: None
@RaduBerinde
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 9, 2020

Build succeeded

@craig craig bot merged commit ba2deac into cockroachdb:master Jul 9, 2020
@RaduBerinde RaduBerinde deleted the plumb-parallel branch July 9, 2020 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants