sql: move parallelize scans control in the execbuilder #51121
f45bbc0 to f859e9c
Reviewed 20 of 20 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)
pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):
} // TODO(yuzefovich): scanNode adds "parallel" attribute in walk.go when
Do you think this TODO is not necessary because you'll be refactoring EXPLAIN (PLAN) soon?
pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):
    rf:          &fetcher,
    limitHint:   limitHint,
    parallelize: spec.Parallelize && limitHint == 0,
Hm, is the limitHint == 0 part necessary? I'd expect the execbuilder to know about the limit hint and set spec.Parallelize accordingly. Or is it for backwards compatibility?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)
pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):
Previously, yuzefovich wrote…
Do you think this TODO is not necessary because you'll be refactoring EXPLAIN (PLAN) soon?
I may be misunderstanding the TODO, but now this information is plumbed in the parallelize field. In any case, I am indeed working on EXPLAIN (this change is fallout from that work), but it seems like it's a pretty big project :)
pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):
Previously, yuzefovich wrote…
Hm, is the limitHint == 0 part necessary? I'd expect the execbuilder to know about the limit hint and set spec.Parallelize accordingly. Or is it for backwards compatibility?
It shouldn't be necessary. I put it there just in case the upper layers messed up. Maybe I should remove it. In principle, the upper layers could decide it's better to parallelize even if there's a limit (e.g. there are 100 single-key spans and the limit is 90).
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)
pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):
Previously, RaduBerinde wrote…
I may be misunderstanding the TODO, but now this information is plumbed in the parallelize field. In any case, I am indeed working on EXPLAIN (this change is fallout from that work), but it seems like it's a pretty big project :)
This TODO comes from the fact that with the new factory we don't have a planNode tree. The "parallel" attribute could be added in EXPLAIN (PLAN) because there is a scanNode with parallelize==true, which is not the case in this factory. I'd keep this TODO as a reminder that the new factory needs to somehow propagate information about the parallelize flag to EXPLAIN (PLAN) (refactoring of each is, indeed, quite a big lift).
pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):
Previously, RaduBerinde wrote…
It shouldn't be necessary. I put it there just in case the upper layers messed up. Maybe I should remove it. In principle, the upper layers could decide it's better to parallelize even if there's a limit (e.g. there are 100 single-key spans and the limit is 90).
I see, it might be worth keeping it but also please add a comment for why we still check the limit hint.
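For illustration, the kind of commented defensive check being asked for could look roughly like the sketch below. The tableReaderSpec and scanState types and the newScanState helper are hypothetical stand-ins, not the actual colfetcher code; only the expression spec.Parallelize && limitHint == 0 comes from the diff under review.

```go
package main

import "fmt"

// tableReaderSpec is a hypothetical, trimmed-down stand-in for the spec
// that the planner hands to the execution layer.
type tableReaderSpec struct {
	Parallelize bool
}

// scanState is a hypothetical stand-in for the scan operator's state.
type scanState struct {
	limitHint   int64
	parallelize bool
}

// newScanState copies the planner's decision into the operator state.
func newScanState(spec tableReaderSpec, limitHint int64) scanState {
	return scanState{
		limitHint: limitHint,
		// The planner is expected to set Parallelize only when it is safe.
		// We still re-check the limit hint here as a safeguard in case the
		// upper layers request parallelization together with a limit hint.
		parallelize: spec.Parallelize && limitHint == 0,
	}
}

func main() {
	fmt.Println(newScanState(tableReaderSpec{Parallelize: true}, 0).parallelize)
	fmt.Println(newScanState(tableReaderSpec{Parallelize: true}, 10).parallelize)
}
```

Note that this safeguard would also override a planner that deliberately parallelizes despite a limit (the 100-spans, limit-90 case mentioned above), which is the trade-off the comment should make explicit.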
f859e9c to 518455e
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)
pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):
Previously, yuzefovich wrote…
This TODO comes from the fact that with the new factory we don't have a planNode tree. The "parallel" attribute could be added in EXPLAIN (PLAN) because there is a scanNode with parallelize==true, which is not the case in this factory. I'd keep this TODO as a reminder that the new factory needs to somehow propagate information about the parallelize flag to EXPLAIN (PLAN) (refactoring of each is, indeed, quite a big lift).
But (after my change) how is parallelize different from any other property passed to ConstructScan? In principle, we could check the field from the TableReader specs if we wanted to extract this information.
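As a rough sketch of what "check the field from the TableReader specs" could mean: the tableReaderSpec type, its Table field, and the explainAttrs helper below are hypothetical and only illustrate the idea. They are not the real spec definition or the EXPLAIN implementation.

```go
package main

import "fmt"

// tableReaderSpec is a hypothetical, simplified spec carrying the flag
// that the planner set when it constructed the scan.
type tableReaderSpec struct {
	Table       string
	Parallelize bool
}

// explainAttrs derives display attributes directly from the spec, so no
// scanNode is needed to report the "parallel" attribute.
func explainAttrs(spec tableReaderSpec) []string {
	attrs := []string{"table=" + spec.Table}
	if spec.Parallelize {
		attrs = append(attrs, "parallel")
	}
	return attrs
}

func main() {
	fmt.Println(explainAttrs(tableReaderSpec{Table: "kv", Parallelize: true}))
}
```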
pkg/sql/colfetcher/colbatch_scan.go, line 157 at r1 (raw file):
Previously, yuzefovich wrote…
I see, it might be worth keeping it but also please add a comment for why we still check the limit hint.
Done.
Reviewed 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @RaduBerinde)
pkg/sql/distsql_spec_exec_factory.go, line 207 at r1 (raw file):
Previously, RaduBerinde wrote…
But (after my change) how is parallelize different from any other property passed to ConstructScan? In principle, we could check the field from the TableReader specs if we wanted to extract this information.
Good point, let's remove it then.
518455e to 093aecc
Parallel scans refers to disabling scan batch limits, which allows the distsender to issue requests to multiple ranges in parallel. This is only safe to use when there is a known upper bound for the number of results.
Currently we plumb maxResults to the scanNode and TableReader, and each execution component runs similar logic to decide whether to parallelize.
This change cleans this up by centralizing this decision inside the execbuilder. In the future, we may want the coster to be aware of this parallelization as well.
For simplicity, we drop the cluster setting that controls this (it was added for fear of problems but it has been on by default for a long time).
Release note: None
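A minimal sketch of the centralized decision, under stated assumptions: scanParams and shouldParallelize are hypothetical names, and the rule shown (a known upper bound on results and no limit hint) is a simplification of whatever the execbuilder actually checks. The point is only that the decision is made once at planning time and execution components read the resulting boolean.

```go
package main

import "fmt"

// scanParams is a hypothetical summary of what the planner knows about a scan.
type scanParams struct {
	maxResults uint64 // known upper bound on result rows; 0 means unknown
	limitHint  int64  // soft limit on rows wanted; 0 means none
}

// shouldParallelize makes the decision in one place. Disabling batch limits
// is only safe when the number of results is bounded, so an unknown bound
// rules it out. This sketch also declines whenever a limit hint is present.
func shouldParallelize(p scanParams) bool {
	if p.maxResults == 0 {
		return false
	}
	return p.limitHint == 0
}

func main() {
	fmt.Println(shouldParallelize(scanParams{maxResults: 100}))
	fmt.Println(shouldParallelize(scanParams{}))
	fmt.Println(shouldParallelize(scanParams{maxResults: 100, limitHint: 10}))
}
```

With this shape, scanNode and TableReader no longer need maxResults at all; they would only receive the boolean result.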
093aecc to 256ba77
bors r+
Build succeeded