sql: change physical planning heuristics a bit to prefer local execution #68524

yuzefovich · 2021-08-06T03:04:56Z

This commit changes two parts of the physical planner heuristics:

- we now say that the lookup join "can be distributed" rather than
  "should be distributed"
- for top K sort we also say that it "can be" rather than "should be"
  distributed.

I'm not certain whether both of these changes are always beneficial, but
here is some justification.

The change to the lookup join heuristic will make it so that the
distribution of the join reader stage is determined by other stages of
the physical plan in distsql=auto mode. Consider an example when the
input to the lookup join is the table reader that scans only a handful
of rows. Previously, because of the "should distribute" heuristic, such
a plan would be "distributed" meaning we would plan a single table
reader on the leaseholder for the relevant range (most likely a remote
node from the perspective of the gateway node for the query); this, in
turn, would force the planning of the join reader on the same node, and
all subsequent stages - if any - too. Such a decision can create
a hotspot if that particular range is queried often (think append-only
access pattern where the latest data is accessed most frequently).

With this change in such a scenario we will get more even compute
utilization across the cluster because the flow will be fully planned on
the gateway (which is assumed to be chosen randomly by a load balancer),
and the lookup join will be performed from the gateway (we'll still need
to perform a remote read from the leaseholder of that single range).
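
As a rough illustration of the hot spot argument above, here is a toy model (not CockroachDB code) that counts where the join reader stage would run. The function `computeLoad` and its parameters are hypothetical, and a round-robin gateway choice stands in for the load balancer. It ignores the remote read from the leaseholder, which still happens in both cases.

```go
package main

import "fmt"

// computeLoad returns, per node, how many join reader stages that node runs.
// This is a toy model: every query touches the same hot range, whose
// leaseholder is the given node. If planOnGateway is false, the join reader
// is placed next to the table reader on the leaseholder; otherwise it runs
// on the gateway that received the query.
func computeLoad(numNodes, numQueries, leaseholder int, planOnGateway bool) []int {
	load := make([]int, numNodes)
	for q := 0; q < numQueries; q++ {
		// Round-robin is a deterministic stand-in for a load balancer.
		gateway := q % numNodes
		if planOnGateway {
			load[gateway]++
		} else {
			load[leaseholder]++
		}
	}
	return load
}

func main() {
	// Old heuristic: all join reader work lands on the leaseholder (node 0).
	fmt.Println(computeLoad(3, 9, 0, false)) // [9 0 0]
	// New heuristic: the work follows the gateway and spreads out.
	fmt.Println(computeLoad(3, 9, 0, true)) // [3 3 3]
}
```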

The change to the top K sort heuristic seems less controversial to me,
yet I don't have a good justification. My feeling is that usually the
value of K is small, so it's ok if we don't "force" ourselves to
distribute the sort operation if the physical plan otherwise isn't
calling for it.
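
To make the "K is usually small" intuition concrete, here is a generic sketch of top K selection with a bounded heap. It is not how the top K sorter in CockroachDB is implemented; `topK` and `maxHeap` are names invented for this example. The point is only that the operator holds at most K rows at a time, so running it on a single node is cheap when K is small.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// maxHeap keeps the largest retained value at the root so that it can be
// replaced when a smaller value arrives.
type maxHeap []int

func (h maxHeap) Len() int            { return len(h) }
func (h maxHeap) Less(i, j int) bool  { return h[i] > h[j] }
func (h maxHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *maxHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k smallest values in ascending order while holding at
// most k values in memory at any time.
func topK(rows []int, k int) []int {
	if k <= 0 {
		return nil
	}
	h := &maxHeap{}
	for _, r := range rows {
		if h.Len() < k {
			heap.Push(h, r)
		} else if r < (*h)[0] {
			// Replace the current largest retained value and restore the heap.
			(*h)[0] = r
			heap.Fix(h, 0)
		}
	}
	out := append([]int(nil), *h...)
	sort.Ints(out)
	return out
}

func main() {
	fmt.Println(topK([]int{42, 7, 19, 3, 88, 11}, 3)) // [3 7 11]
}
```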

Overall, the choice of making changes to these two heuristics isn't very
principled and is driven by a single query from one of our largest
customers which happened to hit the hot spot scenario as described
above. In their case, they have an append-like workload that is constantly
updating a single range. Eventually that range is split automatically,
but both new ranges stay on the same node. The latest data is accessed
far more frequently than any other data in the table, yet according to
the KV heuristics the ranges aren't being reallocated because the scans
hitting the hot ranges aren't exceeding the threshold. What isn't
accounted for is the fact that other parts of the flow are far more
compute-intensive, so this change attempts to alleviate such a hot node
scenario.

Release note (sql change): Some queries with lookup joins and/or top
K sorts are now more likely to be executed in a "local" manner with the
distsql=auto session variable.

RaduBerinde · 2021-08-06T04:32:46Z

Seems reasonable to me, but can you include a bit more context about the case that inspired this?

yuzefovich · 2021-08-06T05:02:09Z

Thanks for taking a look. I provided some more context on where this change is coming from. However, I'm not certain that we want to merge it.

cucaroach

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @yuzefovich)

pkg/sql/distsql_physical_planner.go, line 460 at r1 (raw file):

			return cannotDistribute, err
		}
		return rec.compose(canDistribute), nil

So if all the nodes are "can" and none are "should" we'll always plan it locally?

cucaroach

Seems like a mechanism for users to set the distRecommendation on a per-query basis might be useful. Have we had issues like this in the past?

yuzefovich

I'm not aware of any. Nathan mentioned a similar idea yesterday of introducing a "distribution hint" (similar to index and join hints), and I'll look into it today.

pkg/sql/distsql_physical_planner.go, line 460 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

So if all the nodes are "can" and none are "should" we'll always plan it locally?

Yep, in distsql=auto mode. With distsql=on both "can" and "should" are distributed.
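
A minimal sketch of the behavior described in this answer, under stated assumptions: the type below is a simplified stand-in that models only the three levels mentioned in this thread, `compose` is assumed to let "cannot" win over "should" and "should" win over "can", and `distSQLMode` and `willDistribute` are hypothetical names that do not come from `distsql_physical_planner.go`.

```go
package main

import "fmt"

// distRecommendation is a simplified stand-in for the planner's
// recommendation type; only the levels discussed here are modeled.
type distRecommendation int

const (
	cannotDistribute distRecommendation = iota
	canDistribute
	shouldDistribute
)

// compose merges the recommendations of two plan stages (assumed semantics):
// any "cannot" wins, otherwise any "should" wins, otherwise "can".
func (a distRecommendation) compose(b distRecommendation) distRecommendation {
	if a == cannotDistribute || b == cannotDistribute {
		return cannotDistribute
	}
	if a == shouldDistribute || b == shouldDistribute {
		return shouldDistribute
	}
	return canDistribute
}

// distSQLMode is a hypothetical enum for the distsql session variable.
type distSQLMode int

const (
	distSQLAuto distSQLMode = iota
	distSQLOn
)

// willDistribute is a hypothetical helper: in auto mode only "should"
// distributes, while in on mode both "can" and "should" distribute.
func willDistribute(mode distSQLMode, rec distRecommendation) bool {
	if mode == distSQLOn {
		return rec != cannotDistribute
	}
	return rec == shouldDistribute
}

func main() {
	// Every stage says "can": local in auto mode, distributed in on mode.
	rec := canDistribute.compose(canDistribute)
	fmt.Println(willDistribute(distSQLAuto, rec)) // false
	fmt.Println(willDistribute(distSQLOn, rec))   // true

	// A single "should" stage is enough to distribute in auto mode.
	rec = rec.compose(shouldDistribute)
	fmt.Println(willDistribute(distSQLAuto, rec)) // true
}
```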

yuzefovich

I chatted with @rytaft, and given that both Radu and Tommy are on board with this change, we'll merge it to master and will backport it to 21.1 behind a cluster setting.

TFTRs!

bors r+

craig · 2021-08-09T22:19:58Z

Build succeeded:

GitHub CI (Cockroach)

yuzefovich added the do-not-merge label Aug 6, 2021

yuzefovich force-pushed the lookup-join-dist branch from 14eda29 to f016654 Compare August 6, 2021 03:06

yuzefovich force-pushed the lookup-join-dist branch 2 times, most recently from 204236f to 41b1d5f Compare August 6, 2021 05:00

yuzefovich marked this pull request as ready for review August 6, 2021 05:01

yuzefovich requested review from RaduBerinde and a team August 6, 2021 05:01

cucaroach approved these changes Aug 6, 2021

cucaroach reviewed Aug 6, 2021

yuzefovich commented Aug 6, 2021

yuzefovich force-pushed the lookup-join-dist branch from 41b1d5f to 2960378 Compare August 9, 2021 18:56

yuzefovich removed the do-not-merge label Aug 9, 2021

yuzefovich force-pushed the lookup-join-dist branch from 2960378 to 8d5751d Compare August 9, 2021 19:16

yuzefovich commented Aug 9, 2021

yuzefovich mentioned this pull request Aug 9, 2021

release-21.1: sql: change physical planning heuristics a bit to prefer local execution #68613

Merged

craig bot merged commit 8de063f into cockroachdb:master Aug 9, 2021

yuzefovich deleted the lookup-join-dist branch August 9, 2021 22:38

jseldess mentioned this pull request Sep 8, 2021

sql: change physical planning heuristics a bit to prefer local execution cockroachdb/docs#11414

Open

yuzefovich mentioned this pull request Sep 24, 2021

sql: move a single remote flow to the gateway in some cases #70648

Merged
