Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: support locality optimized search for scans with more than 1 row #69395

Merged
merged 2 commits into from
Aug 28, 2021

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Aug 26, 2021

This commit updates the logic for planning locality optimized search to
allow the optimization the be planned if there are fewer than 100,000
keys selected.

The optimization is not yet supported for scans with a hard limit.

Informs #64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search
is now supported for scans that are guaranteed to return 100000 keys
or less. This optimization allows the execution engine to avoid
visiting remote regions if all requested keys are found in the local
region, thus reducing the latency of the query.

@rytaft rytaft requested review from yuzefovich, RaduBerinde and a team August 26, 2021 02:34
@rytaft rytaft requested a review from a team as a code owner August 26, 2021 02:34
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a 10k limit? Is it an oom avoidance thing?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @yuzefovich)

@rytaft
Copy link
Collaborator Author

rytaft commented Aug 26, 2021

It's just the kv batch size, added a comment.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Reviewed 4 of 5 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


-- commits, line 5 at r2:
super nit: s/are fewer/are no more/.


pkg/sql/opt_exec_factory.go, line 561 at r2 (raw file):

	left, right exec.Node, reqOrdering exec.OutputOrdering, hardLimit uint64,
) (exec.Node, error) {
	if hardLimit > 1 {

Should we assert that hardLimit <= 10000?


pkg/sql/opt/xform/scan_funcs.go, line 155 at r2 (raw file):

	// TODO(rytaft): Revisit this when we have a more accurate cost model for data
	// distribution.
	const localityOptScanMaxRows = 10000

I think this number has actually been increased to 100k recently by Andrei, so I wonder whether we should just use row.productionKVBatchSize directly here (possibly moving into some common base package)?

@rytaft rytaft requested a review from a team August 27, 2021 12:33
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @yuzefovich)


-- commits, line 5 at r2:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

super nit: s/are fewer/are no more/.

Done.


pkg/sql/opt_exec_factory.go, line 561 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Should we assert that hardLimit <= 10000?

I'm inclined not to, since 10000 was just an arbitrary number. We had this assertion for 1 row before since we thought the optimization wouldn't work with more than row, but as I've discovered, it works just fine. I'd rather not have to keep this assertion in sync with whatever limit we place in the optimizer.


pkg/sql/opt/xform/scan_funcs.go, line 155 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think this number has actually been increased to 100k recently by Andrei, so I wonder whether we should just use row.productionKVBatchSize directly here (possibly moving into some common base package)?

Done. I made a new package since none of the existing packages seemed appropriate (added in a separate commit). Let me know if you have an idea of a better place for it.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 26 of 26 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


-- commits, line 24 at r4:
nit: s/10000/100000/.


-- commits, line 34 at r4:
ditto


pkg/sql/opt_exec_factory.go, line 561 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I'm inclined not to, since 10000 was just an arbitrary number. We had this assertion for 1 row before since we thought the optimization wouldn't work with more than row, but as I've discovered, it works just fine. I'd rather not have to keep this assertion in sync with whatever limit we place in the optimizer.

Makes sense.


pkg/sql/opt/xform/scan_funcs.go, line 155 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Done. I made a new package since none of the existing packages seemed appropriate (added in a separate commit). Let me know if you have an idea of a better place for it.

Looks good to me.

Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @rytaft, and @yuzefovich)


-- commits, line 24 at r4:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/10000/100000/.

Done.


-- commits, line 34 at r4:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

ditto

Done.

@craig
Copy link
Contributor

craig bot commented Aug 27, 2021

Build failed (retrying...):

This commit creates a new package called rowinfra and moves some
constants and type definitions from the row package to the new
package so that they can be used externally without adding a
dependency on the row package.

Note: I don't really like the name rowinfra, but I think it works
given the current name of the row package. I think it would be better
to rename row to something that hints at it's actual purpose, which
seems to be as the interface between the kv and sql layers. If we
rename row, then we can rename rowinfra to match.

Release note: None

Release justification: Low risk, high benefit change to existing
functionality that is needed for a follow-on commit.
@rytaft
Copy link
Collaborator Author

rytaft commented Aug 27, 2021

bors r-

I needed to rebase.

@craig
Copy link
Contributor

craig bot commented Aug 27, 2021

Canceled.

@rytaft
Copy link
Collaborator Author

rytaft commented Aug 27, 2021

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 28, 2021

Build failed (retrying...):

@yuzefovich
Copy link
Member

yuzefovich commented Aug 28, 2021

bors r-

This needs a rewrite of regional_by_row logic test (which I'll do in a minute).

@craig
Copy link
Contributor

craig bot commented Aug 28, 2021

Canceled.

This commit updates the logic for planning locality optimized search to
allow the optimization the be planned if there are no more than 100,000
keys selected.

The optimization is not yet supported for scans with a hard limit.

Informs cockroachdb#64862

Release justification: Low risk, high benefit change to existing functionality.

Release note (performance improvement): locality optimized search
is now supported for scans that are guaranteed to return 100,000 keys
or less. This optimization allows the execution engine to avoid
visiting remote regions if all requested keys are found in the local
region, thus reducing the latency of the query.
@yuzefovich
Copy link
Member

I removed the DISTSQL option from the logic test (because the physical plan differs between vectorize=on and vectorize=off configs).

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 28, 2021

Build succeeded:

@rytaft
Copy link
Collaborator Author

rytaft commented Aug 28, 2021

I removed the DISTSQL option from the logic test (because the physical plan differs between vectorize=on and vectorize=off configs).

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants