Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

constraint: perform cancel checking when combining constraints #111979

Merged

Conversation

msirek
Copy link
Contributor

@msirek msirek commented Oct 7, 2023

Previously, the combining of constraint spans when one constraint is a
suffix of the other could take a long time and cause the
statement_timeout session setting to not be honored when each
constraint has hundreds or thousands of spans.

The issue is that constraint.Combine has double nested loops to
consider every combination of one span from one constraint with one span
of the other constraint. The building of possibly millions of spans
may take excessive CPU time and allocate excessive amounts of memory.

The fix is to maintain a counter in constraint.Combine and call the
query cancel check function every 16 iterations. The cancel check
function itself will check for query timeout every 1024 iterations, so
effectively every 16K iterations constraint.Combine will perform
cancel checking and abort the query if the timeout has been reached.

Epic: none
Fixes: #111862

Release note (bug fix): This patch fixes an issue where the optimizer
fails to honor the statement_timeout session setting when generating
constrained index scans for queries with large IN lists or = ANY
predicates on multiple index key columns, which may lead to an out
of memory condition on the node.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msirek msirek force-pushed the constraint-combination-cancel-checking branch 4 times, most recently from 010e244 to 56a18e0 Compare October 8, 2023 02:48
@msirek msirek requested review from cucaroach and yuzefovich October 8, 2023 04:31
@msirek msirek marked this pull request as ready for review October 8, 2023 04:33
@msirek msirek requested a review from a team as a code owner October 8, 2023 04:33
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I only have a few nits. Perhaps @DrewKimball should take a quick look too.

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @msirek)


pkg/sql/opt/constraint/constraint_test.go line 501 at r1 (raw file):

		}
	}
	for i, tc := range testData {

nit: isn't there only a single test case? Do you envision that we'll add more test cases in the future, and that's why we have this for loop?


pkg/sql/opt/constraint/constraint_test.go line 531 at r1 (raw file):

			j.Combine(&evalCtx, &l, cancelFunc)
			n.Combine(&evalCtx, &p, cancelFunc)
			// The next Combine call will panic within a second or two.

nit: should we actually time how long the cancellation takes and assert that it's lower than some pre-defined constant?


pkg/sql/opt/constraint/constraint_test.go line 535 at r1 (raw file):

			// This Combine call panics right away since the iterations target has
			// already been reached.
			require.Panics(t, func() { j.Combine(&evalCtx, &n, cancelFunc) })

nit: perhaps store iterations before calling Combine and then assert that it only increases by 1?

@mgartner mgartner requested a review from DrewKimball October 9, 2023 16:57
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work!

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @cucaroach and @msirek)


pkg/sql/opt/constraint/constraint_test.go line 495 at r1 (raw file):

	iterations := 0
	cancelFunc := func() {

[nit] We should move these into the loop (if we keep it around) in case more test cases are added.

@msirek msirek force-pushed the constraint-combination-cancel-checking branch from 56a18e0 to 0ebba11 Compare October 9, 2023 18:25
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @cucaroach, @DrewKimball, and @yuzefovich)


pkg/sql/opt/constraint/constraint_test.go line 495 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] We should move these into the loop (if we keep it around) in case more test cases are added.

Done


pkg/sql/opt/constraint/constraint_test.go line 501 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: isn't there only a single test case? Do you envision that we'll add more test cases in the future, and that's why we have this for loop?

I'm just following the pattern of other tests here, so it's easy to add more tests if we want to.


pkg/sql/opt/constraint/constraint_test.go line 531 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we actually time how long the cancellation takes and assert that it's lower than some pre-defined constant?

OK, added checking that each cancellation takes less than one second.


pkg/sql/opt/constraint/constraint_test.go line 535 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: perhaps store iterations before calling Combine and then assert that it only increases by 1?

It's not going to increase just by 1. It will increase by some factor of the number of spans in the 2 constraints plus maybe extra iterations on the main outer loop. If for some reason this count starts going up due to a future software change, then earlier calls to Combine will panic before the one where we expect it, and the test will fail (currently if I change the if iterations > 2000 check to if iterations > 2000 the test would fail because we reach more than 1000 iteration because combining a with e).

@msirek msirek added backport-22.2.x backport-23.1.x Flags PRs that need to be backported to 23.1 labels Oct 9, 2023
Previously, the combining of constraint spans when one constraint is a
suffix of the other could take a long time and cause the
`statement_timeout` session setting to not be honored when each
constraint has hundreds or thousands of spans.

The issue is that `constraint.Combine` has double nested loops to
consider every combination of one span from one constraint with one span
of the other constraint. The building of possibly millions of spans
may take excessive CPU time and allocate excessive amounts of memory.

The fix is to maintain a counter in `constraint.Combine` and call the
query cancel check function every 16 iterations. The cancel check
function itself will check for query timeout every 1024 iterations, so
effectively every 16K iterations `constraint.Combine` will perform
cancel checking and abort the query if the timeout has been reached.

Epic: none
Fixes: cockroachdb#111862

Release note (bug fix): This patch fixes an issue where the optimizer
fails to honor the `statement_timeout` session setting when generating
constrained index scans for queries with large IN lists or `= ANY`
predicates on multiple index key columns, which may lead to an out
of memory condition on the node.
@msirek msirek force-pushed the constraint-combination-cancel-checking branch from 0ebba11 to 313c4d9 Compare October 9, 2023 19:33
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!
bors r=yuzefovich,DrewKimball

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @cucaroach, @DrewKimball, and @yuzefovich)

@msirek
Copy link
Contributor Author

msirek commented Oct 9, 2023

currently if I change the if iterations > 2000 check to if iterations > 2000...

I meant to say:
currently if I change the if iterations > 2000 check to if iterations > 1000...

@craig
Copy link
Contributor

craig bot commented Oct 9, 2023

Build succeeded:

@craig craig bot merged commit cd1adae into cockroachdb:master Oct 9, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: investigate slow cancellation
4 participants