expression: Short cut expr vec bug #19775

fzhedu · 2020-09-03T14:09:34Z

What problem does this PR solve?

Issue Number: close #17725

Problem Summary: fall back from vectorized execution to scalar execution when errors or warnings occur.

What is changed and how it works?

Proposal: xxx

What's Changed:

How it Works:

Related changes

Need to cherry-pick to the release branch

Check List

Tests

Integration test

Release note

Fix a vectorization bug in AND/OR/COALESCE caused by short-circuit evaluation.

zz-jason

I think we can modify the vectorized function to:

x OR y: if x[i] = 0, evaluate y[i], else skip y[i]
x AND y: if x[i] = 1, evaluate y[i], else skip y[i]
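
A minimal sketch of this per-row rule, not code from the PR: it assumes the x vector has already been evaluated and that y can be evaluated one row at a time. The names `rowFunc`, `divBy`, `andShortCircuit`, and `orShortCircuit` are hypothetical, "true" is taken to mean non-zero, and NULL handling is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// rowFunc is a hypothetical scalar evaluator for one row of an expression.
type rowFunc func(i int) (int64, error)

// divBy returns a rowFunc computing 10/d[i]; it fails on a zero divisor.
func divBy(d []int64) rowFunc {
	return func(i int) (int64, error) {
		if d[i] == 0 {
			return 0, errors.New("Division by 0")
		}
		return 10 / d[i], nil
	}
}

// andShortCircuit computes x AND y. x is an already evaluated vector;
// y[i] is evaluated only for rows where x[i] is true (non-zero).
func andShortCircuit(x []int64, y rowFunc) ([]int64, error) {
	out := make([]int64, len(x)) // rows default to 0 (false)
	for i, xv := range x {
		if xv == 0 {
			continue // x[i] is false: skip y[i], so it cannot fail
		}
		yv, err := y(i)
		if err != nil {
			return nil, err
		}
		if yv != 0 {
			out[i] = 1
		}
	}
	return out, nil
}

// orShortCircuit computes x OR y; y[i] is evaluated only where x[i] is false.
func orShortCircuit(x []int64, y rowFunc) ([]int64, error) {
	out := make([]int64, len(x))
	for i, xv := range x {
		if xv != 0 {
			out[i] = 1 // x[i] is true: skip y[i]
			continue
		}
		yv, err := y(i)
		if err != nil {
			return nil, err
		}
		if yv != 0 {
			out[i] = 1
		}
	}
	return out, nil
}

func main() {
	d := []int64{0, 2, 0}
	out, err := andShortCircuit([]int64{0, 1, 0}, divBy(d))
	fmt.Println(out, err) // the zero divisors are never touched
}
```

In this sketch the rows with a zero divisor are skipped because x is false there, so no division error is raised for them.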

fzhedu · 2020-09-04T02:30:23Z

This would be more complex.

1. This approach does not change the fact that vectorization should evaluate every argument.
2. It adds an extra loop to check whether x[i] is 0 or 1 before evaluating y[i], but it does not short-circuit the execution the way scalar execution does.

zz-jason · 2020-09-04T02:35:47Z

We can first evaluate the x vector, then evaluate y[i] as a scalar, row by row, based on the result of x[i].

BTW, how do other vectorized databases handle these kinds of expressions?

fzhedu · 2020-09-04T04:23:40Z

In formal vectorization, each argument should be evaluated in vectorized mode.
y may be a complex function, not just a constant or a column, so it is better to vectorize each argument. If errors or warnings occur (which is rare), execution falls back to the scalar path.
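
A hedged sketch of this fallback idea, not the PR's actual code: both arguments are first evaluated as whole vectors, and only when that raises an error does evaluation fall back to row-by-row execution with short circuit. `vecDiv`, `evalAnd`, and `errDivZero` are hypothetical names; the right-hand side is fixed to 10/d[i] for illustration, x and d are assumed to have the same length, and warnings and NULLs are not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

var errDivZero = errors.New("Division by 0")

// vecDiv is a hypothetical vectorized kernel: it computes 10/d[i] for every
// row and fails if any row has a zero divisor.
func vecDiv(d []int64) ([]int64, error) {
	out := make([]int64, len(d))
	for i, v := range d {
		if v == 0 {
			return nil, errDivZero
		}
		out[i] = 10 / v
	}
	return out, nil
}

// evalAnd computes x AND (10/d). It first evaluates both arguments as whole
// vectors; if that raises an error, it falls back to row-by-row evaluation
// with short circuit. The bool result reports whether the fallback ran.
func evalAnd(x, d []int64) ([]int64, bool, error) {
	out := make([]int64, len(x))
	if y, err := vecDiv(d); err == nil {
		for i := range x {
			if x[i] != 0 && y[i] != 0 {
				out[i] = 1
			}
		}
		return out, false, nil
	}
	// Fallback: scalar execution, skipping y[i] wherever x[i] is false.
	for i := range x {
		if x[i] == 0 {
			continue
		}
		if d[i] == 0 {
			return nil, true, errDivZero
		}
		if 10/d[i] != 0 {
			out[i] = 1
		}
	}
	return out, true, nil
}

func main() {
	out, fellBack, err := evalAnd([]int64{0, 1}, []int64{0, 5})
	fmt.Println(out, fellBack, err)
}
```

The vectorized path stays the common case here; the scalar loop only runs for batches in which the vectorized evaluation reported a problem.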

zz-jason · 2020-09-04T04:38:42Z

To summarize, there are two methods above to handle short circuit for x AND y:

1. Evaluate all the arguments to get vectors x and y; if not all the values in x are 0, fall back to the row-based execution method.
2. Evaluate the vector x first, then conditionally evaluate y[i] based on the value of x[i].

IMO, the performance of the second method is better than the first method in most cases, where not all the values in the vector x are 0.

So I prefer the second method. It achieves the short circuit and the performance is not sacrificed much compared with the old vectorized method.

fzhedu · 2020-09-04T04:47:02Z

It depends.
For x AND y where y is a complex subexpression such as (a*b > C) and x contains some 0s, vectorized evaluation would be better.

zz-jason · 2020-09-04T05:03:42Z

Would you like to construct some expression cases and benchmark both methods? It's hard to decide which one to use without any benchmark results.

SunRunAway · 2020-09-04T05:28:53Z

Hi, @zz-jason
I suggest dividing this issue into 2 steps.

First, use method 1 to fix this bug as a workaround in this PR.
Then make a further plan to rework the vectorized expression framework using method 2 in the future, for better performance. cc @qw4990

zz-jason · 2020-09-04T05:32:12Z

I think methods 1 and 2 are both workarounds; the difference is which workaround performs better. It's not hard to implement method 2 and do some benchmarks IMO.

SunRunAway · 2020-09-04T05:49:05Z

Method 2 requires adding the selection vector to all the signatures of the vectorized expression framework, which may take plenty of time. It's better to create another issue and work out a good design, which could become a performance challenge program task.
In the meantime, the existing bug should not be left in the release version so I think the workaround of this PR should be merged ASAP.
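
To make the selection-vector idea concrete, here is a small sketch under stated assumptions; it is not TiDB's framework. The signature `vecExpr` and the helpers `column`, `divide`, and `vecAnd` are hypothetical. Each vectorized expression receives a selection vector and evaluates only the listed rows, so AND can narrow the selection before evaluating its right-hand side.

```go
package main

import (
	"errors"
	"fmt"
)

// vecExpr is a hypothetical vectorized expression signature that takes a
// selection vector: only the rows listed in sel are evaluated into out.
type vecExpr func(sel []int, out []int64) error

// column returns a vecExpr that copies the selected rows of vals.
func column(vals []int64) vecExpr {
	return func(sel []int, out []int64) error {
		for _, i := range sel {
			out[i] = vals[i]
		}
		return nil
	}
}

// divide returns a vecExpr computing num/d[i] for the selected rows only.
func divide(num int64, d []int64) vecExpr {
	return func(sel []int, out []int64) error {
		for _, i := range sel {
			if d[i] == 0 {
				return errors.New("Division by 0")
			}
			out[i] = num / d[i]
		}
		return nil
	}
}

// vecAnd computes x AND y over n rows. x runs on all rows; y runs only on
// the rows where x is true, passed down as a narrowed selection vector.
func vecAnd(n int, x, y vecExpr) ([]int64, error) {
	all := make([]int, n)
	for i := range all {
		all[i] = i
	}
	xv := make([]int64, n)
	if err := x(all, xv); err != nil {
		return nil, err
	}
	sel := make([]int, 0, n)
	for i, v := range xv {
		if v != 0 {
			sel = append(sel, i)
		}
	}
	yv := make([]int64, n)
	if err := y(sel, yv); err != nil {
		return nil, err
	}
	out := make([]int64, n)
	for _, i := range sel {
		if yv[i] != 0 {
			out[i] = 1
		}
	}
	return out, nil
}

func main() {
	out, err := vecAnd(3, column([]int64{0, 1, 0}), divide(10, []int64{0, 2, 0}))
	fmt.Println(out, err)
}
```

The sketch also shows why the change is invasive: every expression has to accept and honor the selection vector for the narrowing to take effect.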

zz-jason · 2020-09-04T05:59:06Z

> Method 2 needs to involve the selection vector into all the signatures of the vectorized expression framework.

No need to. We can combine the usage of evalXXX and VecEvalXXX in this workaround.

> I think the workaround of this PR should be merged ASAP.

It's not that urgent to fix this issue since it only affects the warning messages.

SunRunAway · 2020-09-04T06:11:35Z

> > Method 2 needs to involve the selection vector into all the signatures of the vectorized expression framework.
>
> No need to. We can combine the usage of evalXXX and VecEvalXXX in this workaround.

That's an even more complex workaround. And I think this could become a new performance issue, since this improvement existed before, whenever this bug happens. Moreover, I think it's not worth introducing another complex workaround when we have a better idea, which uses the selection vector to do the short circuit.

> It's not that urgent to fix this issue since it only affects the warning messages.

DML queries treat it as an error.

mysql> update t set a=1 and 1/0;
ERROR 1365 (22012): Division by 0

zz-jason · 2020-09-04T06:30:24Z

If there is an error in the DML statements, the DML queries won't execute successfully, thus the data correctness and consistency can be guaranteed. So I think it's not that urgent.

Though both methods are workarounds, I think it's important to reduce the performance regression. I'm OK with method 1 if the benchmark result shows that method 1 brings less performance regression than method 2.

expression/generator/compare_vec.go

SunRunAway

LGTM

zz-jason

LGTM

zz-jason · 2020-09-09T12:02:11Z

/merge

fzhedu · 2020-09-16T03:30:07Z

/merge

ti-srebot · 2020-09-16T03:32:38Z

/run-all-tests

ti-srebot · 2020-09-16T03:49:06Z

@fzhedu merge failed.

fzhedu · 2020-09-16T06:36:29Z

/run-check_dev_2

fzhedu · 2020-09-18T02:04:22Z

/run-check_dev_2

fzhedu · 2020-09-18T02:35:41Z

/merge

ti-srebot · 2020-09-18T02:35:43Z

Your auto merge job has been accepted, waiting for:

19998

ti-srebot · 2020-09-18T02:41:39Z

/run-all-tests

ti-srebot · 2020-09-18T02:49:41Z

/run-all-tests

ti-srebot · 2020-09-18T03:00:07Z

@fzhedu merge failed.

SunRunAway · 2020-09-18T04:09:45Z

/merge

ti-srebot · 2020-09-18T04:10:06Z

/run-all-tests

Signed-off-by: ti-srebot <[email protected]>

ti-srebot · 2020-09-18T04:21:00Z

cherry pick to release-4.0 in PR #20092

Signed-off-by: ti-srebot <[email protected]>

fzhedu added component/expression needs-cherry-pick-4.0 labels Sep 3, 2020

fzhedu requested review from SunRunAway, qw4990 and lzmhhh123 September 3, 2020 14:09

fzhedu requested a review from a team as a code owner September 3, 2020 14:09

zz-jason reviewed Sep 3, 2020

fzhedu mentioned this pull request Sep 4, 2020

expression: avoid unnecessary warnings/errors when folding constants in shortcut-able expressions #19797

Merged

SunRunAway reviewed Sep 8, 2020

SunRunAway reviewed Sep 9, 2020

ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 9, 2020

SunRunAway requested a review from zz-jason September 9, 2020 08:23

SunRunAway added the type/bugfix This PR fixes a bug. label Sep 9, 2020

zz-jason previously approved these changes Sep 9, 2020

ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Sep 9, 2020

ti-srebot previously approved these changes Sep 9, 2020

ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Sep 9, 2020

fzhedu dismissed zz-jason’s stale review via f15c7c0 September 14, 2020 06:40

fzhedu force-pushed the ShortCutExprVecBug branch from fbe962a to f15c7c0 Compare September 14, 2020 06:40

fzhedu requested review from SunRunAway, zz-jason, qw4990 and ti-srebot September 16, 2020 03:30

Merge branch 'master' into ShortCutExprVecBug

c9254f6

Merge branch 'master' into ShortCutExprVecBug

0fd0fb9

Merge branch 'master' into ShortCutExprVecBug

5cf8c35

SunRunAway approved these changes Sep 18, 2020

SunRunAway merged commit 7948c12 into pingcap:master Sep 18, 2020

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Sep 18, 2020

cherry pick pingcap#19775 to release-4.0

7e799fd

Signed-off-by: ti-srebot <[email protected]>

ti-srebot mentioned this pull request Sep 18, 2020

expression: Short cut expr vec bug (#19775) #20092

Merged

ti-srebot added a commit that referenced this pull request Sep 21, 2020

expression: Short cut expr vec bug (#19775) (#20092)

1650b5b

Signed-off-by: ti-srebot <[email protected]>

ti-srebot mentioned this pull request Nov 13, 2020

expression: avoid unnecessary warnings/errors when folding constants in shortcut-able expressions (#19797) #21040

Merged
