expression: Short cut expr vec bug #19775
I think we can modify the vectorized function to:
- x OR y: if x[i] = 0, evaluate y[i], else skip y[i]
- x AND y: if x[i] = 1, evaluate y[i], else skip y[i]
This would be more complex.
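The rule above can be sketched as a row-by-row loop. This is only a minimal illustration, not TiDB's actual code: `rowFunc`, `divideRows`, `evalLogic`, and `errDivByZero` are hypothetical names, values are plain `int64` truth values, and NULL handling is ignored.

```go
package main

import (
	"errors"
	"fmt"
)

// errDivByZero stands in for the "Division by 0" error discussed in this thread.
var errDivByZero = errors.New("division by 0")

// rowFunc is a hypothetical scalar evaluator that computes operand y for row i.
type rowFunc func(i int) (int64, error)

// divideRows returns a rowFunc computing a[i] / b[i], failing on a zero divisor.
func divideRows(a, b []int64) rowFunc {
	return func(i int) (int64, error) {
		if b[i] == 0 {
			return 0, errDivByZero
		}
		return a[i] / b[i], nil
	}
}

// evalLogic evaluates x OR y (isOr = true) or x AND y (isOr = false) row by row.
// y is only called for rows where x[i] does not already decide the result.
func evalLogic(x []int64, y rowFunc, isOr bool) ([]int64, error) {
	out := make([]int64, len(x))
	for i, xv := range x {
		xTrue := xv != 0
		if xTrue == isOr {
			// OR with a true x, or AND with a false x: skip y[i].
			if isOr {
				out[i] = 1
			}
			continue
		}
		yv, err := y(i)
		if err != nil {
			return nil, err
		}
		if yv != 0 {
			out[i] = 1
		}
	}
	return out, nil
}

func main() {
	y := divideRows([]int64{1, 10, 1}, []int64{0, 2, 0})
	// Rows 0 and 2 would divide by zero, but x short-circuits them.
	fmt.Println(evalLogic([]int64{1, 0, 1}, y, true))
	fmt.Println(evalLogic([]int64{0, 1, 0}, y, false))
	// Here row 0 is not short-circuited, so the division error surfaces.
	fmt.Println(evalLogic([]int64{0, 0, 1}, y, true))
}
```

The cost of this approach is that y is evaluated one row at a time, which gives up the batch evaluation that vectorization is meant to provide.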
We can first evaluate the x vector, then evaluate y[i] as a scalar for each row, depending on the result of x[i]. BTW, how do other vectorized databases handle these kinds of expressions? |
In formal vectorization, each argument should be evaluated in a vectorized way. |
To summarize, there are two methods above to handle short circuit for
IMO, the performance of the second method is better than the first method in most cases, where not all the values in the vector x are 0. So I prefer the second method. It achieves the short circuit and the performance is not sacrificed much compared with the old vectorized method. |
It depends. |
Would you like to construct some expression cases and run performance benchmarks on both methods? It's hard to decide which one to use without any benchmark results. |
Hi, @zz-jason
|
I think methods 1 and 2 are both workarounds; the difference is which one performs better. It's not hard to implement method 2 and do some benchmarks IMO. |
Method 2 needs to add the selection vector to all the signatures of the vectorized expression framework, which may take plenty of time. It's better to create another issue with a good design, which could become part of a performance challenge program. |
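A rough sketch of the selection-vector idea, to make the signature change concrete. It assumes a hypothetical `vecFunc` type that receives the list of selected row indexes; none of these names come from TiDB's expression framework, and NULL handling is ignored.

```go
package main

import (
	"errors"
	"fmt"
)

// errDivByZero stands in for the "Division by 0" error discussed in this thread.
var errDivByZero = errors.New("division by 0")

// vecFunc is a hypothetical vectorized evaluator: it computes results only
// for the rows listed in sel and writes them into out at the same indexes.
type vecFunc func(sel []int, out []int64) error

// divideVec returns a vecFunc computing a[i] / b[i] for the selected rows.
func divideVec(a, b []int64) vecFunc {
	return func(sel []int, out []int64) error {
		for _, i := range sel {
			if b[i] == 0 {
				return errDivByZero
			}
			out[i] = a[i] / b[i]
		}
		return nil
	}
}

// evalAndSel evaluates x AND y. It builds a selection vector of the rows
// where x[i] is true and evaluates y in one batch over those rows only.
func evalAndSel(x []int64, y vecFunc) ([]int64, error) {
	out := make([]int64, len(x))
	sel := make([]int, 0, len(x))
	for i, xv := range x {
		if xv != 0 {
			sel = append(sel, i)
		}
	}
	if len(sel) == 0 {
		return out, nil // every row is already decided by x
	}
	yv := make([]int64, len(x))
	if err := y(sel, yv); err != nil {
		return nil, err
	}
	for _, i := range sel {
		if yv[i] != 0 {
			out[i] = 1
		}
	}
	return out, nil
}

func main() {
	y := divideVec([]int64{1, 4, 1}, []int64{0, 2, 0})
	// Only row 1 is selected, so the zero divisors in rows 0 and 2 are never touched.
	fmt.Println(evalAndSel([]int64{0, 1, 0}, y))
	// Row 0 is selected here, so the division error is reported.
	fmt.Println(evalAndSel([]int64{1, 1, 0}, y))
}
```

The batch call to y is kept, but every vectorized function has to accept and honor `sel`, which is the framework-wide signature change mentioned above.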
No need to. We can combine the usage of
It's not that urgent to fix this issue since it only affects the warning messages. |
That's an even more complex workaround. And I think this could be a new performance issue since this improvement exiting before whenever this bug happens. Moreover, I think it's not worth introducing another complex workaround, since we have a better idea that uses the selection vector to implement short circuit.
DML queries treat it as an error.
mysql> update t set a=1 and 1/0;
ERROR 1365 (22012): Division by 0 |
If there is an error in the DML statements, the DML queries won't execute successfully, thus the data correctness and consistency can be guaranteed. So I think it's not that urgent. Though both methods are workarounds, I think it's important to reduce the performance regression. I'm OK with method 1 if the benchmark result shows that method 1 brings less performance regression than method 2. |
LGTM
LGTM
/merge |
fbe962a to f15c7c0
/merge |
/run-all-tests |
@fzhedu merge failed. |
/run-check_dev_2 |
/merge |
Your auto merge job has been accepted, waiting for:
|
/run-all-tests |
/run-all-tests |
@fzhedu merge failed. |
/merge |
/run-all-tests |
Signed-off-by: ti-srebot <[email protected]>
cherry pick to release-4.0 in PR #20092 |
What problem does this PR solve?
Issue Number: close #17725
Problem Summary: fall back from vectorized to scalar execution when errors or warnings occur.
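The fallback described in the summary might look roughly like the following. This is a simplified sketch under stated assumptions, not the code in this PR: it hard-codes the expression `x AND (a / b)`, models only errors (warnings are not tracked), ignores NULLs, and all function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errDivByZero stands in for the "Division by 0" error discussed in this thread.
var errDivByZero = errors.New("division by 0")

// vecAndDiv eagerly computes x AND (a / b) over whole columns, so a zero
// divisor fails the batch even when x would have short-circuited that row.
func vecAndDiv(x, a, b []int64) ([]int64, error) {
	q := make([]int64, len(x))
	for i := range x {
		if b[i] == 0 {
			return nil, errDivByZero
		}
		q[i] = a[i] / b[i]
	}
	out := make([]int64, len(x))
	for i := range x {
		if x[i] != 0 && q[i] != 0 {
			out[i] = 1
		}
	}
	return out, nil
}

// scalarAndDiv computes the same expression row by row with short circuit:
// the division only runs for rows where x[i] is true.
func scalarAndDiv(x, a, b []int64) ([]int64, error) {
	out := make([]int64, len(x))
	for i := range x {
		if x[i] == 0 {
			continue
		}
		if b[i] == 0 {
			return nil, errDivByZero
		}
		if a[i]/b[i] != 0 {
			out[i] = 1
		}
	}
	return out, nil
}

// evalWithFallback tries the vectorized path first and re-runs the batch
// with the scalar path only if the vectorized path reported an error.
func evalWithFallback(x, a, b []int64) (out []int64, fellBack bool, err error) {
	out, err = vecAndDiv(x, a, b)
	if err == nil {
		return out, false, nil
	}
	out, err = scalarAndDiv(x, a, b)
	return out, true, err
}

func main() {
	// Row 0 has a zero divisor but x[0] is false, so the fallback succeeds.
	fmt.Println(evalWithFallback([]int64{0, 1}, []int64{1, 6}, []int64{0, 3}))
	// No zero divisor: the vectorized path is used as-is.
	fmt.Println(evalWithFallback([]int64{1, 1}, []int64{1, 6}, []int64{1, 3}))
}
```

With this shape, batches that never hit an error keep the vectorized speed, and only the batches that do hit one pay for a second, scalar pass.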
What is changed and how it works?
Proposal: xxx
What's Changed:
How it Works:
Related changes
Check List
Tests
Release note