[fix](hist) Fix unstable result of aggregate function hist #38608
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 42245 ms
TPC-DS: Total hot run time: 169461 ms
ClickBench: Total hot run time: 29.8 s
And does this change need to consider upgrading?
This PR only affects all-null blocks (in any other case, the add method of the hist data will be called), and in that situation the original result could already be incorrect, so we only need to avoid a crash during upgrading. We have two situations here:
In the first case, the new BE serializes the block to a ColumnString whose max_input_block is -1; the old BE deserializes and merges it in its aggregate sink and gets the result in its aggregate source. The old BE's merge method cannot handle the negative max_input_block correctly, and if that block happens to be the last one merged, we will have trouble ... So it seems we do have an upgrading problem in this situation. In the second case, the old BE generates a serialized string whose max_input_block is 128 (the default value on the old BE); the merge method on the new BE will merge this data, so the result will still be incorrect, but there seems to be no other impact such as a BE crash in this situation.
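A minimal sketch of the merge-side guard this discussion implies, using simplified types rather than Doris's actual classes: the aggregate state keeps a sentinel in max_bucket_num meaning "add() was never called", and merge() discards partial states that still carry the sentinel, so an all-null partial result cannot overwrite a valid bucket count. HistogramData, its fields, and the sentinel value 0 (the initial value named in the merged commit message below; the comment above mentions -1 from an earlier revision) are illustrative assumptions.

```cpp
#include <cstdint>
#include <vector>

struct HistogramData {
    // 0 is the sentinel: "add() was never called, so the second SQL
    // argument (the bucket count) was never observed".
    int64_t max_bucket_num = 0;
    std::vector<double> samples;

    void merge(const HistogramData& rhs) {
        // Discard partial states that only saw NULL rows; their serialized
        // max_bucket_num is still the sentinel and must not win the merge.
        if (rhs.max_bucket_num == 0) {
            return;
        }
        max_bucket_num = rhs.max_bucket_num;
        samples.insert(samples.end(), rhs.samples.begin(), rhs.samples.end());
    }
};
```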
run buildall
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 41742 ms
TPC-DS: Total hot run time: 170140 ms
ClickBench: Total hot run time: 29.72 s
run external
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
LGTM
[fix](hist) Fix unstable result of aggregate function hist (#38608)

* Target
Fix the unstable result of the hist function when null values are involved.

* Reproduce
The test result of `regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_all_functions2.groovy` is unstable: the SQL `SELECT histogram(k7, 5) FROM baseall` sometimes acts as if the second argument were not passed in.

* Root cause
We have a short-circuit in AggregateFunctionNullVariadicInline: when a row is NULL, the value is not added by the nested function. The implementation of histogram relies on its add method to read its second argument, so on an all-null block histogram never sees its second argument even when the SQL is `SELECT histogram(k7, 5)`, and a max_bucket_num with the default value 128 is serialized. When we do merging and happen to deserialize such a block last, max_bucket_num in the merge stage is assigned 128, which leads to the wrong result.

* Fix
The initial value of max_bucket_num is set to 0; when merging, we discard aggregated data whose max_bucket_num is 0.
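A minimal sketch of the root cause and fix described above, with illustrative names (HistogramState, NullVariadicWrapper) standing in for Doris's real classes: the null-handling wrapper short-circuits NULL rows, so on an all-null block the nested histogram never reads its second argument and is serialized with whatever max_bucket_num it was constructed with; initializing that field to 0 makes such states detectable and discardable at merge time.

```cpp
#include <cstdint>

struct HistogramState {
    // The fix: start from sentinel 0 instead of a usable default like 128,
    // so a state whose add() was never called is recognizable when merging.
    int64_t max_bucket_num = 0;

    void add(double value, int64_t bucket_arg) {
        // The second SQL argument is only observed here, inside add().
        max_bucket_num = bucket_arg;
        // ... record `value` into the histogram samples ...
        (void)value;
    }
};

struct NullVariadicWrapper {
    HistogramState nested;

    // Mirrors the short-circuit in AggregateFunctionNullVariadicInline:
    // NULL rows never reach the nested add(), so on an all-null block the
    // nested state is serialized with its constructed-in max_bucket_num.
    void add_row(bool is_null, double value, int64_t bucket_arg) {
        if (is_null) {
            return;
        }
        nested.add(value, bucket_arg);
    }
};
```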