-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-4652][VL] Fix min_by/max_by result mismatch #5544
Conversation
6065078
to
789591b
Compare
@rui-mo Can you help review this PR? Thanks. |
case _: Average | _: Sum if aggFunc.dataType.isInstanceOf[DecimalType] => | ||
"row_constructor" | ||
case _: MaxMinBy => | ||
"row_constructor_with_all_null" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yma11 Is this change due to the semantics difference between Velox and Spark? I wonder if it is possible to adjust the semantics of the implementation in Velox and avoid introducing more hack in Gluten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think by default, the upstream row_constructor
doesn't set null value of the struct, it just take the input children and wrap them as a row. These issues are mainly caused by the additional projects we added, so it's reasonable to handle it at our side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. Could we add a comment to explain why row_constructor_with_all_null
is needed by MaxMinBy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
case _: Average | _: Sum if aggFunc.dataType.isInstanceOf[DecimalType] => | ||
"row_constructor" | ||
case _: MaxMinBy => | ||
"row_constructor_with_all_null" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. Could we add a comment to explain why row_constructor_with_all_null
is needed by MaxMinBy?
#include "velox/expression/SpecialForm.h" | ||
|
||
namespace gluten { | ||
class RowConstructorWithAllNullCallToSpecialForm : public facebook::velox::exec::FunctionCallToSpecialForm { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to extend RowConstructorWithNullCallToSpecialForm
and reuse most of the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
d522a35
to
bd5ebc7
Compare
@rui-mo your comments are addressed. Please help review again. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I found the result still not right after applying this patch.
Expected result is
Actual result is
|
Also cc @ulysses-you |
What changes were proposed in this pull request?
Fix
min_by
/max_by
result mismatch. Takemax_by
for example, we need to keep intermediate result row like<null, 11>
which will be compared with another result like<5, 8>
and assure final result is<null, 11>
.How was this patch tested?
New UT added