Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] row_constructor in gluten will make top-level struct to null if its child fields have null value #1303

Closed
liujiayi771 opened this issue Apr 6, 2023 · 9 comments
Labels
question Further information is requested

Comments

@liujiayi771
Copy link
Contributor

liujiayi771 commented Apr 6, 2023

Describe the bug
I have a RowVector, its top-level struct value is non-null, but some fields in child vectors are null. However, if the row data is evaluated in row_construct function in gluten, its top-level struct will become null if any of its child fields have null value. This behavior happens in https://github.com/oap-project/gluten/blob/2685c1f95aa4bd19ad37892d6d87940f77c77815/cpp/velox/compute/RowConstructor.h#L43-L47

I can't understand why when the child filed is null, the index corresponding to the top-level row also needs to be set to null?

@rui-mo I see that you participated in the review of #404, do you know the reason here? I need to make changes here, and I have to go to the reason for this code here.

@liujiayi771 liujiayi771 added the bug Something isn't working label Apr 6, 2023
@rui-mo
Copy link
Contributor

rui-mo commented Apr 7, 2023

@liujiayi771 Actually we do that intentionally. Velox has a implementation for row_constuctor, and we implement another in Gluten to override that one. As you noticed, the main difference is some special logic is added for the null handling. Currently, row_constuctor is only used in Aggregation to combine partial aggregation output columns into struct. For example, the partial outputs of avg is sum and count, and they are combined into ROW(DOUBLE(), BIGINT()) for Velox's final aggregation.

The problem we met was, at AverageAggregate.h#L187, because Velox's row_constuctor did not set null for the struct, decodedPartial_.isNullAt(i) was always false and invalid value was accessed during aggregation. With this overriden function, decodedPartial_.isNullAt(i) can be true when sum or count is null.

@rui-mo rui-mo added question Further information is requested and removed bug Something isn't working labels Apr 7, 2023
@liujiayi771
Copy link
Contributor Author

liujiayi771 commented Apr 7, 2023

I find this behavior but not know the reason, but it seems that based on this behavior, my implementation is fine. When I check whether the sum is null, I do it when the top-level struct is null.

https://github.com/oap-project/velox/blob/c5663fe99c64db7acfa8fbe9ca5d6a238fe0de75/velox/functions/sparksql/aggregates/DecimalSumAggregate.h#L345-L353

@rui-mo
Copy link
Contributor

rui-mo commented Apr 7, 2023

I find this behavior but not know the reason, but it seems that based on this behavior, my implementation is fine. When I check whether the sum is null, I do it when the top-level struct is null.

https://github.com/oap-project/velox/blob/c5663fe99c64db7acfa8fbe9ca5d6a238fe0de75/velox/functions/sparksql/aggregates/DecimalSumAggregate.h#L345-L353

Yes, that's another solution because isEmptyVector and sumVector is the children of the struct, not the struct itself.
We discussed this issue with Velox upstream, they suggest us supporting struct type in Shuffle, then the row_constructor process is not needed. But for now, we override the row_constructor to make least change to Velox code.

@yma11
Copy link
Contributor

yma11 commented Apr 25, 2024

@rui-mo @liujiayi771 Do you guys know whether we still need this customized function? I found an issue in min_by/max_by that this logic doesn't fit.

@yma11
Copy link
Contributor

yma11 commented Apr 25, 2024

@rui-mo @liujiayi771 Do you guys know whether we still need this customized function? I found an issue in min_by/max_by that this logic doesn't fit.

I found there is a case match for row_constructor/row_constructor_with_null for different agg functions, let me try this first.

@rui-mo
Copy link
Contributor

rui-mo commented Apr 26, 2024

@yma11 Could you help check which aggregate functions are still using row_constructor_with_null? If the corresponding logics on Velox side are customized well for Spark, we shall remove this hack.

@yma11
Copy link
Contributor

yma11 commented Apr 28, 2024

Seems we still need the hack because tpch/tpcds failed after removing it. But the existing hack doesn't cover case for min_by/max_by, so I update it in PR 5544.

@rui-mo
Copy link
Contributor

rui-mo commented Apr 29, 2024

Seems we still need the hack because tpch/tpcds failed after removing it.

@yma11 Thanks for checking. Have you noticed which function causes the failure?

@yma11
Copy link
Contributor

yma11 commented Apr 29, 2024

I didn't check the functions because I think the root cause of these issues are caused by the additional projects we added. From the functions perspective, it will do necessary value null checks at addRawInput or addSingleGroupRawInput and only check the row level null in XXIntermediateResults. This is reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants