How to declare the signature of an aggregate function with indirect dependency between intermediate type and return type? #9048
Replies: 6 comments 11 replies
-
Hi @mbasmanova @kagamiori. Do you have any suggestions for adapting to this situation? Is it possible to add support for this in Velox's function signature framework? |
Beta Was this translation helpful? Give feedback.
-
@liujiayi771 It might be possible to extend the framework to support something like this, but before we spend time designing the solution, can you help us understand the use case. What's the logic behind the rules for computing intermediate and final types for sum(decimal)?
|
Beta Was this translation helpful? Give feedback.
-
Hi @liujiayi771, I have a clarification question. The ideal signature you give above says
So if the companion function only knows i_precision == 38, it can infer that a_precision >= 28, but couldn't know the exact value of a_precision and hence the value of r_precision. Do you know whether this is a valid concern or this situation would not happen? |
Beta Was this translation helpful? Give feedback.
-
@kagamiori Thank you for helping to explain this issue. Without changing the existing framework, do you know of any other methods that could work around this issue and support the scenario? If it cannot be resolved for the time being, we plan to first merge a version of Spark decimal average that does not support companion functions into the community, and later on, we'll see if there is a way to support an architecture with companion functions. We would like to first integrate the main logic of the Spark decimal average into the community. |
Beta Was this translation helpful? Give feedback.
-
@mbasmanova @kagamiori @liujiayi771 I am looking at the Spark decimal average signature issue, and these are my observations. The table below lists the intermediate and final values' precision and scale. I find the major problem is that the registration of companion function requires that the result type could be inferred from the intermediate type. While for this case when the intermediate precision is 38, the result precision could be in the range of [32, 38] and cannot be determined.
This requirement comes from the ExtractFunction which resolves the return type from argument types and register a new vector function. velox/velox/exec/AggregateCompanionAdapter.cpp Lines 437 to 443 in d1ac079 Functions are required to provide a way to statically resolve return type from input types but it is not always feasible for an extract companion function. Given that special forms do not have that requirement and they resolve return types dynamically at runtime, I believe it would be more appropriate to register ExtractFunction as a special form and the decimal avg signature issue shall be resolved. This is a prototype #10985 and I will proceed if you think the idea makes sense. I notice @kagamiori is solving this issue by adding function-level states in #8710 but this PR is stale. If you prefer to move forward with that solution, please kindly let me know. Thanks! |
Beta Was this translation helpful? Give feedback.
-
@rui-mo I have a more complete implementation in #9167. But @mbasmanova suggested is that we can do it similar in the SimpleAggregateAdapter so that the UDAF authors doesn't have to keep function-level states. |
Beta Was this translation helpful? Give feedback.
-
The decimal average aggregate function in Spark and Presto differs in many ways. Spark requires a separate implementation of its corresponding average aggregate function in Velox. However, when implementing this aggregate function, I encountered issues with declaring the signature that I couldn't resolve.
The implementation of average in Spark is here. Simply put, when the input type is DECIMAL(p, s), the intermediate data's sum type is DECIMAL(p + 10, s), and the final returned average type is DECIMAL(p + 4, s + 4).
The perfect signature declaration should be like this.
But Velox does not support this type of signature declaration. There are two issues.
Velox does not support having new variables in intermediate types. Declaring
i_precision
will cause an exception to be thrown duringFunctionSignature
validation.The reason for the error here is that
AggregateFunctionSignature
holdsintermediateType_
but does not override its own validation logic, causing an issue with the check for the number of variables.This way of declaring intermediate and return types will cause the companion function to not be registered. Because Velox requires that the variables used in the return type must be a subset of the variables used in the intermediate type, so that the return type can be resolved using the intermediate type. The relevant code logic is here.
The logic for checking if it's resolvable here is simple, it just checks if the variables used in the return type are all included in the variables used in the intermediate type.
But in the above signature, the return type can also be resolved by the intermediate type, although it's an indirect relationship.
The framework currently does not support such indirect dependency relationships.
Beta Was this translation helpful? Give feedback.
All reactions