You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While helping merge #7480, I noticed a limitation of the simple scalar function interface with custom types. The limitation is that when a function supports both argument types of both the custom type and its underlying type, the call methods for the two types must be implemented in two separate function classes. The reason is that the call method only sees the native argument types that are the same for the custom type and its underlying type. Below is an example.
The EqFunction class defined in Velox implements three call methods: one for template type T, one for TimestampWithTimezone, and one for Generic. When EqFunction is registered via registerFunction<T, bool, int64_t, int64_t>("eq") (which is not the case with velox_functions_prestosql, but is the case with an internal Velox user), and invoked through an expression like eq(12345, 54321), void call(bool& result, const arg_type<TimestampWithTimezone>& lhs, const arg_type<TimestampWithTimezone>& rhs) is incorrectly called for it. This is because #7480 make TimestampWithTimezone type backed by BIGINT and hence arg_type<TimestampWithTimezone> become int64_t. Although EqFunction intends to handle BIGINT comparison through the templated call method, the specialized version is preferred (https://en.cppreference.com/w/cpp/language/partial_specialization).
In general, this is a limitation of the simple scalar function interface that it cannot differentiate custom type and its underlying type because their arg_type<> are the same. There are two workaround to address this limitation:
Define EqFunction that handles BIGINT through the templated call method and EqFunctionTimestampWithTimezone that handles TimestampWithTimezone separately in two classes.
Put the comparison logic for BIGINT and TimestampWithTimezone both in bool call(bool& out, const arg_type<Generic<T1>>& lhs, const arg_type<Generic<T1>>& rhs). Here, arg_type<Generic> is GenericView type that provide a type() method to retrieve the actual type of the input argument vector.
Another Bug
A related bug about eq(json, json) was found during the investigation. In Presto, SELECT json '[1,2,3]' = json '[1,2, 3]' returns true. One the other hand, the unit test below in Velox fails because Velox's EqFunction doesn't differentiate JSON and VARCHAR.
TEST_F(ExprTest, jsonCompare) {
auto data = makeRowVector(
{makeFlatVector<StringView>({"[1,2,3]"}, JSON()),
makeFlatVector<StringView>({"[1,2, 3]"}, JSON())});
auto actual = evaluate("eq(c0, c1)", data);
assertEqualVectors(makeFlatVector<bool>(std::vector<bool>{true}), actual);
}
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
While helping merge #7480, I noticed a limitation of the simple scalar function interface with custom types. The limitation is that when a function supports both argument types of both the custom type and its underlying type, the
call
methods for the two types must be implemented in two separate function classes. The reason is that thecall
method only sees the native argument types that are the same for the custom type and its underlying type. Below is an example.velox/velox/functions/prestosql/Comparisons.h
Lines 106 to 139 in 98c805a
The
EqFunction
class defined in Velox implements threecall
methods: one for template type T, one for TimestampWithTimezone, and one for Generic. When EqFunction is registered viaregisterFunction<T, bool, int64_t, int64_t>("eq")
(which is not the case with velox_functions_prestosql, but is the case with an internal Velox user), and invoked through an expression likeeq(12345, 54321)
,void call(bool& result, const arg_type<TimestampWithTimezone>& lhs, const arg_type<TimestampWithTimezone>& rhs)
is incorrectly called for it. This is because #7480 make TimestampWithTimezone type backed by BIGINT and hencearg_type<TimestampWithTimezone>
becomeint64_t
. Although EqFunction intends to handle BIGINT comparison through the templated call method, the specialized version is preferred (https://en.cppreference.com/w/cpp/language/partial_specialization).In general, this is a limitation of the simple scalar function interface that it cannot differentiate custom type and its underlying type because their arg_type<> are the same. There are two workaround to address this limitation:
bool call(bool& out, const arg_type<Generic<T1>>& lhs, const arg_type<Generic<T1>>& rhs)
. Here, arg_type<Generic> is GenericView type that provide a type() method to retrieve the actual type of the input argument vector.Another Bug
A related bug about eq(json, json) was found during the investigation. In Presto,
SELECT json '[1,2,3]' = json '[1,2, 3]'
returnstrue
. One the other hand, the unit test below in Velox fails because Velox's EqFunction doesn't differentiate JSON and VARCHAR.Beta Was this translation helpful? Give feedback.
All reactions