-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-11950: [C++][Compute] Add unary negative kernel #10016
Conversation
The following are pending details to be resolved with this PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is looking good. Please add tests for negate_checked
@@ -739,6 +739,10 @@ struct ScalarUnaryNotNull { | |||
} | |||
}; | |||
|
|||
// A kernel exec generator for unary kernels | |||
template <typename OutType, typename ArgType, typename Op> | |||
using ScalarUnaryType = ScalarUnary<OutType, ArgType, Op>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This alias doesn't add anything, please revert it.
T result = 0; | ||
// NOTE [EPM]: Check this edge case of overflow. What are we trying to check here? | ||
if (ARROW_PREDICT_FALSE(SubtractWithOverflow(0, arg, &result))) { | ||
ctx->SetStatus(Status::Invalid("overflow")); | ||
} | ||
return result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T result = 0; | |
// NOTE [EPM]: Check this edge case of overflow. What are we trying to check here? | |
if (ARROW_PREDICT_FALSE(SubtractWithOverflow(0, arg, &result))) { | |
ctx->SetStatus(Status::Invalid("overflow")); | |
} | |
return result; | |
if (arg == std::numeric_limits<T>::min()) { | |
// two's complement can represent a negative number which has no corresponding positive, | |
// for example int8_t(-128) cannot be negated since 128 is not respresentable in int8_t | |
ctx->SetStatus(Status::Invalid("overflow")); | |
return 0; | |
} | |
return -arg; |
|
||
struct NegateChecked { | ||
template <typename T, typename Arg0> | ||
static enable_if_integer<T> Call(KernelContext* ctx, Arg0 arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static enable_if_integer<T> Call(KernelContext* ctx, Arg0 arg) { | |
static enable_if_signed_integer<T> Call(KernelContext* ctx, Arg0 arg) { |
@@ -233,6 +235,43 @@ struct DivideChecked { | |||
} | |||
}; | |||
|
|||
struct Negate { | |||
template <typename T, typename Arg0> | |||
// NOTE [EPM]: Discuss on 0 vs. -0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that it's not the negate kernel's responsibility to coerce -0 to 0.
For follow up work: it might be useful to have another kernel which normalizes floating point values by replacing NaNs with nulls, ensuring only positive 0s, etc
// NOTE [EPM]: Discuss on 0 vs. -0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing to coerce indeed, the FPU should do its job correctly.
} | ||
|
||
// NOTE [EPM]: How to handle unsigned integers? | ||
// * Promote to signed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think promotion to signed is the correct way to handle this. Only kernels for signed integer types will be included, and when negating an unsigned integer an implicit cast to the next largest signed integer must be performed first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, numpy preserves the dtype for unsigned integers:
In [14]: arr = np.array([0, 255], dtype="uint8")
In [15]: -arr
Out[15]: array([0, 1], dtype=uint8)
In [16]: np.negative(arr)
Out[16]: array([0, 1], dtype=uint8)
(not sure that's very useful, though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After careful deliberation on this topic, I think negate should preserve data type. Also, in a mathematical context, negation is not supported for unsigned integrals, so I do not think kernels should be available for the "checked" kernels. For default kernels behavior is to wrap around (apply two's complement in a safe manner).
// Null input | ||
CheckScalarUnary("negate", ArrayFromJSON(ty, "[null]"), ArrayFromJSON(ty, "[null]")); | ||
// Zeros as inputs | ||
CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, -0]"), ArrayFromJSON(ty, "[0, -0, 0]")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-0 is not distinct from 0 for integral types
CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, -0]"), ArrayFromJSON(ty, "[0, -0, 0]")); | |
CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, 0]"), ArrayFromJSON(ty, "[0, 0, 0]")); |
// NOTE [EPM]: Why do these fail? The expected result is promoted to int32. | ||
// auto int8_max = std::numeric_limits<int8_t>::max(); | ||
// CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(-int8_max)); | ||
// auto int16_max = std::numeric_limits<int16_t>::max(); | ||
// CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(-int16_max)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MakeScalar decides the DataType of the scalar based on its argument type, and decltype(-int8_max)
is 32 bit signed integer. Adding an explicit cast to 8 bit should fix it
// NOTE [EPM]: Why do these fail? The expected result is promoted to int32. | |
// auto int8_max = std::numeric_limits<int8_t>::max(); | |
// CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(-int8_max)); | |
// auto int16_max = std::numeric_limits<int16_t>::max(); | |
// CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(-int16_max)); | |
auto int8_max = std::numeric_limits<int8_t>::max(); | |
CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(static_cast<int8_t>(-int8_max))); | |
auto int16_max = std::numeric_limits<int16_t>::max(); | |
CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(static_cast<int16_t>(-int16_max))); |
@@ -309,6 +348,21 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name, | |||
return func; | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insertion of implicit casts is accomplished by overriding Function::DispatchBest. For example, to ensure that unsigned types are supported by casting to a compatible unsigned type, use:
struct UnaryArithmeticFunction : ScalarFunction { | |
using ScalarFunction::ScalarFunction; | |
Result<const Kernel*> DispatchBest(std::vector<ValueDescr>* values) const override { | |
RETURN_NOT_OK(CheckArity(*values)); | |
using arrow::compute::detail::DispatchExactImpl; | |
if (auto kernel = DispatchExactImpl(this, *values)) return kernel; | |
EnsureDictionaryDecoded(values); | |
if (auto type = CommonNumeric({values->at(0), int8()})) { | |
ReplaceTypes(type, values); | |
} | |
if (auto kernel = DispatchExactImpl(this, *values)) return kernel; | |
return arrow::compute::detail::NoMatchingKernel(this, *values); | |
} | |
}; |
(UnaryScalarFunction will replace ScalarFunction below in auto func = std::make_shared<ScalarFunction>(name, Arity::Unary(), doc);
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why we need UnaryScalarFunction and can't use ScalarFunction as is. Why the CommonNumeric is using int8()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ScalarFunction
does not provide implicit casts, such as from unsigned to signed integers. UnaryScalarFunction
is provided to add implicit casts including:
uint8 -> int16
uint16 -> int32
uint32 -> int64
uint64 -> int64
dictionary<int32, float> -> float
//...
The call to CommonNumeric with int8 ensures that the output type is signed, with no more widening than necessary. Insertion of implicit casts is tested for the other arithmetic functions using CheckDispatchBest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, nice trick!
CheckScalarUnary("negate", ArrayFromJSON(ty, "[null]"), ArrayFromJSON(ty, "[null]")); | ||
// Zeros as inputs | ||
CheckScalarUnary("negate", ArrayFromJSON(ty, "[0]"), ArrayFromJSON(ty, "[0]")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please flesh these out
// * Use C++ integral conversions (e.g., Negate(-128) = -128)? | ||
// * https://timsong-cpp.github.io/cppwp/n4659/conv.integral | ||
template <typename T, typename Arg0> | ||
static constexpr enable_if_integer<T> Call(KernelContext*, Arg0 arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static constexpr enable_if_integer<T> Call(KernelContext*, Arg0 arg) { | |
static constexpr enable_if_signed_integer<T> Call(KernelContext*, Arg0 arg) { |
// Positive inputs | ||
CheckScalarUnary("negate", ArrayFromJSON(ty, "[1.3, 10.80, 12748.001]"), ArrayFromJSON(ty, "[-1.3, -10.80, -12748.001]")); | ||
// Negative inputs | ||
CheckScalarUnary("negate", ArrayFromJSON(ty, "[-1.3, -10.80, -12748.001]"), ArrayFromJSON(ty, "[1.3, 10.80, 12748.001]")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please check inf
and NaN
(they should work implicitly, but who knows).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good corner cases, thanks!
docs/source/cpp/compute.rst
Outdated
(and dictionary decoded, if applicable) before the operation is applied. | ||
|
||
The default variant of these functions does not detect overflow (the result | ||
then typically wraps around). Each function is also available in an | ||
overflow-checking variant, suffixed ``_checked``, which returns | ||
an ``Invalid`` :class:`Status` when overflow is detected. | ||
|
||
For a unary operation, should unsigned integer types be promoted as if in a | ||
binary operation with ``int8``? This would at least ensure narrowest possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't add questions to the documentation. The documentation is meant to inform users, not to collect TODOs for development.
docs/source/cpp/compute.rst
Outdated
| negate | Unary | Numeric | Numeric | | ||
+--------------------------+------------+--------------------+---------------------+ | ||
| negate_checked | Unary | Numeric | Numeric | | ||
+--------------------------+------------+--------------------+---------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tables are alphabetically-ordered, it would be nice to keep them like that.
This draft PR adds unary scalar arithmetic kernels for the negation operation on integral and floating-point types. The kernels are described in the compute package as Negate and NegateChecked structs, and registered with respective names of "negate" and "negate_checked".
@bkietz please review