Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11950: [C++][Compute] Add unary negative kernel #10016

Closed
wants to merge 0 commits into from

Conversation

edponce
Copy link
Contributor

@edponce edponce commented Apr 13, 2021

This draft PR adds unary scalar arithmetic kernels for the negation operation on integral and floating-point types. The kernels are described in the compute package as Negate and NegateChecked structs, and registered with respective names of "negate" and "negate_checked".

@bkietz please review

@edponce
Copy link
Contributor Author

edponce commented Apr 13, 2021

The following are pending details to be resolved with this PR:

  1. How to handle 0, +0, -0?
    • IEEE754 defines signed/unsigned FP zero and although they should be logically equal, they can produce different results on certain operations. For example, 1/-0 = -inf and 1/+0 = +inf.
    • Integral signed/unsigned zero?
  2. How to handle unsigned integers?
  3. Test cases for int8 and int16 fail because expected result is implicitly promoted to int32. Not sure if this promotion occurs in testing framework or C++ rules.

@bkietz @pitrou

@github-actions
Copy link

@bkietz bkietz self-requested a review April 14, 2021 15:32
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is looking good. Please add tests for negate_checked

@@ -739,6 +739,10 @@ struct ScalarUnaryNotNull {
}
};

// A kernel exec generator for unary kernels
template <typename OutType, typename ArgType, typename Op>
using ScalarUnaryType = ScalarUnary<OutType, ArgType, Op>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This alias doesn't add anything, please revert it.

Comment on lines 260 to 265
T result = 0;
// NOTE [EPM]: Check this edge case of overflow. What are we trying to check here?
if (ARROW_PREDICT_FALSE(SubtractWithOverflow(0, arg, &result))) {
ctx->SetStatus(Status::Invalid("overflow"));
}
return result;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
T result = 0;
// NOTE [EPM]: Check this edge case of overflow. What are we trying to check here?
if (ARROW_PREDICT_FALSE(SubtractWithOverflow(0, arg, &result))) {
ctx->SetStatus(Status::Invalid("overflow"));
}
return result;
if (arg == std::numeric_limits<T>::min()) {
// two's complement can represent a negative number which has no corresponding positive,
// for example int8_t(-128) cannot be negated since 128 is not respresentable in int8_t
ctx->SetStatus(Status::Invalid("overflow"));
return 0;
}
return -arg;


struct NegateChecked {
template <typename T, typename Arg0>
static enable_if_integer<T> Call(KernelContext* ctx, Arg0 arg) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static enable_if_integer<T> Call(KernelContext* ctx, Arg0 arg) {
static enable_if_signed_integer<T> Call(KernelContext* ctx, Arg0 arg) {

@@ -233,6 +235,43 @@ struct DivideChecked {
}
};

struct Negate {
template <typename T, typename Arg0>
// NOTE [EPM]: Discuss on 0 vs. -0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that it's not the negate kernel's responsibility to coerce -0 to 0.
For follow up work: it might be useful to have another kernel which normalizes floating point values by replacing NaNs with nulls, ensuring only positive 0s, etc

Suggested change
// NOTE [EPM]: Discuss on 0 vs. -0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing to coerce indeed, the FPU should do its job correctly.

}

// NOTE [EPM]: How to handle unsigned integers?
// * Promote to signed?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think promotion to signed is the correct way to handle this. Only kernels for signed integer types will be included, and when negating an unsigned integer an implicit cast to the next largest signed integer must be performed first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, numpy preserves the dtype for unsigned integers:

In [14]: arr = np.array([0,  255], dtype="uint8")

In [15]: -arr
Out[15]: array([0, 1], dtype=uint8)

In [16]: np.negative(arr)
Out[16]: array([0, 1], dtype=uint8)

(not sure that's very useful, though)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After careful deliberation on this topic, I think negate should preserve data type. Also, in a mathematical context, negation is not supported for unsigned integrals, so I do not think kernels should be available for the "checked" kernels. For default kernels behavior is to wrap around (apply two's complement in a safe manner).

// Null input
CheckScalarUnary("negate", ArrayFromJSON(ty, "[null]"), ArrayFromJSON(ty, "[null]"));
// Zeros as inputs
CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, -0]"), ArrayFromJSON(ty, "[0, -0, 0]"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-0 is not distinct from 0 for integral types

Suggested change
CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, -0]"), ArrayFromJSON(ty, "[0, -0, 0]"));
CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, 0]"), ArrayFromJSON(ty, "[0, 0, 0]"));

Comment on lines 742 to 746
// NOTE [EPM]: Why do these fail? The expected result is promoted to int32.
// auto int8_max = std::numeric_limits<int8_t>::max();
// CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(-int8_max));
// auto int16_max = std::numeric_limits<int16_t>::max();
// CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(-int16_max));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MakeScalar decides the DataType of the scalar based on its argument type, and decltype(-int8_max) is 32 bit signed integer. Adding an explicit cast to 8 bit should fix it

Suggested change
// NOTE [EPM]: Why do these fail? The expected result is promoted to int32.
// auto int8_max = std::numeric_limits<int8_t>::max();
// CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(-int8_max));
// auto int16_max = std::numeric_limits<int16_t>::max();
// CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(-int16_max));
auto int8_max = std::numeric_limits<int8_t>::max();
CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(static_cast<int8_t>(-int8_max)));
auto int16_max = std::numeric_limits<int16_t>::max();
CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(static_cast<int16_t>(-int16_max)));

@@ -309,6 +348,21 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name,
return func;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insertion of implicit casts is accomplished by overriding Function::DispatchBest. For example, to ensure that unsigned types are supported by casting to a compatible unsigned type, use:

Suggested change
struct UnaryArithmeticFunction : ScalarFunction {
using ScalarFunction::ScalarFunction;
Result<const Kernel*> DispatchBest(std::vector<ValueDescr>* values) const override {
RETURN_NOT_OK(CheckArity(*values));
using arrow::compute::detail::DispatchExactImpl;
if (auto kernel = DispatchExactImpl(this, *values)) return kernel;
EnsureDictionaryDecoded(values);
if (auto type = CommonNumeric({values->at(0), int8()})) {
ReplaceTypes(type, values);
}
if (auto kernel = DispatchExactImpl(this, *values)) return kernel;
return arrow::compute::detail::NoMatchingKernel(this, *values);
}
};

(UnaryScalarFunction will replace ScalarFunction below in auto func = std::make_shared<ScalarFunction>(name, Arity::Unary(), doc);)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we need UnaryScalarFunction and can't use ScalarFunction as is. Why the CommonNumeric is using int8()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScalarFunction does not provide implicit casts, such as from unsigned to signed integers. UnaryScalarFunction is provided to add implicit casts including:

uint8 -> int16
uint16 -> int32
uint32 -> int64
uint64 -> int64
dictionary<int32, float> -> float
//...

The call to CommonNumeric with int8 ensures that the output type is signed, with no more widening than necessary. Insertion of implicit casts is tested for the other arithmetic functions using CheckDispatchBest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, nice trick!

CheckScalarUnary("negate", ArrayFromJSON(ty, "[null]"), ArrayFromJSON(ty, "[null]"));
// Zeros as inputs
CheckScalarUnary("negate", ArrayFromJSON(ty, "[0]"), ArrayFromJSON(ty, "[0]"));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please flesh these out

// * Use C++ integral conversions (e.g., Negate(-128) = -128)?
// * https://timsong-cpp.github.io/cppwp/n4659/conv.integral
template <typename T, typename Arg0>
static constexpr enable_if_integer<T> Call(KernelContext*, Arg0 arg) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static constexpr enable_if_integer<T> Call(KernelContext*, Arg0 arg) {
static constexpr enable_if_signed_integer<T> Call(KernelContext*, Arg0 arg) {

// Positive inputs
CheckScalarUnary("negate", ArrayFromJSON(ty, "[1.3, 10.80, 12748.001]"), ArrayFromJSON(ty, "[-1.3, -10.80, -12748.001]"));
// Negative inputs
CheckScalarUnary("negate", ArrayFromJSON(ty, "[-1.3, -10.80, -12748.001]"), ArrayFromJSON(ty, "[1.3, 10.80, 12748.001]"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please check inf and NaN (they should work implicitly, but who knows).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good corner cases, thanks!

(and dictionary decoded, if applicable) before the operation is applied.

The default variant of these functions does not detect overflow (the result
then typically wraps around). Each function is also available in an
overflow-checking variant, suffixed ``_checked``, which returns
an ``Invalid`` :class:`Status` when overflow is detected.

For a unary operation, should unsigned integer types be promoted as if in a
binary operation with ``int8``? This would at least ensure narrowest possible
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add questions to the documentation. The documentation is meant to inform users, not to collect TODOs for development.

| negate | Unary | Numeric | Numeric |
+--------------------------+------------+--------------------+---------------------+
| negate_checked | Unary | Numeric | Numeric |
+--------------------------+------------+--------------------+---------------------+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tables are alphabetically-ordered, it would be nice to keep them like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants