ARROW-11950: [C++][Compute] Add unary negative kernel #10016

edponce · 2021-04-13T19:16:16Z

This draft PR adds unary scalar arithmetic kernels for negation of integral and floating-point types. The kernels are implemented in the compute package as the Negate and NegateChecked structs and are registered under the names "negate" and "negate_checked", respectively.

@bkietz please review

edponce · 2021-04-13T19:36:20Z

The following are pending details to be resolved with this PR:

How to handle 0, +0, -0?
- IEEE 754 defines positive and negative floating-point zeros. Although they compare equal, they can produce different results in certain operations. For example, 1/-0 = -inf and 1/+0 = +inf.
- Is there a signed/unsigned zero for integral types?
How to handle unsigned integers?
- Wrap around as described in https://en.cppreference.com/w/cpp/language/implicit_conversion
Test cases for int8 and int16 fail because the expected result is implicitly promoted to int32. Not sure whether this promotion occurs in the testing framework or is due to C++ rules.
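
To make the zero question concrete, here is a minimal standalone sketch (plain C++, not Arrow code) of how the two floating-point zeros behave. The function names are hypothetical, and the division check assumes IEEE 754 doubles, which the static_assert verifies.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Negating +0.0 yields -0.0: the two compare equal but differ in the sign bit.
inline bool NegatedZeroIsNegative() {
  double zero = 0.0;
  double negated = -zero;
  return negated == zero && std::signbit(negated) && !std::signbit(zero);
}

// Dividing by the two zeros gives infinities of opposite sign.
inline bool ZeroSignAffectsDivision() {
  static_assert(std::numeric_limits<double>::is_iec559,
                "this sketch assumes IEEE 754 doubles");
  double pos_zero = 0.0;
  double neg_zero = -pos_zero;
  double one = 1.0;
  return (one / pos_zero) == std::numeric_limits<double>::infinity() &&
         (one / neg_zero) == -std::numeric_limits<double>::infinity();
}

// Integral types have a single zero, so negation leaves it unchanged.
inline bool IntegerHasSingleZero() {
  int zero = 0;
  return -zero == zero;
}

// Usage: all three observations hold on an IEEE 754 platform.
inline void DemoZeros() {
  assert(NegatedZeroIsNegative());
  assert(ZeroSignAffectsDivision());
  assert(IntegerHasSingleZero());
}
```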

@bkietz @pitrou

github-actions · 2021-04-14T01:42:55Z

https://issues.apache.org/jira/browse/ARROW-11950

bkietz

Overall this is looking good. Please add tests for negate_checked

bkietz · 2021-04-14T15:34:03Z

cpp/src/arrow/compute/kernels/codegen_internal.h

@@ -739,6 +739,10 @@ struct ScalarUnaryNotNull {
  }
 };

+// A kernel exec generator for unary kernels
+template <typename OutType, typename ArgType, typename Op>
+using ScalarUnaryType = ScalarUnary<OutType, ArgType, Op>;


This alias doesn't add anything, please revert it.

bkietz · 2021-04-14T15:43:56Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

+    T result = 0;
+    // NOTE [EPM]: Check this edge case of overflow. What are we trying to check here?
+    if (ARROW_PREDICT_FALSE(SubtractWithOverflow(0, arg, &result))) {
+      ctx->SetStatus(Status::Invalid("overflow"));
+    }
+    return result;


Suggested change

-    T result = 0;
-    // NOTE [EPM]: Check this edge case of overflow. What are we trying to check here?
-    if (ARROW_PREDICT_FALSE(SubtractWithOverflow(0, arg, &result))) {
-      ctx->SetStatus(Status::Invalid("overflow"));
-    }
-    return result;
+    if (arg == std::numeric_limits<T>::min()) {
+      // two's complement can represent a negative number which has no corresponding positive,
+      // for example int8_t(-128) cannot be negated since 128 is not representable in int8_t
+      ctx->SetStatus(Status::Invalid("overflow"));
+      return 0;
+    }
+    return -arg;
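
The overflow check in this suggestion can be tried outside Arrow with a small sketch. NegateCheckedSketch is a hypothetical stand-in: it returns false on overflow instead of calling ctx->SetStatus, and it is not the kernel's actual signature.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: writes -arg to *out and returns true, or writes 0 and
// returns false when -arg is not representable in T.
template <typename T>
bool NegateCheckedSketch(T arg, T* out) {
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "this sketch covers signed integers only");
  if (arg == std::numeric_limits<T>::min()) {
    // The minimum of a two's complement type has no positive counterpart.
    *out = 0;
    return false;
  }
  // Narrow types are promoted to int by unary minus, so cast back to T.
  *out = static_cast<T>(-arg);
  return true;
}

// Usage: only the minimum value is rejected.
inline void DemoNegateChecked() {
  std::int8_t result = 1;
  assert(!NegateCheckedSketch<std::int8_t>(-128, &result));
  assert(result == 0);
  assert(NegateCheckedSketch<std::int8_t>(127, &result));
  assert(result == -127);
}
```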

bkietz · 2021-04-14T15:45:12Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

+
+struct NegateChecked {
+  template <typename T, typename Arg0>
+  static enable_if_integer<T> Call(KernelContext* ctx, Arg0 arg) {


Suggested change

-  static enable_if_integer<T> Call(KernelContext* ctx, Arg0 arg) {
+  static enable_if_signed_integer<T> Call(KernelContext* ctx, Arg0 arg) {

bkietz · 2021-04-14T15:50:12Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

@@ -233,6 +235,43 @@ struct DivideChecked {
  }
 };

+struct Negate {
+  template <typename T, typename Arg0>
+  // NOTE [EPM]: Discuss on 0 vs. -0.


I would say that it's not the negate kernel's responsibility to coerce -0 to 0.
For follow up work: it might be useful to have another kernel which normalizes floating point values by replacing NaNs with nulls, ensuring only positive 0s, etc

Suggested change

-  // NOTE [EPM]: Discuss on 0 vs. -0.

There is nothing to coerce indeed, the FPU should do its job correctly.

bkietz · 2021-04-14T15:57:46Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

+  }
+
+  // NOTE [EPM]: How to handle unsigned integers?
+  //  * Promote to signed?


I think promotion to signed is the correct way to handle this. Only kernels for signed integer types will be included, and when negating an unsigned integer an implicit cast to the next largest signed integer must be performed first.

For reference, numpy preserves the dtype for unsigned integers:

In [14]: arr = np.array([0, 255], dtype="uint8")
In [15]: -arr
Out[15]: array([0, 1], dtype=uint8)
In [16]: np.negative(arr)
Out[16]: array([0, 1], dtype=uint8)

(not sure that's very useful, though)

After careful deliberation on this topic, I think negate should preserve the data type. Also, negation is not mathematically defined for unsigned integers, so I do not think the "checked" variant should provide kernels for unsigned types. For the default kernels, the behavior is to wrap around (two's-complement negation, which is well defined for unsigned types).
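
The two options discussed in this thread can be compared side by side in plain C++. This is only an illustration for uint8; NegateWrapping and NegatePromoting are hypothetical names, not Arrow kernels.

```cpp
#include <cassert>
#include <cstdint>

// Option A: preserve the type and wrap modulo 2^8, as numpy does.
inline std::uint8_t NegateWrapping(std::uint8_t v) {
  // Unsigned arithmetic is modular, so 0u - v is well defined.
  return static_cast<std::uint8_t>(0u - v);
}

// Option B: promote to the next larger signed type before negating.
inline std::int16_t NegatePromoting(std::uint8_t v) {
  return static_cast<std::int16_t>(-static_cast<std::int16_t>(v));
}

// Usage: the same input gives different results and different types.
inline void DemoUnsignedNegation() {
  assert(NegateWrapping(0) == 0);
  assert(NegateWrapping(255) == 1);      // matches the numpy output above
  assert(NegatePromoting(255) == -255);  // exact value, but the type widens
}
```

Wrapping keeps the output type stable, while promotion keeps the result mathematically exact at the cost of a wider type.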

bkietz · 2021-04-14T16:23:01Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

+    // Null input
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[null]"), ArrayFromJSON(ty, "[null]"));
+    // Zeros as inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, -0]"), ArrayFromJSON(ty, "[0, -0, 0]"));


-0 is not distinct from 0 for integral types

Suggested change

-    CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, -0]"), ArrayFromJSON(ty, "[0, -0, 0]"));
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, 0]"), ArrayFromJSON(ty, "[0, 0, 0]"));

bkietz · 2021-04-14T16:27:05Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

+  // NOTE [EPM]: Why do these fail? The expected result is promoted to int32.
+  // auto int8_max = std::numeric_limits<int8_t>::max();
+  // CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(-int8_max));
+  // auto int16_max = std::numeric_limits<int16_t>::max();
+  // CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(-int16_max));


MakeScalar decides the DataType of the scalar based on its argument type, and decltype(-int8_max) is int (a 32-bit signed integer on typical platforms) because unary minus applies integer promotion. Adding an explicit cast back to 8 bits should fix it.

Suggested change

-  // NOTE [EPM]: Why do these fail? The expected result is promoted to int32.
-  // auto int8_max = std::numeric_limits<int8_t>::max();
-  // CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(-int8_max));
-  // auto int16_max = std::numeric_limits<int16_t>::max();
-  // CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(-int16_max));
+  auto int8_max = std::numeric_limits<int8_t>::max();
+  CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(static_cast<int8_t>(-int8_max)));
+  auto int16_max = std::numeric_limits<int16_t>::max();
+  CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(static_cast<int16_t>(-int16_max)));
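
The promotion behind the failing tests can be confirmed at compile time without Arrow. The sketch below uses only the standard library; NegatedInt8Max is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Unary minus applies integer promotion, so operands narrower than int
// produce an int result.
static_assert(std::is_same<decltype(-std::int8_t{1}), int>::value,
              "int8_t is promoted to int by unary minus");
static_assert(std::is_same<decltype(-std::int16_t{1}), int>::value,
              "int16_t is promoted to int by unary minus");

// Casting back restores the narrow type, which is what a type-deducing
// factory such as MakeScalar would then see.
inline std::int8_t NegatedInt8Max() {
  const std::int8_t int8_max = std::numeric_limits<std::int8_t>::max();
  return static_cast<std::int8_t>(-int8_max);
}

static_assert(std::is_same<decltype(NegatedInt8Max()), std::int8_t>::value,
              "the explicit cast yields int8_t again");

// Usage: the value is unchanged by the round trip through int.
inline void DemoPromotion() { assert(NegatedInt8Max() == -127); }
```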

bkietz · 2021-04-14T16:33:23Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

@@ -309,6 +348,21 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name,
  return func;
 }



Insertion of implicit casts is accomplished by overriding Function::DispatchBest. For example, to ensure that unsigned types are supported by casting to a compatible signed type, use:

Suggested change

+struct UnaryArithmeticFunction : ScalarFunction {
+  using ScalarFunction::ScalarFunction;
+  Result<const Kernel*> DispatchBest(std::vector<ValueDescr>* values) const override {
+    RETURN_NOT_OK(CheckArity(*values));
+    using arrow::compute::detail::DispatchExactImpl;
+    if (auto kernel = DispatchExactImpl(this, *values)) return kernel;
+    EnsureDictionaryDecoded(values);
+    if (auto type = CommonNumeric({values->at(0), int8()})) {
+      ReplaceTypes(type, values);
+    }
+    if (auto kernel = DispatchExactImpl(this, *values)) return kernel;
+    return arrow::compute::detail::NoMatchingKernel(this, *values);
+  }
+};

(UnaryArithmeticFunction will replace ScalarFunction below in auto func = std::make_shared<ScalarFunction>(name, Arity::Unary(), doc);)

Not sure why we need UnaryArithmeticFunction and can't use ScalarFunction as is. Why is CommonNumeric called with int8()?

ScalarFunction does not provide implicit casts, such as from unsigned to signed integers. UnaryArithmeticFunction is provided to add implicit casts, including:

uint8 -> int16
uint16 -> int32
uint32 -> int64
uint64 -> int64
dictionary<int32, float> -> float
//...

The call to CommonNumeric with int8 ensures that the output type is signed, with no more widening than necessary. Insertion of implicit casts is tested for the other arithmetic functions using CheckDispatchBest.

Got it, nice trick!
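
As a rough compile-time analogue of the integer mapping listed above, the sketch below uses a trait to pick the signed type an unsigned input would be cast to. SignedForNegate and NegateWithImplicitCast are hypothetical names for illustration only; in Arrow the cast is chosen at dispatch time through DispatchBest and CommonNumeric, not through a trait like this.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: signed types map to themselves, unsigned types to the
// next wider signed type (uint64 has no wider type, so it maps to int64).
template <typename T> struct SignedForNegate { using type = T; };
template <> struct SignedForNegate<std::uint8_t> { using type = std::int16_t; };
template <> struct SignedForNegate<std::uint16_t> { using type = std::int32_t; };
template <> struct SignedForNegate<std::uint32_t> { using type = std::int64_t; };
template <> struct SignedForNegate<std::uint64_t> { using type = std::int64_t; };

// Unchecked negation after the implicit cast. Assumption: the caller does not
// pass uint64 values above INT64_MAX or the minimum of a signed type.
template <typename T>
typename SignedForNegate<T>::type NegateWithImplicitCast(T v) {
  using S = typename SignedForNegate<T>::type;
  return static_cast<S>(-static_cast<S>(v));
}

static_assert(std::is_same<decltype(NegateWithImplicitCast(std::uint8_t{1})),
                           std::int16_t>::value,
              "uint8 is negated as int16");

// Usage: unsigned inputs produce exact negative results in a wider type.
inline void DemoImplicitCast() {
  assert(NegateWithImplicitCast(std::uint8_t{255}) == -255);
  assert(NegateWithImplicitCast(std::uint32_t{4294967295u}) == -4294967295LL);
  assert(NegateWithImplicitCast(std::int32_t{7}) == -7);
}
```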

bkietz · 2021-04-14T16:35:05Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[null]"), ArrayFromJSON(ty, "[null]"));
+    // Zeros as inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[0]"), ArrayFromJSON(ty, "[0]"));
+  }


Please flesh these out

bkietz · 2021-04-14T16:37:54Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

+  //  * Use C++ integral conversions (e.g., Negate(-128) = -128)?
+  //    * https://timsong-cpp.github.io/cppwp/n4659/conv.integral
+  template <typename T, typename Arg0>
+  static constexpr enable_if_integer<T> Call(KernelContext*, Arg0 arg) {


Suggested change

-  static constexpr enable_if_integer<T> Call(KernelContext*, Arg0 arg) {
+  static constexpr enable_if_signed_integer<T> Call(KernelContext*, Arg0 arg) {

pitrou · 2021-04-14T17:03:03Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

+    // Positive inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[1.3, 10.80, 12748.001]"), ArrayFromJSON(ty, "[-1.3, -10.80, -12748.001]"));
+    // Negative inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[-1.3, -10.80, -12748.001]"), ArrayFromJSON(ty, "[1.3, 10.80, 12748.001]"));


Also please check inf and NaN (they should work implicitly, but who knows).

Good corner cases, thanks!
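
For reference, the behavior these extra tests would be expected to see can be checked in plain C++. This is a sketch assuming IEEE 754 doubles and no fast-math flags; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Negating an infinity flips its sign and keeps it infinite.
inline bool NegatedInfIsOppositeInf() {
  const double inf = std::numeric_limits<double>::infinity();
  return std::isinf(-inf) && -inf < 0.0 && -(-inf) == inf;
}

// Negating NaN still yields NaN (only the sign bit may change).
inline bool NegatedNaNIsNaN() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  return std::isnan(-nan);
}

// Usage: both corner cases behave as the review comment anticipates.
inline void DemoSpecialValues() {
  assert(NegatedInfIsOppositeInf());
  assert(NegatedNaNIsNaN());
}
```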

pitrou · 2021-04-14T17:03:48Z

docs/source/cpp/compute.rst

 (and dictionary decoded, if applicable) before the operation is applied.

 The default variant of these functions does not detect overflow (the result
 then typically wraps around).  Each function is also available in an
 overflow-checking variant, suffixed ``_checked``, which returns
 an ``Invalid`` :class:`Status` when overflow is detected.

+For a unary operation, should unsigned integer types be promoted as if in a
+binary operation with ``int8``? This would at least ensure narrowest possible


Please don't add questions to the documentation. The documentation is meant to inform users, not to collect TODOs for development.

pitrou · 2021-04-14T17:04:22Z

docs/source/cpp/compute.rst

+| negate                   | Unary      | Numeric            | Numeric             |
+--------------------------+------------+--------------------+---------------------+
+| negate_checked           | Unary      | Numeric            | Numeric             |
+--------------------------+------------+--------------------+---------------------+


These tables are alphabetically ordered; it would be nice to keep them that way.

github-actions bot added the Component: C++ label Apr 14, 2021

bkietz self-requested a review April 14, 2021 15:32

bkietz requested changes Apr 14, 2021

pitrou reviewed Apr 14, 2021

bkietz marked this pull request as ready for review April 16, 2021 15:00

edponce closed this Apr 20, 2021

edponce force-pushed the master branch from b4806a0 to 930c381 Compare April 20, 2021 14:32

cyb70289 mentioned this pull request Apr 21, 2021

ARROW-11950: [C++][Compute] Add unary negative kernel #10113

Closed

asfimport mentioned this pull request May 12, 2021

[C++][Compute] Add unary negative kernel #27785

Closed
