ARROW-13096: [C++] Implement logarithm compute functions #10567

lidavidm · 2021-06-21T17:15:24Z

Adds ln, log10, and log2. We could add a log1p and/or a logN if useful (probably not?)

Has some code from/will conflict with #10544.

github-actions · 2021-06-21T17:15:47Z

https://issues.apache.org/jira/browse/ARROW-13096

edponce · 2021-06-22T06:16:50Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

+  this->AssertUnaryOpRaises(Log10, "[0]", "divide by zero");
+  this->AssertUnaryOpRaises(Log2, "[0]", "divide by zero");
+}
+


In valid test cases, no need to use ty (aka this->type_singleton()) with ArrayFromJSON because that is the default type.

Moreover, PR #10395 extends the unary scalar arithmetic test class to support combinations of JSON/Array inputs which will allow you to further simplify the test statements as follows:

For integer inputs: AssertUnaryOp(Ln, "[1]", ArrayFromJSON(float64(), "[0]"));

For floating point inputs: AssertUnaryOp(Log10, "[1, 10]", "[0, 1]");

Add test cases with Inf, NaN, null, min, max inputs.

For min/max you can refer to the tests of AbsoluteValue.
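
As a reference point for those cases, here is a minimal standalone sketch (plain `<cmath>`, not Arrow's test harness) of what `std::log` returns for the special inputs. It assumes IEEE 754 doubles, and the helper name `ClassifyLn` is hypothetical. An unchecked kernel that simply forwards to `std::log` would presumably propagate these values.

```cpp
#include <cmath>
#include <limits>
#include <string>

// Hypothetical helper: describe the result of std::log for a given input.
// Assumes IEEE 754 floating point (std::numeric_limits<double>::is_iec559).
inline std::string ClassifyLn(double x) {
  const double r = std::log(x);
  if (std::isnan(r)) return "nan";
  if (std::isinf(r)) return r > 0 ? "+inf" : "-inf";
  return "finite";
}
```

Under that assumption, zero maps to `-inf`, negative inputs (including `-Inf`) and NaN map to NaN, `+Inf` maps to `+inf`, and the smallest and largest finite doubles give finite results.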

edponce · 2021-06-22T06:19:41Z

docs/source/cpp/compute.rst

+| log2                     | Unary      | Numeric            | Numeric             |
+--------------------------+------------+--------------------+---------------------+
+| log2_checked             | Unary      | Numeric            | Numeric             |
+--------------------------+------------+--------------------+---------------------+


I would suggest adding a note stating that these functions return float64 values for integral inputs and the same type as the input for floating-point inputs.
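
To make the suggested rule concrete, the sketch below expresses it as a compile-time type mapping. `LogOutType` is a hypothetical alias written for this illustration and is not an Arrow API; it only models "integral in, double out; floating-point in, same type out".

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait (not an Arrow API): maps an input C type to the output
// type described in the suggested note.
template <typename T>
using LogOutType =
    typename std::conditional<std::is_integral<T>::value, double, T>::type;

static_assert(std::is_same<LogOutType<std::int32_t>, double>::value,
              "integral input -> double output");
static_assert(std::is_same<LogOutType<float>, float>::value,
              "float input -> float output");
static_assert(std::is_same<LogOutType<double>, double>::value,
              "double input -> double output");
```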

edponce · 2021-06-22T06:40:17Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

+  }
+  return func;
+}
+


I suggest changing the function name to MakeUnaryArithmeticFunctionWithFloatOutTypeNotNull.
Also, the _checked variants use the ScalarUnaryNotNull kernel exec generator, but the regular variants use ScalarUnary. We need to add MakeUnaryArithmeticFunctionWithFloatOutType with the same logic but using ScalarUnary.
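
For readers unfamiliar with the distinction, the following is a conceptual model only, assuming that a ScalarUnary-style generator applies the operation to every slot while a ScalarUnaryNotNull-style generator skips null slots. The function names `ApplyAllSlots` and `ApplyNotNull` are hypothetical and these are not Arrow's templates.

```cpp
#include <cstddef>
#include <vector>

// Conceptual model only; these are not Arrow's templates.
// Applies op to every slot, including slots that are logically null.
template <typename Op>
std::vector<double> ApplyAllSlots(const std::vector<double>& values, Op op) {
  std::vector<double> out;
  out.reserve(values.size());
  for (double v : values) out.push_back(op(v));
  return out;
}

// Applies op only where valid[i] is true; null slots keep a placeholder 0.
template <typename Op>
std::vector<double> ApplyNotNull(const std::vector<double>& values,
                                 const std::vector<bool>& valid, Op op) {
  std::vector<double> out(values.size(), 0.0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (valid[i]) out[i] = op(values[i]);
  }
  return out;
}
```

The second form matters for checked operations: the value stored under a null slot is arbitrary (it may well be 0), so an operation that can fail should not be evaluated on it.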

edponce · 2021-06-22T06:55:42Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

@@ -485,6 +644,37 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId get_id) {
  }
 }

+// For kernels that always return floating results
+template <template <typename... Args> class KernelGenerator, typename Op>
+ArrayKernelExec IntToDoubleExecFromOp(detail::GetTypeId get_id) {


For context: There are a variety of generator dispatchers in the compute layer and their names are inconsistent (this is indirectly related to ARROW-9161). There has been previous work in renaming them for consistency but looking at the codebase, we will need another pass.

I suggest changing the name IntToDoubleExecFromOp to GenerateArithmeticWithFloatOutType.

lidavidm · 2021-06-22T13:54:25Z

@edponce thanks for the feedback! Since I also used these in #10544 I've made the changes there - if they look OK, then I'll backport them here. I had to do a little refactoring since ScalarBinary(EqualTypes) didn't specify a template parameter list and that threw off the inference.

lidavidm · 2021-06-30T12:36:55Z

I will rebase this once #10544 merges since there'll be more conflicts there anyways.

pitrou

Thank you! Just some small comments.

pitrou · 2021-07-01T16:38:03Z

cpp/src/arrow/compute/api_scalar.h

+/// \brief Get the natural log of a value. Array values can be of arbitrary
+/// length. If argument is null the result will be null.
+///
+/// \param[in] arg the value transformed


Why "transformed"?

pitrou · 2021-07-01T16:39:03Z

cpp/src/arrow/compute/api_scalar.h

@@ -396,6 +396,52 @@ Result<Datum> Atan(const Datum& arg, ExecContext* ctx = NULLPTR);
 ARROW_EXPORT
 Result<Datum> Atan2(const Datum& y, const Datum& x, ExecContext* ctx = NULLPTR);

+/// \brief Get the natural log of a value. Array values can be of arbitrary
+/// length. If argument is null the result will be null.


I'm not sure that "Array values can be of arbitrary length" is a useful mention. It's normally true of all scalar (element-wise) functions.

Also, can we keep the \brief sentence a single one-liner, and put the description after a newline?

pitrou · 2021-07-01T16:40:22Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

+  static enable_if_floating_point<Arg, T> Call(KernelContext*, Arg arg, Status* st) {
+    static_assert(std::is_same<T, Arg>::value, "");
+    if (arg == 0.0) {
+      *st = Status::Invalid("divide by zero");


I don't know if that's the best error message. Perhaps "logarithm of zero"?

pitrou · 2021-07-01T16:40:44Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

+      *st = Status::Invalid("divide by zero");
+      return arg;
+    } else if (arg < 0.0) {
+      *st = Status::Invalid("domain error");


Perhaps something more precise, e.g. "logarithm of negative number"?
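
A minimal sketch of how the checked path might look with the two more specific messages proposed in this review. `SketchStatus` is a stand-in for `arrow::Status` and `LnCheckedSketch` is a hypothetical name; the control flow mirrors the quoted hunk but this is not the merged implementation.

```cpp
#include <cmath>
#include <string>

// Stand-in for arrow::Status, only for this sketch.
struct SketchStatus {
  bool ok;
  std::string message;
};

// Hypothetical checked natural log: reports zero and negative inputs with
// distinct messages, otherwise returns std::log(arg). NaN inputs fall
// through both comparisons and yield NaN.
inline double LnCheckedSketch(double arg, SketchStatus* st) {
  if (arg == 0.0) {
    *st = SketchStatus{false, "logarithm of zero"};
    return arg;
  }
  if (arg < 0.0) {
    *st = SketchStatus{false, "logarithm of negative number"};
    return arg;
  }
  return std::log(arg);
}
```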

pitrou · 2021-07-01T16:41:56Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

+    }
+    return std::log2(arg);
+  }
+};


I don't know if that's worth it, but these three kernels have very similar implementations, maybe something could be shared. Or perhaps that's pointless generalization.

I could make it templated on two functions (one for float, one for double)?

Perhaps, or on a struct defining those two functions.

Feel free to do it or not, in any case. This can also be merged as-is :-)

I don't think it'll save us very much here (and our arithmetic functions are all written out instead of being templated).
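
For illustration, the "struct defining those two functions" idea could look roughly like the sketch below. All names (`LnFuncs`, `Log2Funcs`, `Log10Funcs`, `LogCheckedSketch`) are hypothetical, and the error reporting is reduced to a `bool` to keep the example self-contained; the PR itself kept the three kernels written out.

```cpp
#include <cmath>

// Hypothetical traits structs: each names the float and double overloads
// of one logarithm.
struct LnFuncs {
  static float Apply(float x) { return std::log(x); }
  static double Apply(double x) { return std::log(x); }
};
struct Log2Funcs {
  static float Apply(float x) { return std::log2(x); }
  static double Apply(double x) { return std::log2(x); }
};
struct Log10Funcs {
  static float Apply(float x) { return std::log10(x); }
  static double Apply(double x) { return std::log10(x); }
};

// One shared checked body, parameterized on the traits struct.
template <typename Funcs>
struct LogCheckedSketch {
  // Returns false (and leaves *out untouched) for zero or negative input.
  template <typename T>
  static bool Call(T arg, T* out) {
    if (arg <= T(0)) return false;
    *out = Funcs::Apply(arg);
    return true;
  }
};
```

With this shape, a call such as `LogCheckedSketch<Log2Funcs>::Call(8.0, &out)` shares the domain check across all three functions, at the cost of one extra layer of indirection per kernel.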

pitrou · 2021-07-01T16:42:38Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

@@ -1295,6 +1407,60 @@ const FunctionDoc atan2_doc{
    "Compute the inverse tangent using argument signs to determine the quadrant",
    ("Integer arguments return double values."),
    {"y", "x"}};
+
+const FunctionDoc ln_doc{
+    "Take natural log of arguments element-wise",


We use "Compute" above.

pitrou · 2021-07-01T16:44:25Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

+  this->AssertUnaryOpRaises(Log1p, "[-2]", "domain error");
+  this->AssertUnaryOpRaises(Log1p, "[-Inf]", "domain error");
+  this->AssertUnaryOpRaises(Log1p, MakeArray(std::numeric_limits<CType>::lowest()),
+                            "domain error");


Can we check the error cases for the non-checked variants as well?

pitrou · 2021-07-01T16:46:19Z

Hmm, there's a genuine CI failure here:
https://github.com/apache/arrow/pull/10567/checks?check_run_id=2954826698

lidavidm · 2021-07-02T13:14:18Z

I've extended the test cases and fixed the CI failure (by avoiding bouncing through RapidJSON for the failing assertions).

cyb70289

LGTM

github-actions bot added the Component: C++ label Jun 21, 2021

lidavidm force-pushed the arrow-13096 branch from d67c715 to 23b97e4 Compare June 21, 2021 17:31

edponce reviewed Jun 22, 2021

lidavidm force-pushed the arrow-13096 branch from 23b97e4 to f0fc54d Compare June 25, 2021 13:19

lidavidm marked this pull request as draft June 30, 2021 12:37

lidavidm force-pushed the arrow-13096 branch 2 times, most recently from 106f952 to 40a20b1 Compare June 30, 2021 17:18

lidavidm marked this pull request as ready for review June 30, 2021 17:21

pitrou requested changes Jul 1, 2021

lidavidm added 2 commits July 1, 2021 13:07

ARROW-13096: [C++] Implement logarithm compute functions

ff007b7

ARROW-13096: [C++] Improve error messages/tests

ccfb5fd

lidavidm force-pushed the arrow-13096 branch from 40a20b1 to ccfb5fd Compare July 1, 2021 17:14

lidavidm added 2 commits July 1, 2021 13:47

ARROW-13096: [C++] Fix other constant

9edae5e

ARROW-13096: [C++] Actually fix MinGW

46560ed

pitrou approved these changes Jul 5, 2021

cyb70289 approved these changes Jul 7, 2021

cyb70289 closed this in dfb0928 Jul 7, 2021

This was referenced Jul 15, 2021

[C++] Implement logarithm compute functions #28801

Closed

[C++] Log functions don't have int kernels #28967

Closed
