Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-13096: [C++] Implement logarithm compute functions #10567

Closed
wants to merge 4 commits into from

Conversation

lidavidm
Copy link
Member

Adds ln, log10, and log2. We could add a log1e and/or a logN if useful (probably not?)

Has some code from/will conflict with #10544.

@github-actions
Copy link

this->AssertUnaryOpRaises(Log10, "[0]", "divide by zero");
this->AssertUnaryOpRaises(Log2, "[0]", "divide by zero");
}

Copy link
Contributor

@edponce edponce Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In valid test cases, no need to use ty (aka this->type_singleton()) with ArrayFromJSON because that is the default type.

Moreover, PR #10395 extends the unary scalar arithmetic test class to support combinations of JSON/Array inputs which will allow you to further simplify the test statements as follows:

  • For integer inputs: AssertUnaryOp(Ln, "[1]", ArrayFromJSON(float64(), "[0]"));
  • For floating point inputs: AssertUnaryOp(Log10, "[1, 10]", "[0, 1]");

Copy link
Contributor

@edponce edponce Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test cases with Inf, NaN, null, min, max inputs.

For min/max you can refer to the tests of AbsoluteValue.

| log2 | Unary | Numeric | Numeric |
+--------------------------+------------+--------------------+---------------------+
| log2_checked | Unary | Numeric | Numeric |
+--------------------------+------------+--------------------+---------------------+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to add a note stating that these functions return float64 value for integral inputs and same type as input for floating-point inputs.

}
return func;
}

Copy link
Contributor

@edponce edponce Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to change the function name to MakeUnaryArithmeticFunctionWithFloatOutTypeNotNull.
Also, the _checked variants use the ScalarUnaryNotNull kernel exec generator but the regular variants use ScalarUnary. Need to add MakeUnaryArithmeticFunctionWithFloatOutType with same logic but using ScalarUnary.

@@ -485,6 +644,37 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId get_id) {
}
}

// For kernels that always return floating results
template <template <typename... Args> class KernelGenerator, typename Op>
ArrayKernelExec IntToDoubleExecFromOp(detail::GetTypeId get_id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context: There are a variety of generator dispatchers in the compute layer and their names are inconsistent (this is indirectly related to ARROW-9161). There has been previous work in renaming them for consistency but looking at the codebase, we will need another pass.

I suggest to change the name IntToDoubleExecFromOp to GenerateArithmeticWithFloatOutType.

@lidavidm
Copy link
Member Author

@edponce thanks for the feedback! Since I also used these in #10544 I've made the changes there - if they look OK, then I'll backport them here. I had to do a little refactoring since ScalarBinary(EqualTypes) didn't specify a template parameter list and that threw off the inference.

@lidavidm
Copy link
Member Author

I will rebase this once #10544 merges since there'll be more conflicts there anyways.

@lidavidm lidavidm marked this pull request as draft June 30, 2021 12:37
@lidavidm lidavidm force-pushed the arrow-13096 branch 2 times, most recently from 106f952 to 40a20b1 Compare June 30, 2021 17:18
@lidavidm lidavidm marked this pull request as ready for review June 30, 2021 17:21
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Just some small comments.

/// \brief Get the natural log of a value. Array values can be of arbitrary
/// length. If argument is null the result will be null.
///
/// \param[in] arg the value transformed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "transformed"?

@@ -396,6 +396,52 @@ Result<Datum> Atan(const Datum& arg, ExecContext* ctx = NULLPTR);
ARROW_EXPORT
Result<Datum> Atan2(const Datum& y, const Datum& x, ExecContext* ctx = NULLPTR);

/// \brief Get the natural log of a value. Array values can be of arbitrary
/// length. If argument is null the result will be null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that "Array values can be of arbitrary length" is a useful mention. It's normally true of all scalar (elemen-wise) functions.

Also, can we keep the \brief sentence a single one-liner, and put the description after a newline?

static enable_if_floating_point<Arg, T> Call(KernelContext*, Arg arg, Status* st) {
static_assert(std::is_same<T, Arg>::value, "");
if (arg == 0.0) {
*st = Status::Invalid("divide by zero");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if that's the best error message. Perhaps "logarithm of zero"?

*st = Status::Invalid("divide by zero");
return arg;
} else if (arg < 0.0) {
*st = Status::Invalid("domain error");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps something more precise, e.g. "logarithm of negative number"?

}
return std::log2(arg);
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if that's worth it, but these three kernels have very similar implementations, maybe something could be shared. Or perhaps that's pointless generalization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make it templated on two functions (one for float, one for double)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, or on a struct defining those two functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to do it or not, in any case. This can also be merged as-is :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it'll save us very much here (and our arithmetic functions are all written out instead of being templated).

@@ -1295,6 +1407,60 @@ const FunctionDoc atan2_doc{
"Compute the inverse tangent using argument signs to determine the quadrant",
("Integer arguments return double values."),
{"y", "x"}};

const FunctionDoc ln_doc{
"Take natural log of arguments element-wise",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use "Compute" above.

this->AssertUnaryOpRaises(Log1p, "[-2]", "domain error");
this->AssertUnaryOpRaises(Log1p, "[-Inf]", "domain error");
this->AssertUnaryOpRaises(Log1p, MakeArray(std::numeric_limits<CType>::lowest()),
"domain error");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check the error cases for the non-checked variants as well?

@pitrou
Copy link
Member

pitrou commented Jul 1, 2021

Hmm, there's a genuine CI failure here:
https://github.com/apache/arrow/pull/10567/checks?check_run_id=2954826698

@lidavidm
Copy link
Member Author

lidavidm commented Jul 2, 2021

I've extended the test cases and fixed the CI failure (by avoiding bouncing through RapidJSON for the failing assertions).

Copy link
Contributor

@cyb70289 cyb70289 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants