-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-13096: [C++] Implement logarithm compute functions #10567
Conversation
this->AssertUnaryOpRaises(Log10, "[0]", "divide by zero"); | ||
this->AssertUnaryOpRaises(Log2, "[0]", "divide by zero"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In valid test cases, no need to use ty
(aka this->type_singleton()
) with ArrayFromJSON
because that is the default type.
Moreover, PR #10395 extends the unary scalar arithmetic test class to support combinations of JSON/Array inputs which will allow you to further simplify the test statements as follows:
- For integer inputs:
AssertUnaryOp(Ln, "[1]", ArrayFromJSON(float64(), "[0]"));
- For floating point inputs:
AssertUnaryOp(Log10, "[1, 10]", "[0, 1]");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add test cases with Inf
, NaN
, null
, min
, max
inputs.
For min/max you can refer to the tests of AbsoluteValue
.
docs/source/cpp/compute.rst
Outdated
| log2 | Unary | Numeric | Numeric | | ||
+--------------------------+------------+--------------------+---------------------+ | ||
| log2_checked | Unary | Numeric | Numeric | | ||
+--------------------------+------------+--------------------+---------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to add a note stating that these functions return float64
value for integral inputs and same type as input for floating-point inputs.
} | ||
return func; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to change the function name to MakeUnaryArithmeticFunctionWithFloatOutTypeNotNull
.
Also, the _checked
variants use the ScalarUnaryNotNull
kernel exec generator but the regular variants use ScalarUnary
. Need to add MakeUnaryArithmeticFunctionWithFloatOutType
with same logic but using ScalarUnary
.
@@ -485,6 +644,37 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId get_id) { | |||
} | |||
} | |||
|
|||
// For kernels that always return floating results | |||
template <template <typename... Args> class KernelGenerator, typename Op> | |||
ArrayKernelExec IntToDoubleExecFromOp(detail::GetTypeId get_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context: There are a variety of generator dispatchers in the compute layer and their names are inconsistent (this is indirectly related to ARROW-9161). There has been previous work in renaming them for consistency but looking at the codebase, we will need another pass.
I suggest to change the name IntToDoubleExecFromOp
to GenerateArithmeticWithFloatOutType
.
I will rebase this once #10544 merges since there'll be more conflicts there anyways. |
106f952
to
40a20b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Just some small comments.
cpp/src/arrow/compute/api_scalar.h
Outdated
/// \brief Get the natural log of a value. Array values can be of arbitrary | ||
/// length. If argument is null the result will be null. | ||
/// | ||
/// \param[in] arg the value transformed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "transformed"?
cpp/src/arrow/compute/api_scalar.h
Outdated
@@ -396,6 +396,52 @@ Result<Datum> Atan(const Datum& arg, ExecContext* ctx = NULLPTR); | |||
ARROW_EXPORT | |||
Result<Datum> Atan2(const Datum& y, const Datum& x, ExecContext* ctx = NULLPTR); | |||
|
|||
/// \brief Get the natural log of a value. Array values can be of arbitrary | |||
/// length. If argument is null the result will be null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that "Array values can be of arbitrary length" is a useful mention. It's normally true of all scalar (elemen-wise) functions.
Also, can we keep the \brief
sentence a single one-liner, and put the description after a newline?
static enable_if_floating_point<Arg, T> Call(KernelContext*, Arg arg, Status* st) { | ||
static_assert(std::is_same<T, Arg>::value, ""); | ||
if (arg == 0.0) { | ||
*st = Status::Invalid("divide by zero"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if that's the best error message. Perhaps "logarithm of zero"?
*st = Status::Invalid("divide by zero"); | ||
return arg; | ||
} else if (arg < 0.0) { | ||
*st = Status::Invalid("domain error"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps something more precise, e.g. "logarithm of negative number"?
} | ||
return std::log2(arg); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if that's worth it, but these three kernels have very similar implementations, maybe something could be shared. Or perhaps that's pointless generalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make it templated on two functions (one for float, one for double)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, or on a struct defining those two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to do it or not, in any case. This can also be merged as-is :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it'll save us very much here (and our arithmetic functions are all written out instead of being templated).
@@ -1295,6 +1407,60 @@ const FunctionDoc atan2_doc{ | |||
"Compute the inverse tangent using argument signs to determine the quadrant", | |||
("Integer arguments return double values."), | |||
{"y", "x"}}; | |||
|
|||
const FunctionDoc ln_doc{ | |||
"Take natural log of arguments element-wise", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use "Compute" above.
this->AssertUnaryOpRaises(Log1p, "[-2]", "domain error"); | ||
this->AssertUnaryOpRaises(Log1p, "[-Inf]", "domain error"); | ||
this->AssertUnaryOpRaises(Log1p, MakeArray(std::numeric_limits<CType>::lowest()), | ||
"domain error"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check the error cases for the non-checked variants as well?
Hmm, there's a genuine CI failure here: |
I've extended the test cases and fixed the CI failure (by avoiding bouncing through RapidJSON for the failing assertions). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adds ln, log10, and log2. We could add a log1e and/or a logN if useful (probably not?)
Has some code from/will conflict with #10544.