-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-13095: [C++] Implement trig compute functions #10544
Conversation
|
||
TRIG_NO_INF(Sin, std::sin); | ||
TRIG_NO_INF(Cos, std::cos); | ||
TRIG_NO_INF(Tan, std::tan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to not use macros for defining compute functions and instead expand the definition of each function explicitly. I understand it is to reduce code duplication, but the same case could be applied to kernel registration code and others.
return std::atan2(y, x); | ||
} | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use the Trig prefix on these compute functions and not their standalone name?
static_assert(std::is_same<Arg0, Arg1>::value, ""); | ||
return std::atan2(y, x); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing the TrigAtanChecked
and TrigAtan2Checked
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going by cppreference which states that these functions don't have domain errors. The SEI CERT C coding standard disagrees in the case of atan2 (it states x and y must not be 0), but after digging up IEEE754, it states that atan2(0, 0) is defined (as 0).
These functions do have range (overflow) errors though so I can add checked versions for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I take that back: looking at IEEE754 neither of them are specified to overflow. They can both underflow, but this isn't undefined behavior/I don't think we need to check for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we want to add the ArithmeticOptions to the signature anyways for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ defines atan2(0, 0) as a domain error, the returned value is implementation-specific, so even if it returns 0 in our tests we should trigger a domain error as it maps to 0/0 expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, probably there is no need to check for underflow, because it seems that for all trigonometric functions, C++ applies rounding to the underflow cases and returns a correct value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. I do see that it'll set a floating-point exception that you can test for, but that may indeed be overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed the docs and updated atan2 to explicitly check for the 0, 0 case (including signed zeroes which are specified).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last fix, std::copysign
should also be applied to 0.0 value, return std::copysign(static_cast<T>(0.0), y)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, good catch - thanks!
template <typename T, typename Arg0> | ||
static enable_if_floating_point<Arg0, T> Call(KernelContext*, Arg0 val, Status* st) { | ||
static_assert(std::is_same<T, Arg0>::value, ""); | ||
if (ARROW_PREDICT_FALSE((val < -1.0 || val > 1.0) && !std::isnan(val))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest reorder the domain error checks as follows:
if (!std::isnan(val) && (val > 1.0 || val < -1.0)) { ... }
The reasoning is that a NaN input will exit immediately and a non-NaN input will be correctly compared to |val| > 1
(C++ does not guarantees NaN comparisons to be false). Reordering the positive check before the negative check can make compiler reorder instructions more optimal, see examples of all the combinations for clang and GCC.
LGTM, thanks for working on this. |
The Windows MinGW jobs are failing due to undefined
|
Yup, I just saw that - I ended up using acos, I can add a define instead too. |
I think it would be good to add these macros somewhere accessible for the compute layer. Should we open a JIRA for this or not consider this now? |
I can put them in util_internal.h perhaps? |
I agree putting the math definitions in |
#define M_SQRT2 1.41421356237309504880 | ||
#define M_SQRT1_2 0.707106781186547524401 | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand the issue with _USE_MATH_DEFINES
and unity builds. These defines should be guarded with
#if defined(_USE_MATH_DEFINES) && !defined(_MATH_DEFINES_DEFINED)
#define _MATH_DEFINES_DEFINED
...
#endif
to prevent multiple definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the opposite: if any translation unit forgets to add that define before including cmath, even if you don't forget, you'll get a build error, since cmath will already have been included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see. You are guarding all the defines based on the existence of M_E
. I think it would be "safer" to guard each macro independently. Safer in the sense that if at some point only M_E
is defined explicitly in another translation unit, then the remaining macros will be undefined. Alternatively, we can use _USE_ARROW_MATH_DEFINES
and require it before including util_internal.h
, but this seems overkill.
docs/source/cpp/compute.rst
Outdated
| add | Binary | Numeric | Numeric (1) | | ||
+--------------------------+------------+--------------------+---------------------+ | ||
| add_checked | Binary | Numeric | Numeric (1) | | ||
+--------------------------+------------+--------------------+---------------------+ | ||
| asin | Unary | Numeric | Numeric | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put then in a separate table with the trigonometric kernels grouped (IMO that will be easier to keep somewhat the overview on this page)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@edponce @jorisvandenbossche do you have other comments? |
template <typename T, typename Arg0> | ||
static enable_if_floating_point<Arg0, T> Call(KernelContext*, Arg0 val, Status*) { | ||
static_assert(std::is_same<T, Arg0>::value, ""); | ||
return sin(val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use std::sin
and friends to be consistent and be clear that this is not an Arrow function (or is it?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, not sure why it even lets me call that without std::. Fixed.
static_assert(std::is_same<T, double>::value, ""); | ||
static_assert(std::is_same<Arg0, Arg1>::value, ""); | ||
// Explicitly mimic what IEEE754 does (atan(0, 0) == 0) if needed | ||
if (!std::numeric_limits<T>::is_iec559 && y == 0 && x == 0) return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After more thorough thinking, I consider it is safe to remove the "zero check" condition because 99% of CPUs adhere to 99% of IEEE 754, so based on the C++ standard the resulting value from atan2(0,0)
will be correct. If you feel that we should keep it, I am fine with that too. Apologies for the back and forth on this detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided to remove it then. The tests will catch if a platform doesn't support that.
cpp/src/arrow/compute/api_scalar.h
Outdated
|
||
/// \brief Compute the inverse tangent (arctangent) of the array | ||
/// values, using the argument signs to determine the correct | ||
/// quadrant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring should be more explicit about what y
and x
are for.
-M_PI_2, 0, M_PI)); | ||
} | ||
|
||
TYPED_TEST(TestUnaryArithmeticIntegral, Trig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this testing an implicitly cast version of the kernel? It doesn't seem very useful to compute trigonometric functions on integers.
docs/source/python/api/compute.rst
Outdated
----------------------- | ||
|
||
Trigonometric functions are also supported, and also offer ``_checked`` | ||
variants which detect domain and range errors where appropriate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are range errors actually detected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not, because none of these functions can raise a range error, except in case of underflow, in which case they're defined to return a correctly rounded result, so I'll revise the docs.
template <typename T, typename Arg0> | ||
static enable_if_floating_point<Arg0, T> Call(KernelContext*, Arg0 val, Status* st) { | ||
static_assert(std::is_same<T, Arg0>::value, ""); | ||
if (ARROW_PREDICT_FALSE(!std::isnan(val) && (val < -1.0 || val > 1.0))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the NaN check necessary? NaNs should typically fail those comparisons, AFAIU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Eduardo points out above, C++ doesn't guarantee any particular behavior. But we're essentially already assuming IEE754 conformance here in which case the check is redundant.
@@ -454,6 +462,191 @@ struct PowerChecked { | |||
} | |||
}; | |||
|
|||
struct Sin { | |||
template <typename T, typename Arg0> | |||
static enable_if_integer<Arg0, T> Call(KernelContext*, Arg0 val, Status*) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned elsewhere, I don't think it's useful to expose integer-accepting versions of these kernels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone ahead and removed them.
@@ -820,6 +1081,80 @@ const FunctionDoc pow_checked_doc{ | |||
("An error is returned when integer to negative integer power is encountered,\n" | |||
"or integer overflow is encountered."), | |||
{"base", "exponent"}}; | |||
|
|||
const FunctionDoc sin_doc{"Computes the sine of the elements argument-wise", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other docstrings use infinitive / imperative instead of present, such as "Compute the sine of the arguments element-wise".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you @lidavidm
Adds ln, log10, and log2. We could add a log1e and/or a logN if useful (probably not?) Has some code from/will conflict with #10544. Closes #10567 from lidavidm/arrow-13096 Authored-by: David Li <[email protected]> Signed-off-by: Yibo Cai <[email protected]>
Adds sin/cos/tan and their inverses. Checked variants check for what would be domain errors (this does not apply to atan/atan2).