Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-13095: [C++] Implement trig compute functions #10544

Closed
wants to merge 1 commit into from

Conversation

lidavidm
Copy link
Member

Adds sin/cos/tan and their inverses. Checked variants check for what would be domain errors (this does not apply to atan/atan2).

@github-actions
Copy link


TRIG_NO_INF(Sin, std::sin);
TRIG_NO_INF(Cos, std::cos);
TRIG_NO_INF(Tan, std::tan);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to not use macros for defining compute functions and instead expand the definition of each function explicitly. I understand it is to reduce code duplication, but the same case could be applied to kernel registration code and others.

return std::atan2(y, x);
}
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use the Trig prefix on these compute functions and not their standalone name?

static_assert(std::is_same<Arg0, Arg1>::value, "");
return std::atan2(y, x);
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing the TrigAtanChecked and TrigAtan2Checked?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going by cppreference which states that these functions don't have domain errors. The SEI CERT C coding standard disagrees in the case of atan2 (it states x and y must not be 0), but after digging up IEEE754, it states that atan2(0, 0) is defined (as 0).

These functions do have range (overflow) errors though so I can add checked versions for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I take that back: looking at IEEE754 neither of them are specified to overflow. They can both underflow, but this isn't undefined behavior/I don't think we need to check for that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we want to add the ArithmeticOptions to the signature anyways for consistency?

Copy link
Contributor

@edponce edponce Jun 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ defines atan2(0, 0) as a domain error, the returned value is implementation-specific, so even if it returns 0 in our tests we should trigger a domain error as it maps to 0/0 expression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, probably there is no need to check for underflow, because it seems that for all trigonometric functions, C++ applies rounding to the underflow cases and returns a correct value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I do see that it'll set a floating-point exception that you can test for, but that may indeed be overkill.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed the docs and updated atan2 to explicitly check for the 0, 0 case (including signed zeroes which are specified).

Copy link
Contributor

@edponce edponce Jun 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last fix, std::copysign should also be applied to 0.0 value, return std::copysign(static_cast<T>(0.0), y).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, good catch - thanks!

template <typename T, typename Arg0>
static enable_if_floating_point<Arg0, T> Call(KernelContext*, Arg0 val, Status* st) {
static_assert(std::is_same<T, Arg0>::value, "");
if (ARROW_PREDICT_FALSE((val < -1.0 || val > 1.0) && !std::isnan(val))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest reorder the domain error checks as follows:

if (!std::isnan(val) && (val > 1.0 || val < -1.0)) { ... }

The reasoning is that a NaN input will exit immediately and a non-NaN input will be correctly compared to |val| > 1 (C++ does not guarantees NaN comparisons to be false). Reordering the positive check before the negative check can make compiler reorder instructions more optimal, see examples of all the combinations for clang and GCC.

@edponce
Copy link
Contributor

edponce commented Jun 17, 2021

LGTM, thanks for working on this.

@edponce
Copy link
Contributor

edponce commented Jun 17, 2021

The Windows MinGW jobs are failing due to undefined M_PI. The reason is M_PI and friends are not part of the C++ standard. We need to add our own definitions to be portable with these definitions.

#if defined(_USE_MATH_DEFINES) && !defined(_MATH_DEFINES_DEFINED)
#define _MATH_DEFINES_DEFINED

/* Define _USE_MATH_DEFINES before including math.h to expose these macro
 * definitions for common math constants.  These are placed under an #ifdef
 * since these commonly-defined names are not part of the C/C++ standards.
 */

/* Definitions of useful mathematical constants
 * M_E        - e
 * M_LOG2E    - log2(e)
 * M_LOG10E   - log10(e)
 * M_LN2      - ln(2)
 * M_LN10     - ln(10)
 * M_PI       - pi
 * M_PI_2     - pi/2
 * M_PI_4     - pi/4
 * M_1_PI     - 1/pi
 * M_2_PI     - 2/pi
 * M_2_SQRTPI - 2/sqrt(pi)
 * M_SQRT2    - sqrt(2)
 * M_SQRT1_2  - 1/sqrt(2)
 */

#define M_E        2.71828182845904523536
#define M_LOG2E    1.44269504088896340736
#define M_LOG10E   0.434294481903251827651
#define M_LN2      0.693147180559945309417
#define M_LN10     2.30258509299404568402
#define M_PI       3.14159265358979323846
#define M_PI_2     1.57079632679489661923
#define M_PI_4     0.785398163397448309616
#define M_1_PI     0.318309886183790671538
#define M_2_PI     0.636619772367581343076
#define M_2_SQRTPI 1.12837916709551257390
#define M_SQRT2    1.41421356237309504880
#define M_SQRT1_2  0.707106781186547524401

#endif  /* _USE_MATH_DEFINES */

@lidavidm
Copy link
Member Author

Yup, I just saw that - I ended up using acos, I can add a define instead too.

@edponce
Copy link
Contributor

edponce commented Jun 17, 2021

I think it would be good to add these macros somewhere accessible for the compute layer. Should we open a JIRA for this or not consider this now?

@lidavidm
Copy link
Member Author

I can put them in util_internal.h perhaps?

@edponce
Copy link
Contributor

edponce commented Jun 17, 2021

I agree putting the math definitions in arrow/cpp/src/arrow/compute/kernels/util_internal.h, this will make them visible to arithmetic kernel implementations and tests.

#define M_SQRT2 1.41421356237309504880
#define M_SQRT1_2 0.707106781186547524401
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand the issue with _USE_MATH_DEFINES and unity builds. These defines should be guarded with

#if defined(_USE_MATH_DEFINES) && !defined(_MATH_DEFINES_DEFINED)
#define _MATH_DEFINES_DEFINED
...
#endif

to prevent multiple definitions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the opposite: if any translation unit forgets to add that define before including cmath, even if you don't forget, you'll get a build error, since cmath will already have been included.

Copy link
Contributor

@edponce edponce Jun 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. You are guarding all the defines based on the existence of M_E. I think it would be "safer" to guard each macro independently. Safer in the sense that if at some point only M_E is defined explicitly in another translation unit, then the remaining macros will be undefined. Alternatively, we can use _USE_ARROW_MATH_DEFINES and require it before including util_internal.h, but this seems overkill.

| add | Binary | Numeric | Numeric (1) |
+--------------------------+------------+--------------------+---------------------+
| add_checked | Binary | Numeric | Numeric (1) |
+--------------------------+------------+--------------------+---------------------+
| asin | Unary | Numeric | Numeric |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put then in a separate table with the trigonometric kernels grouped (IMO that will be easier to keep somewhat the overview on this page)

Copy link
Contributor

@cyb70289 cyb70289 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@cyb70289
Copy link
Contributor

@edponce @jorisvandenbossche do you have other comments?

template <typename T, typename Arg0>
static enable_if_floating_point<Arg0, T> Call(KernelContext*, Arg0 val, Status*) {
static_assert(std::is_same<T, Arg0>::value, "");
return sin(val);
Copy link
Contributor

@edponce edponce Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use std::sin and friends to be consistent and be clear that this is not an Arrow function (or is it?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, not sure why it even lets me call that without std::. Fixed.

static_assert(std::is_same<T, double>::value, "");
static_assert(std::is_same<Arg0, Arg1>::value, "");
// Explicitly mimic what IEEE754 does (atan(0, 0) == 0) if needed
if (!std::numeric_limits<T>::is_iec559 && y == 0 && x == 0) return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After more thorough thinking, I consider it is safe to remove the "zero check" condition because 99% of CPUs adhere to 99% of IEEE 754, so based on the C++ standard the resulting value from atan2(0,0) will be correct. If you feel that we should keep it, I am fine with that too. Apologies for the back and forth on this detail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to remove it then. The tests will catch if a platform doesn't support that.


/// \brief Compute the inverse tangent (arctangent) of the array
/// values, using the argument signs to determine the correct
/// quadrant.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring should be more explicit about what y and x are for.

-M_PI_2, 0, M_PI));
}

TYPED_TEST(TestUnaryArithmeticIntegral, Trig) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this testing an implicitly cast version of the kernel? It doesn't seem very useful to compute trigonometric functions on integers.

-----------------------

Trigonometric functions are also supported, and also offer ``_checked``
variants which detect domain and range errors where appropriate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are range errors actually detected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not, because none of these functions can raise a range error, except in case of underflow, in which case they're defined to return a correctly rounded result, so I'll revise the docs.

template <typename T, typename Arg0>
static enable_if_floating_point<Arg0, T> Call(KernelContext*, Arg0 val, Status* st) {
static_assert(std::is_same<T, Arg0>::value, "");
if (ARROW_PREDICT_FALSE(!std::isnan(val) && (val < -1.0 || val > 1.0))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the NaN check necessary? NaNs should typically fail those comparisons, AFAIU.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Eduardo points out above, C++ doesn't guarantee any particular behavior. But we're essentially already assuming IEE754 conformance here in which case the check is redundant.

@@ -454,6 +462,191 @@ struct PowerChecked {
}
};

struct Sin {
template <typename T, typename Arg0>
static enable_if_integer<Arg0, T> Call(KernelContext*, Arg0 val, Status*) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, I don't think it's useful to expose integer-accepting versions of these kernels.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone ahead and removed them.

@@ -820,6 +1081,80 @@ const FunctionDoc pow_checked_doc{
("An error is returned when integer to negative integer power is encountered,\n"
"or integer overflow is encountered."),
{"base", "exponent"}};

const FunctionDoc sin_doc{"Computes the sine of the elements argument-wise",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other docstrings use infinitive / imperative instead of present, such as "Compute the sine of the arguments element-wise".

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you @lidavidm

@pitrou pitrou closed this in 01f3338 Jun 30, 2021
cyb70289 pushed a commit that referenced this pull request Jul 7, 2021
Adds ln, log10, and log2. We could add a log1e and/or a logN if useful (probably not?)

Has some code from/will conflict with #10544.

Closes #10567 from lidavidm/arrow-13096

Authored-by: David Li <[email protected]>
Signed-off-by: Yibo Cai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants