Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change reference_relaxed_exp to use float type to calculate reference #2160

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

anilavakundu007
Copy link

use lower precision for reference values used for the relaxed implementation of exp math built-in.

@CLAassistant
Copy link

CLAassistant commented Nov 26, 2024

CLA assistant check
All committers have signed the CLA.

double reference_relaxed_exp(double x) { return reference_exp(x); }
double reference_relaxed_exp(double x)
{
return reference_exp2(((float)x) * HEX_FLT(+, 1, 715476, +, 0));
Copy link
Contributor

@bashbaug bashbaug Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, could you please explain why this test should use a float to compute the reference value in this case?

In case it helps other reviewers, HEX_FLT(+, 1, 715476, +, 0) expands to + 0x1.715476p+0, which is equal to 1.44269f, or M_LOG2E_F.

Copy link
Author

@anilavakundu007 anilavakundu007 Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ulp error for exp according to the specs is ≤ 3 ulp for the full profile and ≤ 4 ulp for embedded profile. However, with -cl-unsafe-math-optimizations flag turned on the ulp requirements become ≤ 3 + floor(fabs(2 * x)) ulp for the full profile, and ≤ 4 ulp for the embedded profile.

As the ulp error is a bit relaxed with the unsafe math operations is there a reason for the same reference values being used for both the relaxed and normal implementation of the exp built-in?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Altering the reference function doesn't feel like the right solution. unary_float.cpp already has logic to select the ULP threshold for exp and exp2 in relaxed mode:

                    if (strcmp(fname, "exp") == 0 || strcmp(fname, "exp2") == 0)
                    {
                        // For full profile, ULP depends on input value.
                        // For embedded profile, ULP comes from functionList.
                        if (!gIsEmbedded)
                        {
                            ulps = 3.0f + floor(fabs(2 * s[j]));
                        }

                        fail = !(fabsf(err) <= ulps);
                    }

Are you able to share any details on the values that are failing for you?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few failures but one example is the following
ERROR: exp: -4.058329 ulp error at 0x1.6005p+2 (0x40b00280): *0x1.e98882p+7 vs. 0x1.e9887ap+7

Our current understanding is that the use of the double input values makes the result of the reference calculation a bit 'overly' precise. There is a number of other reference_relaxed functions in this file where the input is truncated to float

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference calculation cannot be "overly precise": the specification requires the reference calculation to yield the infinitely precise result. The allowable error is then relative to this infinitely precise result.

Thanks for providing an example value; I'll have to dig deeper into that but it'll be after the festive season.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this reported failure is correct, and the produced results are > 4ULP from the infinitely precise result See: https://godbolt.org/z/1zzh4MsYs

In short:

  • The printed reference value is 0x1.e98882p+7f (bits 0x4374C441), which is actually slightly smaller than the results computed as a double, which is a proxy for the infinitely precise result.
  • The produced value is 0x1.e9887ap+7f (bits 0x4374C43D), which is four away from the reference result, so it's close but because the printed reference value is slightly smaller than the infinitely precise result this means that the produced value is > 4ULP from the infinitely precise result.

Perhaps the question we should be asking ourselves is: Is the required accuracy for the embedded profile correct? Since the full profile required accuracy is a function of the input value and the embedded profile required accuracy is not, I think the embedded profile actually requires more accuracy than the full profile in this specific case, which doesn't quite feel right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your analysis @bashbaug, I agree with your findings.

Perhaps the question we should be asking ourselves is: Is the required accuracy for the embedded profile correct?

That's a very good point. For this particular input value, the embedded profile requires <=4 ULP, but the full profile tolerates as much as 14 ULP! One would expect the embedded profile to require less accuracy than the full profile, so the specification doesn't feel correct here. It turns out we have a spec issue for this already: KhronosGroup/OpenCL-Docs#288

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants