-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
change reference_relaxed_exp to use float type to calculate reference #2160
base: main
Are you sure you want to change the base?
Conversation
3c470fd
to
71e418c
Compare
double reference_relaxed_exp(double x) { return reference_exp(x); } | ||
double reference_relaxed_exp(double x) | ||
{ | ||
return reference_exp2(((float)x) * HEX_FLT(+, 1, 715476, +, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, could you please explain why this test should use a float to compute the reference value in this case?
In case it helps other reviewers, HEX_FLT(+, 1, 715476, +, 0)
expands to + 0x1.715476p+0
, which is equal to 1.44269f
, or M_LOG2E_F
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ulp error for exp according to the specs is ≤ 3 ulp for the full profile and ≤ 4 ulp for embedded profile. However, with -cl-unsafe-math-optimizations flag turned on the ulp requirements become ≤ 3 + floor(fabs(2 * x)) ulp for the full profile, and ≤ 4 ulp for the embedded profile.
As the ulp error is a bit relaxed with the unsafe math operations is there a reason for the same reference values being used for both the relaxed and normal implementation of the exp built-in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Altering the reference function doesn't feel like the right solution. unary_float.cpp
already has logic to select the ULP threshold for exp
and exp2
in relaxed mode:
if (strcmp(fname, "exp") == 0 || strcmp(fname, "exp2") == 0)
{
// For full profile, ULP depends on input value.
// For embedded profile, ULP comes from functionList.
if (!gIsEmbedded)
{
ulps = 3.0f + floor(fabs(2 * s[j]));
}
fail = !(fabsf(err) <= ulps);
}
Are you able to share any details on the values that are failing for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few failures but one example is the following
ERROR: exp: -4.058329 ulp error at 0x1.6005p+2 (0x40b00280): *0x1.e98882p+7 vs. 0x1.e9887ap+7
Our current understanding is that the use of the double input values makes the result of the reference calculation a bit 'overly' precise. There is a number of other reference_relaxed
functions in this file where the input is truncated to float
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference calculation cannot be "overly precise": the specification requires the reference calculation to yield the infinitely precise result. The allowable error is then relative to this infinitely precise result.
Thanks for providing an example value; I'll have to dig deeper into that but it'll be after the festive season.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this reported failure is correct, and the produced results are > 4ULP from the infinitely precise result See: https://godbolt.org/z/1zzh4MsYs
In short:
- The printed reference value is 0x1.e98882p+7f (bits 0x4374C441), which is actually slightly smaller than the results computed as a double, which is a proxy for the infinitely precise result.
- The produced value is 0x1.e9887ap+7f (bits 0x4374C43D), which is four away from the reference result, so it's close but because the printed reference value is slightly smaller than the infinitely precise result this means that the produced value is > 4ULP from the infinitely precise result.
Perhaps the question we should be asking ourselves is: Is the required accuracy for the embedded profile correct? Since the full profile required accuracy is a function of the input value and the embedded profile required accuracy is not, I think the embedded profile actually requires more accuracy than the full profile in this specific case, which doesn't quite feel right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your analysis @bashbaug, I agree with your findings.
Perhaps the question we should be asking ourselves is: Is the required accuracy for the embedded profile correct?
That's a very good point. For this particular input value, the embedded profile requires <=4 ULP, but the full profile tolerates as much as 14 ULP! One would expect the embedded profile to require less accuracy than the full profile, so the specification doesn't feel correct here. It turns out we have a spec issue for this already: KhronosGroup/OpenCL-Docs#288
use lower precision for reference values used for the relaxed implementation of exp math built-in.