should min(0.0, NaN) be NaN? #7866
Comments
+1 for poisoning. The same should apply to minimum too then. |
One subtlety is which NaN you should pick when both arguments are NaN. I think one valid choice in that case is to use the IEEE-specified total order. |
x86 floating-point instructions all use the leftmost NaN argument, which seems like a reasonable thing to do. Adding more work to pick an ordering among NaNs seems unnecessary – I can't imagine any situation where which of two NaNs one picks is significant. |
We went with standards here, e.g. |
IEEE does not specify min or max, just >, ==, <, <=, >= and friends.

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    double nann = 0.0 / 0.0;
    printf("%f\n%f\n%f\n", fmax(nann, 1.0), fmax(1.0, nann), fmax(1.2, 3.0));
    return 0;
}
```

prints

```
1.000000
1.000000
3.000000
```
Additionally, according to the C language reference:
That seems redundant in the context of language support for explicitly coding missing data. |
Section 5.3 of the IEEE fp standard specifies |
you are correct, sir! (though I must admit it bothers me, but it makes sense in the context of C)

Section 5.3 — sourceFormat minNum(source, source):

> minNum(x, y) is the canonicalized number x if x < y, y if y < x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.

Section 6.2, Operations with NaNs:

> 6.2.0 Two different kinds of NaN, signaling and quiet, shall be supported in all floating-point operations. Signaling NaNs afford representations for uninitialized variables and arithmetic-like enhancements (such as complex-affine infinities or extremely wide range) that are not in the scope of this standard. Quiet NaNs should, by means left to the implementer's discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations. |
Treating NaN as missing data seems incorrect and dangerous, despite the C standard. |
I'd like to add a +1 for poisoning. I think if you want to treat something as missing data, you have much better options than using NaN. |
I'd favor poisoning too. I'm actually prepping a proposal to change how GHC Haskell's min and max work on floats to have the poisoning semantics (i.e. NaN-propagating). |
I agree that not propagating NaNs here doesn't make sense. There's nothing special about One can have a vigorous debate about |
I would just keep it simple and say that if any argument is NaN, the result is NaN. |
How about providing I became curious about the rationale for the IEEE rules. William Kahan ("father of IEEE 754") seems to be the origin of the rules according to committee minutes:
Kahan is an expert and so presumably had some motivation (possibly for experts), but I haven't been able to find it. I've sent him an email query. |
It seems that the |
I erred when I wrote "associated hardware efficiencies". I had misinterpreted the description of As for the semantics that we seem to favor, the Intel manual coyly states:
I'm puzzling over the emulation that the author had in mind. I don't see how a single comparison suffices, since if one of the operands is a NaN, the information from the comparison does not distinguish which operand is the NaN. |
FYI, I inquired with the experts here, and the recommended AVX sequence for NaN-propagating min is:
It's only one instruction shorter than the code currently generated from the NaN-propagating definition |
Does that code sequence properly handle the signedness of zero? AFAIK, the VMIN operations don't distinguish -0 vs +0 |
The sequence will compute min(-0,+0) as +0. Detecting signedness of zero is problematic for any comparison-based min routine, since IEEE 754-2008 says |
As someone who has only an introductory systems course level understanding of IEEE 754, the one thing that sticks out in my mind about NaNs is that they tend to poison computations, so this change might be intuitive for the average user regardless of what the standard actually says :) I like the idea of providing separate functions for those who do want the standard-specified behavior if we decide to do this. |
One small detail that I think might play nice with future NaN interactions: in the case that both arguments are NaNs, perhaps the result NaN should be the bitwise OR of the two input NaNs (the idea being that the payload bits otherwise unused in NaNs could represent different causes of the original NaN error, and that would give a neat trick for saying "here's the set of ways your program borked the math!") |
this LLVM thread is relevant http://article.gmane.org/gmane.comp.compilers.llvm.cvs/201804 |
William Kahan kindly replied to my enquiry about the IEEE definition of min. His note suggests a 4th possible definition of min that propagates NaN except when one of the arguments is -∞, in order to preserve the identity min(-∞,y)==-∞. Here is his reply in full:
Huh, this is a very interesting and good point about the "Laws" of min and max on the reals. |
That is a great email, as expected. I think propagating NaN except for +-Inf is a very good definition for min and max. Honestly I find the plotting justification shockingly weak. There's no reason to assume plotting must be done by applying min and max to points, and nothing further. It would be far better to check for NaN or out-of-range values and simply omit them. If there is no better justification, I would hope this is changed in the next IEEE standard revision, if there is one. They could make it backwards compatible by adding new operations with this behavior. |
We should get "I ❤️ Kahan" t-shirts made. |
"if a function f(x, y) has the property for some value X that f(X, y) is independent of y, be it finite or infinite, then that f(X, NaN) must be the same as f(X, y)." Does anyone know the justification for this rule? I'm not necessarily disputing it; I primarily want to understand it.

I have also asked Professor Kahan about this, and one thing he said was that this rule assumes that the environment has floating-point flags which record floating-point exceptions that have occurred, and which can be easily queried. We don't necessarily need the return value of an expression to record that a NaN happened if there's a flag which holds that information that we can test.

However, Julia, and every other programming language I'm aware of besides assembly, doesn't provide easy and reliable access to these flags (including C, even with fenv.h, because in practice compilers like clang and GCC don't implement #pragma STDC FENV_ACCESS). It's not a coincidence that modelling floating-point operations as operations that mutate global state is not something that easily fits into high-level languages. Given that Julia doesn't expose the floating-point exception flags, it's not clear that this justification applies here. |
This rule does not apply, e.g., for adding 0.0: 0.0+y == y for all values of y, except if y is a NaN. (I hope I'm not stumbling over a special case for -0.0 here.) This actually inhibits compiler optimizations: with full IEEE compatibility, 0.0+y cannot be optimized to y. |
My impression is that the rule is derived from the assumption that NaN represents "could be any real number". But I'm not so sure about the rule if the NaN came from sqrt(-1.0).

0.0+y is "the same as" y when y is NaN. It's just that == with a NaN is not the same as "same as". It's 0.0+(-0.0) that's the thorn in the side for optimizing 0.0+y, since 0.0+(-0.0) is 0.0. According to Muchnick's text, the only safe optimization of IEEE arithmetic is replacing x/c by x*(1/c) when c and 1/c can be represented exactly (which means c has to be an integral power of 2).

The mutable global state of IEEE arithmetic was a big mistake in my opinion. It plays badly in highly pipelined processors and parallel programming environments. I once heard a talk by Guy Steele where he suggested it was time to revise IEEE 754 to remove the global state. I recall that he had to shorten the significand by one bit so that he could use the bit for other purposes, thus his proposal was not a backwards-compatible format. |
I don't really follow the argument for making |
I say we go ahead with this in 0.6. |
We should probably have a function that gives the old behaviour for standards-junkies. numpy and matlab use |
Eh, let's wait and see how much complaining there is first. Can also go in a package. |
FYI, this question is also discussed here: https://stackoverflow.com/questions/49011370/nan-propagation-and-ieee-754-standard/49040225 |
That's good to hear 👍 – that is what Julia does these days. |
I know this has come up before and we opted to go with Matlab's behavior, but it strikes me that NaN poisoning really might be safer. After all a NaN could be larger or smaller than any value.