should min(0.0, NaN) be NaN? #7866

Closed
StefanKarpinski opened this issue Aug 6, 2014 · 51 comments
Labels
needs decision A decision on this change is needed

Comments

@StefanKarpinski
Member

I know this has come up before and we opted to go with Matlab's behavior, but it strikes me that NaN poisoning really might be safer. After all, a NaN could be larger or smaller than any value.

@ViralBShah
Member

+1 for poisoning. The same should apply to minimum too then.

@cartazio

cartazio commented Aug 6, 2014

One subtlety is which NaN you should pick when both arguments are NaN. I think in that case one valid choice is to use the IEEE-specified total order.

@StefanKarpinski
Member Author

x86 floating-point instructions all use the leftmost NaN argument, which seems like a reasonable thing to do. Adding more work to pick an ordering among NaNs seems unnecessary – I can't imagine any situation where which of two NaNs one picks is significant.

@JeffBezanson
Member

We went with standards here, e.g. fmin and, I believe, IEEE. This is not just taken from Matlab.

@cartazio

cartazio commented Aug 6, 2014

IEEE does not specify min or max, just >, ==, <, <=, >= and friends.

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

int main(void){
  double nann = 0.0/0.0;
  printf("%f\n%f\n%f", fmax(nann, 1.0), fmax(1.0, nann), fmax(1.20, 3.0));
  return 0;
}

prints

~/D/r/intFloatRegisterFoo $ ./a.out
1.000000
1.000000
3.000000⏎

Additionally, according to the C language reference:

  Returns the larger of two floating point arguments, treating NaNs as missing data (between a NaN and a    
  numeric value, the numeric value is chosen)

That seems redundant in the context of language support for explicitly coding missing data.

@JeffBezanson
Member

Section 5.3 of the IEEE fp standard specifies minNum and maxNum which behave this way.

@cartazio

cartazio commented Aug 6, 2014

You are correct, sir! (though I must admit it bothers me, but it makes sense in the context of C)

Section 5.3:

sourceFormat minNum(source, source)
sourceFormat maxNum(source, source)
sourceFormat minNumMag(source, source)
sourceFormat maxNumMag(source, source)

minNum(x, y) is the canonicalized number x if x<y, y if y<x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.
maxNum(x, y) is the canonicalized number y if x<y, x if y<x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.

Section 6.2, Operations with NaNs:

Two different kinds of NaN, signaling and quiet, shall be supported in all floating-point operations. Signaling NaNs afford representations for uninitialized variables and arithmetic-like enhancements (such as complex-affine infinities or extremely wide range) that are not in the scope of this standard. Quiet NaNs should, by means left to the implementer’s discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.
Under default exception handling, any operation signaling an invalid operation exception and for which a floating-point result is to be delivered shall deliver a quiet NaN.
Signaling NaNs shall be reserved operands that, under default exception handling, signal the invalid operation exception (see 7.2) for every general-computational and signaling-computational operation except for the conversions described in 5.12. For non-default treatment, see 8.
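
As a rough Julia sketch (not from the standard; the names are mine, and signaling NaNs and canonicalization are ignored), the quiet-NaN behavior quoted above amounts to:

minnum(x, y) = isnan(x) ? y : isnan(y) ? x : min(x, y)   # the NaN is treated as missing data
maxnum(x, y) = isnan(x) ? y : isnan(y) ? x : max(x, y)

minnum(0.0, NaN)   # 0.0 -- the number wins over a quiet NaN
minnum(NaN, NaN)   # NaN -- both operands are NaN, so a NaN comes back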

@StefanKarpinski
Member Author

Treating NaN as missing data seems incorrect and dangerous, despite the C standard.

@staticfloat
Member

I'd like to add a +1 for poisoning. I think if you're wanting to treat something as missing data, you have much better options than using NaN.

@cartazio

cartazio commented Aug 7, 2014

I'd favor poisoning too. I'm actually prepping a proposal to change how GHC Haskell min and max work on floats to have the poisoning semantics (i.e. NaN-propagating).

@JeffBezanson
Member

I agree that not propagating NaNs here doesn't make sense. There's nothing special about min that grants it a definite value on undefined input. It's just that following IEEE is a good default decision.

One can have a vigorous debate about 1^NaN, but for min I'm not sure what the argument would be. Although this does raise the question of min(-Inf,NaN) :)

@StefanKarpinski
Member Author

I would just keep it simple and say that if any argument is NaN, the result is NaN.

@ArchRobison
Contributor

How about providing minnum and maxnum for programmers who want the IEEE semantics (or want the associated hardware efficiencies), and do NAN poisoning for min and max? Numpy calls the IEEE version nanmin.

I became curious about the rationale for the IEEE rules. William Kahan ("father of IEEE 754") seems to be the origin of the rules according to committee minutes:

Kahan proposed that it must be commutative & the default shall be to pass numbers over NaNs & he doesn't care about the sign of zero.

Kahan is an expert and so presumably had some motivation (possibly for experts), but I haven't been able to find it. I've sent him an email query.

@simonster
Member

It seems that the minsd instruction is ifelse(x < y, x, y), which is neither the NaN-poisoning nor the IEEE definition.
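
For comparison, a quick sketch of that comparison-select definition (the helper name is mine):

minsd_style(x, y) = ifelse(x < y, x, y)

minsd_style(0.0, NaN)   # NaN -- 0.0 < NaN is false, so the second operand is returned
minsd_style(NaN, 0.0)   # 0.0 -- the result depends on argument order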

@ArchRobison
Contributor

I erred when I wrote "associated hardware efficiencies". I had misinterpreted the description of minsd. The semantics of minsd were designed back in the '90s, well before IEEE 754-2008, so that compilers could optimize the common C idiom x<y ? x : y.

As for the semantics that we seem to favor, the Intel manual coyly states:

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINPD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

I'm puzzling over the emulation that the author had in mind. I don't see how a single comparison suffices, since if one of the operands is a NaN, the information from the comparison does not distinguish which operand is the NaN.

@ArchRobison
Contributor

FYI, I inquired with the experts here, and the recommended AVX sequence for NaN-propagating min is:

VMIN R, a, b           // result is b if a or b is NaN, min(a,b) otherwise
                       // so the NaN is not propagated only if a is the NaN
VCMPNEQ M, a, a        // M=11…11  if a is NaN, 0 otherwise
VBLENDV Res, R, a, M   // Res = R if M=0 (a not NaN), otherwise Res=a (if a is NaN)

It's only one instruction shorter than the code currently generated from the NaN-propagating definition min{T<:FloatingPoint}(x::T, y::T) = ifelse((x < y) | (x != x), x, y). If we lose sleep over that extra instruction, we can submit a patch to LLVM. :-)
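
For concreteness, that NaN-propagating definition behaves like this (scalar sketch with the type parameters dropped):

nanprop_min(x, y) = ifelse((x < y) | (x != x), x, y)

nanprop_min(1.0, 2.0)    # 1.0
nanprop_min(NaN, 2.0)    # NaN -- x != x selects x when x is NaN
nanprop_min(2.0, NaN)    # NaN -- both tests are false, so y (the NaN) is returned
nanprop_min(-0.0, 0.0)   # 0.0 -- comparisons ignore the sign of zero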

@cartazio

Does that code sequence properly handle the signedness of zero? AFAIK, the VMIN operations don't distinguish -0 from 0.

@ArchRobison
Contributor

The sequence will compute min(-0,+0) as +0. Detecting signedness of zero is problematic for any comparison-based min routine, since IEEE 754-2008 says "Comparisons shall ignore the sign of zero (so +0 = −0)".
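
One scalar workaround is to break the tie on the sign bit explicitly (sketch only; not necessarily how Base does it):

function signed_min(x, y)
    (isnan(x) || isnan(y)) && return x + y   # NaN propagates through the addition
    x < y && return x
    y < x && return y
    return signbit(x) ? x : y                # x == y (e.g. -0.0 and +0.0): prefer the negative sign
end

signed_min(-0.0, 0.0)   # -0.0
signed_min(0.0, -0.0)   # -0.0
signed_min(NaN, 1.0)    # NaN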

@porterjamesj
Contributor

As someone who has only an introductory systems course level understanding of IEEE 754, the one thing that sticks out in my mind about NaNs is that they tend to poison computations, so this change might be intuitive for the average user regardless of what the standard actually says :)

I like the idea of providing separate functions for those who do want the standard-specified behavior if we decide to do this.

@cartazio

One small detail that I think might play nice with future NaN interactions: in the case that both arguments are NaNs, perhaps the result NaN should be the bitwise OR of the two input NaNs (the idea being that the otherwise-unused payload bits in NaNs could represent different causes of the original NaN error, and that would give a neat trick for saying "here's the set of ways your program borked the math!").
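
A Float64 sketch of that payload-merging idea (illustrative only; real NaN payload conventions vary across platforms, and the helper name is mine):

function merge_nans(x::Float64, y::Float64)
    @assert isnan(x) && isnan(y)
    reinterpret(Float64, reinterpret(UInt64, x) | reinterpret(UInt64, y))
end

a = reinterpret(Float64, reinterpret(UInt64, NaN) | UInt64(1))   # NaN tagged with payload bit 1
b = reinterpret(Float64, reinterpret(UInt64, NaN) | UInt64(2))   # NaN tagged with payload bit 2
reinterpret(UInt64, merge_nans(a, b))                            # both payload bits survive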

@cartazio

This LLVM thread is relevant: http://article.gmane.org/gmane.comp.compilers.llvm.cvs/201804
I'm going to ask them about the NaN-returning semantics option.

@ArchRobison
Contributor

William Kahan kindly replied to my enquiry about the IEEE definition of min. His note suggests a 4th possible definition of min that propagates NaN except when one of the arguments is -∞, in order to preserve the identity min(-∞,y)==-∞.

Here is his reply in full:

Arch:
The recommendation that  min{ x, NaN}  and  min{NaN, x}  both be  x  instead of  NaN 
arose at a time when graph plotting programs had nowhere to put a  NaN,  and could
abort instead.  In my experience the most common occasions when  min{x, NaN}  arose 
were in plotting graphs of functions whose domains had holes.  For instance,  let  Y(x) 
be the solutions of the equation   x^2 - y^2 = 1 ,  namely  Y(x) = +/- sqrt( (x-1)(x+1) ) ;
and plot  Y(x)  vs.  x  over,  say,  -2 < x < 2 .  Then the result should be two branches of 
an hyperbola separated by a gap where  -1 < x < 1 .  To bridge the gap we perform a 
"Windowing"  operation by plotting  max{-2,  min{+2, Y(x)} }  in a square window  that
barely contains  -2 < x < 2   and  -2 < y < 2 ,  putting the  NaNs  on the window's upper
edge where they can be ignored.  This resolution of the problems posed by domains 
with holes is not perfect,  but better than the alternatives available at the time to cope 
with holes whose locations could not easily be predicted before the attempt to plot.

There must be occasions where  minm{x, NaN}  and  maxm{x, NaN}  should be  NaN 
instead of  x  to serve a computational necessity.  Here I have used different names for 
the minimum and maximum functions;  you might disagree with my choice of names.

However,  maxm(+Infinity, NaN)  must be  +Infinity,  NOT NaN,  to honor the rule that 
if a function  f(x, y)  has the property for some value  X  that  f(X, y)  is independent of 
y ,  be it finite or infinite,  then that  f(X, NaN)  must be the same as  f(X, y) .  This may 
make  maxm  harder to implement than  max . The foregoing rule is crucially necessary. 
If there were no way to get rid of an irrelevant  NaN  then it might as well stop computation 
at its earliest encounter.

I hope that my explanation helps.  If you have a better rationale I will be glad to entertain
it.

With best wishes,
                                                                    Prof. W. Kahan
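
For illustration, the windowing trick he describes looks roughly like this with IEEE-style min/max (a sketch; the helper names are not Kahan's):

ieee_min(x, y) = isnan(x) ? y : isnan(y) ? x : min(x, y)
ieee_max(x, y) = isnan(x) ? y : isnan(y) ? x : max(x, y)

window(y) = ieee_max(-2.0, ieee_min(2.0, y))   # Kahan's max{-2, min{+2, Y(x)}}

window(1.3)   # 1.3 -- in-range points pass through
window(NaN)   # 2.0 -- the NaN from a domain hole lands on the window's upper edge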

@cartazio

Huh, this is a very interesting and good point about the "Laws" of min and max on the reals.

@JeffBezanson
Member

That is a great email, as expected. I think propagating NaN except for +-Inf is a very good definition for min and max.

Honestly I find the plotting justification shockingly weak. There's no reason to assume plotting must be done by applying min and max to points, and nothing further. It would be far better to check for NaN or out-of-range values and simply omit them. If there is no better justification, I would hope this is changed in the next IEEE standard revision, if there is one. They could make it backwards compatible by adding new operations with this behavior.
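
A sketch of that "propagate NaN except when an infinity already decides the result" definition for min (the name is mine; max would mirror it with +Inf):

function kahan_min(x, y)
    (x == -Inf || y == -Inf) && return -Inf   # -Inf wins regardless of the other argument
    (isnan(x) || isnan(y)) && return NaN      # otherwise any NaN poisons the result
    return min(x, y)
end

kahan_min(-Inf, NaN)   # -Inf -- preserves min(-Inf, y) == -Inf for every y
kahan_min(0.0, NaN)    # NaN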

@StefanKarpinski
Member Author

We should get "I ❤️ Kahan" t-shirts made.

@sunfishcode

"if a function f(x, y) has the property for some value X that f(X, y) is independent of y, be it finite or infinite, then that f(X, NaN) must be the same as f(X, y) ."

Does anyone know the justification for this rule? I'm not necessarily disputing it, I primarily want to understand it.

I have also asked Professor Kahan about this, and one thing he said was that this rule assumes that the environment has floating-point flags which record floating-point exceptions that have occurred, and which can be easily queried. We don't necessarily need the return value of an expression to record that a NaN happened if there's a flag which holds that information that we can test.

However, Julia, and every other programming language I'm aware of besides assembly, doesn't provide easy and reliable access to these flags (including C, even with fenv.h, because in practice compilers like clang and GCC don't implement #pragma STDC FENV_ACCESS). It's not a coincidence that modelling floating-point operations as operations that mutate global state is not something that easily fits into high-level languages.

Given that Julia doesn't expose the floating-point exception flags, it's not clear that this justification applies here.

@eschnett
Contributor

This rule does not apply, e.g., for adding 0.0: 0.0+y == y for all values of y, except if y is a NaN. (I hope I'm not stumbling over a special case for -0.0 here.)

This actually inhibits compiler optimizations. With full IEEE compatibility, 0.0+y cannot be optimized to y.

@ArchRobison
Contributor

My impression is that the rule is derived from the assumption that NaN represents "could be any real number". But I'm not so sure about the rule if the NaN came from sqrt(-1.0).

0.0+y is "the same as" y when y is NaN. It's just that == with a NaN is not the same as "same as". It's 0.0+(-0.0) that's the thorn in the side for optimizing 0.0+y, since 0.0+(-0.0) is 0.0. According to Muchnick's text the only safe optimization of IEEE arithmetic is replacing x/c by x*(1/c) when c and 1/c can be represented exactly (which means c has to be an integral power of 2).
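
Concretely, in Julia:

0.0 + -0.0   # +0.0, not -0.0 -- this is what blocks rewriting 0.0 + y into y
0.0 + NaN    # NaN -- the NaN case itself is preserved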

The mutable global state of IEEE arithmetic was a big mistake in my opinion. It plays badly in highly pipelined processors and parallel programming environments. I once heard a talk by Guy Steele where he suggested it was time to revise IEEE 754 to remove the global state. I recall that he had to shorten the significand by one bit so that he could use the bit for other purposes, thus his proposal was not a backwards-compatible format.

@danluu
Contributor

danluu commented Sep 19, 2014

I don't really follow the argument for making max(Infinity, NaN) = Infinity. This debate is very similar to the X-optimism vs. X-pessimism debate in hardware simulators, but there's a distinction, which is that X in a hardware simulator represents some unknown (but representable) value. As Arch points out, NaN can represent all sorts of crazy stuff. Why should max(Infinity, some nonsensical thing) be Infinity?

@StefanKarpinski
Member Author

I say we go ahead with this in 0.6.

@StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@simonbyrne
Contributor

We should probably have a function that gives the old behaviour for standards-junkies. NumPy and Matlab use nanmin.

@StefanKarpinski
Member Author

StefanKarpinski commented Sep 14, 2016

Eh, let's wait and see how much complaining there is first. Can also go in a package.


@AgnerF

AgnerF commented Mar 1, 2018

FYI, this question is also discussed here: https://stackoverflow.com/questions/49011370/nan-propagation-and-ieee-754-standard/49040225
I agree that a global state does not work well with parallel processing. Now that more and more platforms support vector processing, it is better to rely on NaN propagation than fault trapping, because you can have multiple faults simultaneously in one vector. In my opinion, min and max should always propagate NaN.

@StefanKarpinski
Member Author

That's good to hear 👍 – that is what Julia does these days.
