should min(0.0, NaN) be NaN? #7866

Closed
StefanKarpinski opened this issue Aug 6, 2014 · 51 comments
Labels
needs decision A decision on this change is needed

Comments

@StefanKarpinski
Member

I know this has come up before and we opted to go with Matlab's behavior, but it strikes me that NaN poisoning really might be safer. After all, a NaN could be larger or smaller than any value.

@ViralBShah
Member

+1 for poisoning. The same should apply to minimum too then.

@cartazio

cartazio commented Aug 6, 2014

One subtlety is which NaN you should pick when both arguments are NaN. I think in that case one valid choice is to use the IEEE-specified total order.

@StefanKarpinski
Member Author

x86 floating-point instructions all use the leftmost NaN argument, which seems like a reasonable thing to do. Adding more work to pick an ordering among NaNs seems unnecessary – I can't imagine any situation where which of two NaNs one picks is significant.

@JeffBezanson
Member

We went with standards here, e.g. fmin and, I believe, IEEE. This is not just taken from Matlab.

@cartazio

cartazio commented Aug 6, 2014

IEEE does not specify min or max, just >, ==, <, <=, >= and friends.

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

int main(void){
  double nann = 0.0/0.0;
  printf("%f\n%f\n%f", fmax(nann, 1.0), fmax(1.0, nann), fmax(1.20, 3.0));
  return 0;
}

prints

~/D/r/intFloatRegisterFoo $ ./a.out
1.000000
1.000000
3.000000⏎

Additionally, according to the C language reference:

  Returns the larger of two floating point arguments, treating NaNs as missing data (between a NaN and a    
  numeric value, the numeric value is chosen)

That seems redundant in the context of language support for explicitly coding missing data.

@JeffBezanson
Member

Section 5.3 of the IEEE fp standard specifies minNum and maxNum which behave this way.

@cartazio

cartazio commented Aug 6, 2014

You are correct, sir! (though I must admit it bothers me, but it makes sense in the context of C)

Section 5.3:

sourceFormat minNum(source, source)
sourceFormat maxNum(source, source)
sourceFormat minNumMag(source, source)
sourceFormat maxNumMag(source, source)

minNum(x, y) is the canonicalized number x if x<y, y if y<x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.
maxNum(x, y) is the canonicalized number y if x<y, x if y<x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.

Section 6.2, Operations with NaNs:

Two different kinds of NaN, signaling and quiet, shall be supported in all floating-point operations. Signaling NaNs afford representations for uninitialized variables and arithmetic-like enhancements (such as complex-affine infinities or extremely wide range) that are not in the scope of this standard. Quiet NaNs should, by means left to the implementer’s discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.
Under default exception handling, any operation signaling an invalid operation exception and for which a floating-point result is to be delivered shall deliver a quiet NaN.
Signaling NaNs shall be reserved operands that, under default exception handling, signal the invalid operation exception (see 7.2) for every general-computational and signaling-computational operation except for the conversions described in 5.12. For non-default treatment, see 8.
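
As a rough Julia sketch (not from the standard; the names are mine, and signaling NaNs and canonicalization are ignored), the quiet-NaN behavior quoted above amounts to:

minnum(x, y) = isnan(x) ? y : isnan(y) ? x : min(x, y)   # the NaN is treated as missing data
maxnum(x, y) = isnan(x) ? y : isnan(y) ? x : max(x, y)

minnum(0.0, NaN)   # 0.0 -- the number wins over a quiet NaN
minnum(NaN, NaN)   # NaN -- both operands are NaN, so a NaN comes back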

@StefanKarpinski
Member Author

Treating NaN as missing data seems incorrect and dangerous, despite the C standard.

@staticfloat
Member

I'd like to add a +1 for poisoning. I think if you're wanting to treat something as missing data, you have much better options than using NaN.

@cartazio

cartazio commented Aug 7, 2014

I'd favor poisoning too. I'm actually prepping a proposal to change how GHC Haskell min and max work on floats to have the poisoning semantics (i.e. NaN-propagating).

@JeffBezanson
Member

I agree that not propagating NaNs here doesn't make sense. There's nothing special about min that grants it a definite value on undefined input. It's just that following IEEE is a good default decision.

One can have a vigorous debate about 1^NaN, but for min I'm not sure what the argument would be. Although this does raise the question of min(-Inf,NaN) :)

@StefanKarpinski
Member Author

I would just keep it simple and say that if any argument is NaN, the result is NaN.

@ArchRobison
Contributor

How about providing minnum and maxnum for programmers who want the IEEE semantics (or want the associated hardware efficiencies), and do NAN poisoning for min and max? Numpy calls the IEEE version nanmin.

I became curious about the rationale for the IEEE rules. William Kahan ("father of IEEE 754") seems to be the origin of the rules according to committee minutes:

Kahan proposed that it must be commutative & the default shall be to pass numbers over NaNs & he doesn't care about the sign of zero.

Kahan is an expert and so presumably had some motivation (possibly for experts), but I haven't been able to find it. I've sent him an email query.

@simonster
Member

It seems that the minsd instruction is ifelse(x < y, x, y), which is neither the NaN-poisoning nor the IEEE definition.
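
For comparison, a quick sketch of that comparison-select definition (the helper name is mine):

minsd_style(x, y) = ifelse(x < y, x, y)

minsd_style(0.0, NaN)   # NaN -- 0.0 < NaN is false, so the second operand is returned
minsd_style(NaN, 0.0)   # 0.0 -- the result depends on argument order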

@ArchRobison
Contributor

I erred when I wrote "associated hardware efficiencies". I had misinterpreted the description of minsd. The semantics of minsd were designed back in the '90s, well before IEEE 754-2008, so that compilers could optimize the common C idiom x<y ? x : y.

As for the semantics that we seem to favor, the Intel manual coyly states:

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINPD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

I'm puzzling over the emulation that the author had in mind. I don't see how a single comparison suffices, since if one of the operands is a NaN, the information from the comparison does not distinguish which operand is the NaN.

@ArchRobison
Contributor

FYI, I inquired with the experts here, and the recommended AVX sequence for NaN-propagating min is:

VMIN R, a, b           // result is b if a or b is NaN, min(a,b) otherwise
                       // so the NaN is not propagated only if a is the NaN
VCMPNEQ M, a, a        // M=11…11  if a is NaN, 0 otherwise
VBLENDV Res, R, a, M   // Res = R if M=0 (a not NaN), otherwise Res=a (if a is NaN)

It's only one instruction shorter than the code currently generated from the NaN-propagating definition min{T<:FloatingPoint}(x::T, y::T) = ifelse((x < y) | (x != x), x, y). If we lose sleep over that extra instruction, we can submit a patch to LLVM. :-)
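
For concreteness, that NaN-propagating definition behaves like this (scalar sketch with the type parameters dropped):

nanprop_min(x, y) = ifelse((x < y) | (x != x), x, y)

nanprop_min(1.0, 2.0)    # 1.0
nanprop_min(NaN, 2.0)    # NaN -- x != x selects x when x is NaN
nanprop_min(2.0, NaN)    # NaN -- both tests are false, so y (the NaN) is returned
nanprop_min(-0.0, 0.0)   # 0.0 -- comparisons ignore the sign of zero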

@cartazio

Does that code sequence properly handle the signedness of zero? AFAIK, the VMIN operations don't distinguish -0 from 0.

@ArchRobison
Contributor

The sequence will compute min(-0,+0) as +0. Detecting signedness of zero is problematic for any comparison-based min routine, since IEEE 754-2008 says "Comparisons shall ignore the sign of zero (so +0 = −0)".
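
One scalar workaround is to break the tie on the sign bit explicitly (sketch only; not necessarily how Base does it):

function signed_min(x, y)
    (isnan(x) || isnan(y)) && return x + y   # NaN propagates through the addition
    x < y && return x
    y < x && return y
    return signbit(x) ? x : y                # x == y (e.g. -0.0 and +0.0): prefer the negative sign
end

signed_min(-0.0, 0.0)   # -0.0
signed_min(0.0, -0.0)   # -0.0
signed_min(NaN, 1.0)    # NaN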

@porterjamesj
Contributor

As someone who has only an introductory systems course level understanding of IEEE 754, the one thing that sticks out in my mind about NaNs is that they tend to poison computations, so this change might be intuitive for the average user regardless of what the standard actually says :)

I like the idea of providing separate functions for those who do want the standard-specified behavior if we decide to do this.

@cartazio

One small detail that I think might play nice with future NaN interactions: in the case that both arguments are NaNs, perhaps the result NaN should be the bitwise OR of the two input NaNs (the idea being that the otherwise-unused payload bits in NaNs could represent different causes of the original NaN error, and that would give a neat trick for saying "here's the set of ways your program borked the math!").
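
A Float64 sketch of that payload-merging idea (illustrative only; real NaN payload conventions vary across platforms, and the helper name is mine):

function merge_nans(x::Float64, y::Float64)
    @assert isnan(x) && isnan(y)
    reinterpret(Float64, reinterpret(UInt64, x) | reinterpret(UInt64, y))
end

a = reinterpret(Float64, reinterpret(UInt64, NaN) | UInt64(1))   # NaN tagged with payload bit 1
b = reinterpret(Float64, reinterpret(UInt64, NaN) | UInt64(2))   # NaN tagged with payload bit 2
reinterpret(UInt64, merge_nans(a, b))                            # both payload bits survive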

@cartazio

This LLVM thread is relevant: http://article.gmane.org/gmane.comp.compilers.llvm.cvs/201804
I'm going to ask them about the NaN-returning semantics option.

@ArchRobison
Contributor

William Kahan kindly replied to my enquiry about the IEEE definition of min. His note suggests a 4th possible definition of min that propagates NaN except when one of the arguments is -∞, in order to preserve the identity min(-∞,y)==-∞.

Here is his reply in full:

Arch:
The recommendation that  min{ x, NaN}  and  min{NaN, x}  both be  x  instead of  NaN 
arose at a time when graph plotting programs had nowhere to put a  NaN,  and could
abort instead.  In my experience the most common occasions when  min{x, NaN}  arose 
were in plotting graphs of functions whose domains had holes.  For instance,  let  Y(x) 
be the solutions of the equation   x^2 - y^2 = 1 ,  namely  Y(x) = +/- sqrt( (x-1)(x+1) ) ;
and plot  Y(x)  vs.  x  over,  say,  -2 < x < 2 .  Then the result should be two branches of 
an hyperbola separated by a gap where  -1 < x < 1 .  To bridge the gap we perform a 
"Windowing"  operation by plotting  max{-2,  min{+2, Y(x)} }  in a square window  that
barely contains  -2 < x < 2   and  -2 < y < 2 ,  putting the  NaNs  on the window's upper
edge where they can be ignored.  This resolution of the problems posed by domains 
with holes is not perfect,  but better than the alternatives available at the time to cope 
with holes whose locations could not easily be predicted before the attempt to plot.

There must be occasions where  minm{x, NaN}  and  maxm{x, NaN}  should be  NaN 
instead of  x  to serve a computational necessity.  Here I have used different names for 
the minimum and maximum functions;  you might disagree with my choice of names.

However,  maxm(+Infinity, NaN)  must be  +Infinity,  NOT NaN,  to honor the rule that 
if a function  f(x, y)  has the property for some value  X  that  f(X, y)  is independent of 
y ,  be it finite or infinite,  then that  f(X, NaN)  must be the same as  f(X, y) .  This may 
make  maxm  harder to implement than  max . The foregoing rule is crucially necessary. 
If there were no way to get rid of an irrelevant  NaN  then it might as well stop computation 
at its earliest encounter.

I hope that my explanation helps.  If you have a better rationale I will be glad to entertain
it.

With best wishes,
                                                                    Prof. W. Kahan
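
For illustration, the windowing trick he describes looks roughly like this with IEEE-style min/max (a sketch; the helper names are not Kahan's):

ieee_min(x, y) = isnan(x) ? y : isnan(y) ? x : min(x, y)
ieee_max(x, y) = isnan(x) ? y : isnan(y) ? x : max(x, y)

window(y) = ieee_max(-2.0, ieee_min(2.0, y))   # Kahan's max{-2, min{+2, Y(x)}}

window(1.3)   # 1.3 -- in-range points pass through
window(NaN)   # 2.0 -- the NaN from a domain hole lands on the window's upper edge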

@cartazio

Huh, this is a very interesting and good point about the "Laws" of min and max on the reals.

@JeffBezanson
Member

That is a great email, as expected. I think propagating NaN except for +-Inf is a very good definition for min and max.

Honestly I find the plotting justification shockingly weak. There's no reason to assume plotting must be done by applying min and max to points, and nothing further. It would be far better to check for NaN or out-of-range values and simply omit them. If there is no better justification, I would hope this is changed in the next IEEE standard revision, if there is one. They could make it backwards compatible by adding new operations with this behavior.
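
A sketch of that "propagate NaN except when an infinity already decides the result" definition for min (the name is mine; max would mirror it with +Inf):

function kahan_min(x, y)
    (x == -Inf || y == -Inf) && return -Inf   # -Inf wins regardless of the other argument
    (isnan(x) || isnan(y)) && return NaN      # otherwise any NaN poisons the result
    return min(x, y)
end

kahan_min(-Inf, NaN)   # -Inf -- preserves min(-Inf, y) == -Inf for every y
kahan_min(0.0, NaN)    # NaN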

@StefanKarpinski
Member Author

We should get "I ❤️ Kahan" t-shirts made.

@sunfishcode

"if a function f(x, y) has the property for some value X that f(X, y) is independent of y, be it finite or infinite, then that f(X, NaN) must be the same as f(X, y) ."

Does anyone know the justification for this rule? I'm not necessarily disputing it, I primarily want to understand it.

I have also asked Professor Kahan about this, and one thing he said was that this rule assumes that the environment has floating-point flags which record floating-point exceptions that have occurred, and which can be easily queried. We don't necessarily need the return value of an expression to record that a NaN happened if there's a flag which holds that information that we can test.

However, Julia, and every other programming language I'm aware of besides assembly, doesn't provide easy and reliable access to these flags (including C, even with fenv.h, because in practice compilers like clang and GCC don't implement #pragma STDC FENV_ACCESS). It's not a coincidence that modelling floating-point operations as operations that mutate global state is not something that easily fits into high-level languages.

Given that Julia doesn't expose the floating-point exception flags, it's not clear that this justification applies here.

@eschnett
Contributor

This rule does not apply, e.g., for adding 0.0: 0.0+y == y for all values of y, except if y is a NaN. (I hope I'm not stumbling over a special case for -0.0 here.)

This actually inhibits compiler optimizations. With full IEEE compatibility, 0.0+y cannot be optimized to y.

@ArchRobison
Contributor

My impression is that the rule is derived from the assumption that NaN represents "could be any real number". But I'm not so sure about the rule if the NaN came from sqrt(-1.0).

0.0+y is "the same as" y when y is NaN. It's just that == with a NaN is not the same as "same as". It's 0.0+(-0.0) that's the thorn in the side for optimizing 0.0+y, since 0.0+(-0.0) is 0.0. According to Muchnick's text the only safe optimization of IEEE arithmetic is replacing x/c by x*(1/c) when c and 1/c can be represented exactly (which means c has to be an integral power of 2).
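
Concretely, in Julia:

0.0 + -0.0   # +0.0, not -0.0 -- this is what blocks rewriting 0.0 + y into y
0.0 + NaN    # NaN -- the NaN case itself is preserved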

The mutable global state of IEEE arithmetic was a big mistake in my opinion. It plays badly in highly pipelined processors and parallel programming environments. I once heard a talk by Guy Steele where he suggested it was time to revise IEEE 754 to remove the global state. I recall that he had to shorten the significand by one bit so that he could use the bit for other purposes, thus his proposal was not a backwards-compatible format.

@danluu
Contributor

danluu commented Sep 19, 2014

I don't really follow the argument for making max(Infinity, NaN) = Infinity. This debate is very similar to the X-optimism vs. X-pessimism debate in hardware simulators, but there's a distinction, which is that X in a hardware simulator represents some unknown (but representable) value. As Arch points out, NaN can represent all sorts of crazy stuff. Why should max(Infinity, some nonsensical thing) be Infinity?

@StefanKarpinski
Member Author

I say we go ahead with this in 0.6.

@StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@simonbyrne
Contributor

We should probably have a function that gives the old behaviour for standards-junkies. NumPy and Matlab use nanmin.

@StefanKarpinski
Member Author

StefanKarpinski commented Sep 14, 2016

Eh, let's wait and see how much complaining there is first. Can also go in a package.


@AgnerF

AgnerF commented Mar 1, 2018

FYI, this question is also discussed here: https://stackoverflow.com/questions/49011370/nan-propagation-and-ieee-754-standard/49040225
I agree that a global state does not work well with parallel processing. Now that more and more platforms support vector processing, it is better to rely on NaN propagation than fault trapping, because you can have multiple faults simultaneously in one vector. In my opinion, min and max should always propagate NaN.

@StefanKarpinski
Member Author

That's good to hear 👍 – that is what Julia does these days.
