^(::Float64, ::Integer) incorrect subnormal results #19872

Closed
simonbyrne opened this issue Jan 5, 2017 · 11 comments · Fixed by #19890
Labels
compiler:codegen (Generation of LLVM IR and native code), maths (Mathematical functions), regression (Regression in behavior compared to a previous version)

simonbyrne (Contributor) commented Jan 5, 2017

For some reason powi seems to be flushing some subnormals to zero:

From 0.4 (correct behaviour):

julia> versioninfo()
Julia Version 0.4.6
Commit 2e358ce (2016-06-19 17:16 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> 2.0^(-1022)
2.2250738585072014e-308

julia> 2.0^(-1023)
1.1125369292536007e-308

julia> 2.0^(-1024)
5.562684646268003e-309

On 0.5 (incorrect):


julia> versioninfo()
Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)

julia> 2.0^(-1022)
2.2250738585072014e-308

julia> 2.0^(-1023)
1.1125369292536007e-308

julia> 2.0^(-1024)
0.0

On master (incorrect):

julia> versioninfo()
Julia Version 0.6.0-dev.1898
Commit 2e0b00180* (2017-01-04 13:04 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> 2.0^(-1022)
2.2250738585072014e-308

julia> 2.0^(-1023)
1.1125369292536007e-308

julia> 2.0^(-1024)
0.0
simonbyrne added the maths (Mathematical functions) label Jan 5, 2017
simonbyrne added this to the 0.6.0 milestone Jan 5, 2017
simonbyrne (Contributor, Author) commented Jan 5, 2017

This shows the cause of the problem pretty clearly:

julia> foo(x) = x^-1024
foo (generic function with 1 method)

julia> foo(2.0)
0.0

julia> @code_native foo(2.0)
	.section	__TEXT,__text,regular,pure_instructions
Filename: REPL[1]
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 1
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	movabsq	$13188183008, %rax      ## imm = 0x31213B3E0
	vmovsd	(%rax), %xmm1           ## xmm1 = mem[0],zero
	vdivsd	%xmm0, %xmm1, %xmm0
	popq	%rbp
	retq

Basically, it's doing:

function foo(x)
    for i = 1:10
        x = x*x
    end
    return 1 / x
end

The last iteration of the loop overflows to give Inf, and so the result is 0.0.
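For illustration, here is a minimal Julia sketch (the function names are hypothetical, not Base's implementation) contrasting the two evaluation orders: squaring x ten times first overflows to Inf for x = 2.0, whereas inverting first keeps every intermediate finite and lands on the expected subnormal.

```julia
# Sketch only: the `pow_neg1024_*` names are made up for illustration.

# What the generated code above effectively does: square ten times, then invert.
# For x = 2.0 the tenth squaring is 2.0^1024, which overflows to Inf, so the
# final division returns 1/Inf == 0.0.
function pow_neg1024_overflowing(x)
    for i = 1:10
        x = x * x
    end
    return 1 / x
end

# Inverting first keeps every intermediate finite; (1/2.0)^1024 underflows
# gracefully into the subnormals.
function pow_neg1024_inverted(x)
    y = 1 / x
    for i = 1:10
        y = y * y
    end
    return y
end

pow_neg1024_overflowing(2.0)  # 0.0
pow_neg1024_inverted(2.0)     # 5.562684646268003e-309
```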

simonbyrne (Contributor, Author) commented Jan 5, 2017

Some related previous discussion of this issue is in #2741 and #8939: the behaviour seems to have changed since those issues were opened.

simonbyrne (Contributor, Author) commented Jan 5, 2017

Even without overflow, we still have the issue with intermediate rounding:

julia> x = 1.1^100
13780.612339822364

julia> y = 1.1^100.0
13780.61233982238

julia> z = big(1.1)^100.0
1.37806123398223814535955233851624435942499531981498994361102154455141583435305e+04

julia> Float64(x-z)/eps(x)
-9.349379803462563

julia> Float64(y-z)/eps(y)
-0.3493798034625633

i.e. the powi version is out by about 9.3 ulps.
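For anyone reproducing the measurement, a small helper makes the ulp comparison explicit. This is a sketch only: `ulp_error` is an illustrative name, not a Base function, and the quoted values are the ones from the session above.

```julia
# Illustrative helper (not part of Base): error of a Float64 value, in ulps,
# relative to a high-precision BigFloat reference.
ulp_error(x::Float64, ref::BigFloat) = Float64(x - ref) / eps(x)

ref = big(1.1)^100.0        # BigFloat reference, as `z` above
ulp_error(1.1^100, ref)     # ≈ -9.35 ulps in the session above (powi / integer-exponent path)
ulp_error(1.1^100.0, ref)   # ≈ -0.35 ulps in the session above (libm pow / float-exponent path)
```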

JeffBezanson added the compiler:codegen (Generation of LLVM IR and native code) and regression (Regression in behavior compared to a previous version) labels Jan 5, 2017
simonbyrne (Contributor, Author) commented Jan 5, 2017

Basically, if we want this to be accurate to < 1 ulp (which is a typical reasonable standard), the LLVM approach is fine for exponents between -1 and 3, inclusive. For all other exponents we should fall back on the libm powi, or convert the exponent to a Float64 and call libm pow.
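A Julia-level sketch of that policy (illustrative only; `pow_sketch` is a hypothetical name, and the real change would live in codegen): keep the repeated-multiplication path only for exponents in -1:3, and defer to libm pow for everything else.

```julia
# Hypothetical sketch of the proposed policy, not the actual Base method.
function pow_sketch(x::Float64, p::Integer)
    if p == -1
        return 1 / x
    elseif 0 <= p <= 3
        r = 1.0
        for i = 1:p
            r *= x              # at most three multiplications, per the < 1 ulp argument above
        end
        return r
    else
        return x ^ Float64(p)   # fall back on libm pow via a Float64 exponent
    end
end

pow_sketch(2.0, -1024)          # 5.562684646268003e-309, the subnormal the 0.4 build returned above
```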

simonbyrne (Contributor, Author) commented Jan 5, 2017
We could also try using fpmath metadata, but that's out of my skill set.

StefanKarpinski modified the milestones: 0.6.x, 0.6.0 Jan 5, 2017
vtjnash (Member) commented Jan 5, 2017

To get back the exact v0.4 implementation, basically, we just need to do:

diff --git a/src/intrinsics.cpp b/src/intrinsics.cpp
index 7eedd9f..0125c79 100644
--- a/src/intrinsics.cpp
+++ b/src/intrinsics.cpp
@@ -1456,7 +1456,7 @@ static Value *emit_untyped_intrinsic(intrinsic f, Value *x, Value *y, Value *z,
         x = FP(x);
         y = JL_INT(y);
         Type *tx = x->getType(); // TODO: LLVM expects this to be i32
-#if JL_LLVM_VERSION >= 30600
+#if 0
         Type *ts[1] = { tx };
         Value *powi = Intrinsic::getDeclaration(jl_Module, Intrinsic::powi,
             ArrayRef<Type*>(ts));

If we want to conditionally use powi only when it will give the right results, we could do that with a whitelist:

diff --git a/src/codegen.cpp b/src/codegen.cpp
index 83a80f0..2df6ae3 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -389,10 +389,8 @@ static Function *expect_func;
 static Function *jldlsym_func;
 static Function *jlnewbits_func;
 static Function *jltypeassert_func;
-#if JL_LLVM_VERSION < 30600
 static Function *jlpow_func;
 static Function *jlpowf_func;
-#endif
 //static Function *jlgetnthfield_func;
 static Function *jlgetnthfieldchecked_func;
 //static Function *jlsetnthfield_func;
@@ -5950,7 +5948,6 @@ static void init_julia_llvm_env(Module *m)
                          "jl_gc_diff_total_bytes", m);
     add_named_global(diff_gc_total_bytes_func, *jl_gc_diff_total_bytes);
 
-#if JL_LLVM_VERSION < 30600
     Type *powf_type[2] = { T_float32, T_float32 };
     jlpowf_func = Function::Create(FunctionType::get(T_float32, powf_type, false),
                                    Function::ExternalLinkage,
@@ -5968,7 +5965,7 @@ static void init_julia_llvm_env(Module *m)
         &pow,
 #endif
         false);
-#endif
+
     std::vector<Type*> array_owner_args(0);
     array_owner_args.push_back(T_pjlvalue);
     jlarray_data_owner_func =
diff --git a/src/intrinsics.cpp b/src/intrinsics.cpp
index 7eedd9f..82e1a09 100644
--- a/src/intrinsics.cpp
+++ b/src/intrinsics.cpp
@@ -1456,20 +1456,26 @@ static Value *emit_untyped_intrinsic(intrinsic f, Value *x, Value *y, Value *z,
         x = FP(x);
         y = JL_INT(y);
         Type *tx = x->getType(); // TODO: LLVM expects this to be i32
+        Function *powi = (tx == T_float64 ? jlpow_func : jlpowf_func);
 #if JL_LLVM_VERSION >= 30600
-        Type *ts[1] = { tx };
-        Value *powi = Intrinsic::getDeclaration(jl_Module, Intrinsic::powi,
-            ArrayRef<Type*>(ts));
+        if (ConstantInt *cy = dyn_cast<ConstantInt>(y)) {
+            if (cy->isMinusOne() || !cy->uge(5)) {
+                powi = Intrinsic::getDeclaration(jl_Module,
+                        Intrinsic::powi,
+                        makeArrayRef(tx));
+            }
+        }
+#endif
+        // issue #6506
+        if (!powi->isIntrinsic()) {
+            powi = static_cast<Function*>(prepare_call(powi));
+            y = builder.CreateSIToFP(y, tx);
+        }
 #if JL_LLVM_VERSION >= 30700
         return builder.CreateCall(powi, {x, y});
 #else
         return builder.CreateCall2(powi, x, y);
 #endif
-#else
-        // issue #6506
-        return builder.CreateCall2(prepare_call(tx == T_float64 ? jlpow_func : jlpowf_func),
-                x, builder.CreateSIToFP(y, tx));
-#endif
     }
     case sqrt_llvm_fast: {
         x = FP(x);

JeffBezanson (Member) commented Jan 5, 2017
Nice, that patch looks like a good way to go.

vtjnash (Member) commented Jan 5, 2017

Alternatively, if we want to preserve all of the constant-folding abilities, we can do effectively the same, but as an LLVM pass scheduled near the end of our pipeline.

vtjnash (Member) commented Jan 5, 2017

> Basically, it's doing:

Note that this is indeed the actual implementation (https://llvm.org/svn/llvm-project/compiler-rt/tags/Apple/Libcompiler_rt-14/lib/powidf2.c), not just similar to it. It's also defined this way regardless of compiler optimization. Per the spec for this function:

— Built-in Function: double __builtin_powi (double, int)
Returns the first argument raised to the power of the second. Unlike the pow function no guarantees about precision and rounding are made.
(https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html)

We could also do this branch at runtime, if that sounds worthwhile:

  switch %p powf [-1 inv, 0 one, 1 unity, 2 square, 3 triple ]
inv:
  1 / x
one:
  1
unity:
  x
square:
  x*x
triple:
  x*x*x
powf:
  powf(x, float(p))
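Rendered at the Julia level for clarity (illustrative only; the actual proposal is a branch in the generated LLVM IR, and `powf` here stands for the floating-point pow call with the exponent converted to float):

```julia
# Sketch of the runtime branch above; not actual codegen, names are illustrative.
function powi_branch(x::Float64, p::Integer)
    p == -1 && return 1 / x        # inv
    p ==  0 && return 1.0          # one
    p ==  1 && return x            # unity
    p ==  2 && return x * x        # square
    p ==  3 && return x * x * x    # triple
    return x ^ Float64(p)          # powf: pow with the exponent converted to float
end
```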

vtjnash (Member) commented Jan 5, 2017

Finally, I'll just note that the square and triple branches are unreachable there, since we already handle those in inference. Extending inference to handle the others (and removing / renaming this intrinsic powf_fast) should be pretty easy.

vtjnash (Member) commented Jan 5, 2017

> Extending inference to handle the others (and removing / renaming this intrinsic powf_fast) should be pretty easy.

Scratch that: we can't handle this until we can infer function purity.

However, it turns out that LLVM already does all of the valid optimizations I proposed above, as long as you don't explicitly tell it to give you the low-precision answer (as we do now). I'll put up a PR.

vtjnash added a commit that referenced this issue Jan 6, 2017

The powi intrinsic optimization over calling powf is that it is inaccurate.
We don't need that.

When it is equally accurate (e.g. tiny constant powers),
LLVM will already recognize and optimize any call to a function named `powf`,
and produce the same speedup.

fix #19872

stevengj pushed a commit to stevengj/julia that referenced this issue Feb 24, 2017