^(::Float64, ::Integer) incorrect subnormal results #19872

Closed
simonbyrne opened this issue Jan 5, 2017 · 11 comments · Fixed by #19890
Labels
compiler:codegen (Generation of LLVM IR and native code), maths (Mathematical functions), regression (Regression in behavior compared to a previous version)

simonbyrne (Contributor) commented Jan 5, 2017

For some reason powi seems to be flushing some subnormals to zero:

From 0.4 (correct behaviour):

julia> versioninfo()
Julia Version 0.4.6
Commit 2e358ce (2016-06-19 17:16 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> 2.0^(-1022)
2.2250738585072014e-308

julia> 2.0^(-1023)
1.1125369292536007e-308

julia> 2.0^(-1024)
5.562684646268003e-309

On 0.5 (incorrect):


julia> versioninfo()
Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)

julia> 2.0^(-1022)
2.2250738585072014e-308

julia> 2.0^(-1023)
1.1125369292536007e-308

julia> 2.0^(-1024)
0.0

On master (incorrect):

julia> versioninfo()
Julia Version 0.6.0-dev.1898
Commit 2e0b00180* (2017-01-04 13:04 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> 2.0^(-1022)
2.2250738585072014e-308

julia> 2.0^(-1023)
1.1125369292536007e-308

julia> 2.0^(-1024)
0.0
simonbyrne added the maths (Mathematical functions) label Jan 5, 2017
simonbyrne added this to the 0.6.0 milestone Jan 5, 2017
simonbyrne (Contributor, Author) commented Jan 5, 2017

This shows the cause of the problem pretty clearly:

julia> foo(x) = x^-1024
foo (generic function with 1 method)

julia> foo(2.0)
0.0

julia> @code_native foo(2.0)
	.section	__TEXT,__text,regular,pure_instructions
Filename: REPL[1]
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 1
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm0, %xmm0, %xmm0
	movabsq	$13188183008, %rax      ## imm = 0x31213B3E0
	vmovsd	(%rax), %xmm1           ## xmm1 = mem[0],zero
	vdivsd	%xmm0, %xmm1, %xmm0
	popq	%rbp
	retq

Basically, it's doing:

function foo(x)
    for i = 1:10
        x = x*x
    end
    return 1 / x
end

The last iteration of the loop overflows to give Inf, and so the result is 0.0.
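For illustration, here is a minimal Julia sketch (the function names are hypothetical, not Base's implementation) contrasting the two evaluation orders: squaring x ten times first overflows to Inf for x = 2.0, whereas inverting first keeps every intermediate finite and lands on the expected subnormal.

```julia
# Sketch only: the `pow_neg1024_*` names are made up for illustration.

# What the generated code above effectively does: square ten times, then invert.
# For x = 2.0 the tenth squaring is 2.0^1024, which overflows to Inf, so the
# final division returns 1/Inf == 0.0.
function pow_neg1024_overflowing(x)
    for i = 1:10
        x = x * x
    end
    return 1 / x
end

# Inverting first keeps every intermediate finite; (1/2.0)^1024 underflows
# gracefully into the subnormals.
function pow_neg1024_inverted(x)
    y = 1 / x
    for i = 1:10
        y = y * y
    end
    return y
end

pow_neg1024_overflowing(2.0)  # 0.0
pow_neg1024_inverted(2.0)     # 5.562684646268003e-309
```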

simonbyrne (Contributor, Author) commented Jan 5, 2017

Some related previous discussion of this issue is in #2741 and #8939: the behaviour seems to have changed since those issues were opened.

simonbyrne (Contributor, Author) commented Jan 5, 2017

Even without overflow, we still have the issue with intermediate rounding:

julia> x = 1.1^100
13780.612339822364

julia> y = 1.1^100.0
13780.61233982238

julia> z = big(1.1)^100.0
1.37806123398223814535955233851624435942499531981498994361102154455141583435305e+04

julia> Float64(x-z)/eps(x)
-9.349379803462563

julia> Float64(y-z)/eps(y)
-0.3493798034625633

i.e. the powi version is out by about 9.3 ulps.
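For anyone reproducing the measurement, a small helper makes the ulp comparison explicit. This is a sketch only: `ulp_error` is an illustrative name, not a Base function, and the quoted values are the ones from the session above.

```julia
# Illustrative helper (not part of Base): error of a Float64 value, in ulps,
# relative to a high-precision BigFloat reference.
ulp_error(x::Float64, ref::BigFloat) = Float64(x - ref) / eps(x)

ref = big(1.1)^100.0        # BigFloat reference, as `z` above
ulp_error(1.1^100, ref)     # ≈ -9.35 ulps in the session above (powi / integer-exponent path)
ulp_error(1.1^100.0, ref)   # ≈ -0.35 ulps in the session above (libm pow / float-exponent path)
```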

JeffBezanson added the compiler:codegen (Generation of LLVM IR and native code) and regression (Regression in behavior compared to a previous version) labels Jan 5, 2017
simonbyrne (Contributor, Author) commented Jan 5, 2017

Basically, if we want this to be accurate to < 1 ulp (which is a typical reasonable standard), the LLVM approach is fine for exponents between -1 and 3, inclusive. For all other exponents we should fall back on the libm powi, or convert the exponent to a Float64 and call libm pow.
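A Julia-level sketch of that policy (illustrative only; `pow_sketch` is a hypothetical name, and the real change would live in codegen): keep the repeated-multiplication path only for exponents in -1:3, and defer to libm pow for everything else.

```julia
# Hypothetical sketch of the proposed policy, not the actual Base method.
function pow_sketch(x::Float64, p::Integer)
    if p == -1
        return 1 / x
    elseif 0 <= p <= 3
        r = 1.0
        for i = 1:p
            r *= x              # at most three multiplications, per the < 1 ulp argument above
        end
        return r
    else
        return x ^ Float64(p)   # fall back on libm pow via a Float64 exponent
    end
end

pow_sketch(2.0, -1024)          # 5.562684646268003e-309, the subnormal the 0.4 build returned above
```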

simonbyrne (Contributor, Author) commented Jan 5, 2017
We could also try using fpmath metadata, but that's out of my skill set.

StefanKarpinski modified the milestones: 0.6.x, 0.6.0 Jan 5, 2017
vtjnash (Member) commented Jan 5, 2017

To get back the exact v0.4 implementation, basically, we just need to do:

diff --git a/src/intrinsics.cpp b/src/intrinsics.cpp
index 7eedd9f..0125c79 100644
--- a/src/intrinsics.cpp
+++ b/src/intrinsics.cpp
@@ -1456,7 +1456,7 @@ static Value *emit_untyped_intrinsic(intrinsic f, Value *x, Value *y, Value *z,
         x = FP(x);
         y = JL_INT(y);
         Type *tx = x->getType(); // TODO: LLVM expects this to be i32
-#if JL_LLVM_VERSION >= 30600
+#if 0
         Type *ts[1] = { tx };
         Value *powi = Intrinsic::getDeclaration(jl_Module, Intrinsic::powi,
             ArrayRef<Type*>(ts));

If we want to conditionally use powi only when it will give the right results, we could do that with a whitelist:

diff --git a/src/codegen.cpp b/src/codegen.cpp
index 83a80f0..2df6ae3 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -389,10 +389,8 @@ static Function *expect_func;
 static Function *jldlsym_func;
 static Function *jlnewbits_func;
 static Function *jltypeassert_func;
-#if JL_LLVM_VERSION < 30600
 static Function *jlpow_func;
 static Function *jlpowf_func;
-#endif
 //static Function *jlgetnthfield_func;
 static Function *jlgetnthfieldchecked_func;
 //static Function *jlsetnthfield_func;
@@ -5950,7 +5948,6 @@ static void init_julia_llvm_env(Module *m)
                          "jl_gc_diff_total_bytes", m);
     add_named_global(diff_gc_total_bytes_func, *jl_gc_diff_total_bytes);
 
-#if JL_LLVM_VERSION < 30600
     Type *powf_type[2] = { T_float32, T_float32 };
     jlpowf_func = Function::Create(FunctionType::get(T_float32, powf_type, false),
                                    Function::ExternalLinkage,
@@ -5968,7 +5965,7 @@ static void init_julia_llvm_env(Module *m)
         &pow,
 #endif
         false);
-#endif
+
     std::vector<Type*> array_owner_args(0);
     array_owner_args.push_back(T_pjlvalue);
     jlarray_data_owner_func =
diff --git a/src/intrinsics.cpp b/src/intrinsics.cpp
index 7eedd9f..82e1a09 100644
--- a/src/intrinsics.cpp
+++ b/src/intrinsics.cpp
@@ -1456,20 +1456,26 @@ static Value *emit_untyped_intrinsic(intrinsic f, Value *x, Value *y, Value *z,
         x = FP(x);
         y = JL_INT(y);
         Type *tx = x->getType(); // TODO: LLVM expects this to be i32
+        Function *powi = (tx == T_float64 ? jlpow_func : jlpowf_func);
 #if JL_LLVM_VERSION >= 30600
-        Type *ts[1] = { tx };
-        Value *powi = Intrinsic::getDeclaration(jl_Module, Intrinsic::powi,
-            ArrayRef<Type*>(ts));
+        if (ConstantInt *cy = dyn_cast<ConstantInt>(y)) {
+            if (cy->isMinusOne() || !cy->uge(5)) {
+                powi = Intrinsic::getDeclaration(jl_Module,
+                        Intrinsic::powi,
+                        makeArrayRef(tx));
+            }
+        }
+#endif
+        // issue #6506
+        if (!powi->isIntrinsic()) {
+            powi = static_cast<Function*>(prepare_call(powi));
+            y = builder.CreateSIToFP(y, tx);
+        }
 #if JL_LLVM_VERSION >= 30700
         return builder.CreateCall(powi, {x, y});
 #else
         return builder.CreateCall2(powi, x, y);
 #endif
-#else
-        // issue #6506
-        return builder.CreateCall2(prepare_call(tx == T_float64 ? jlpow_func : jlpowf_func),
-                x, builder.CreateSIToFP(y, tx));
-#endif
     }
     case sqrt_llvm_fast: {
         x = FP(x);

JeffBezanson (Member) commented Jan 5, 2017
Nice, that patch looks like a good way to go.

vtjnash (Member) commented Jan 5, 2017

Alternatively, if we want to preserve all of the constant-folding abilities, we can do effectively the same, but as an LLVM pass scheduled near the end of our pipeline.

vtjnash (Member) commented Jan 5, 2017

> Basically, it's doing:

Note that this is indeed the actual implementation (https://llvm.org/svn/llvm-project/compiler-rt/tags/Apple/Libcompiler_rt-14/lib/powidf2.c), not just similar to it. It's also defined this way regardless of compiler optimization. Per the spec for this function:

— Built-in Function: double __builtin_powi (double, int)
Returns the first argument raised to the power of the second. Unlike the pow function no guarantees about precision and rounding are made.
(https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html)

We could also do this branch at runtime, if that sounds worthwhile:

  switch %p powf [-1 inv, 0 one, 1 unity, 2 square, 3 triple ]
inv:
  1 / x
one:
  1
unity:
  x
square:
  x*x
triple:
  x*x*x
powf:
  powf(x, float(p))
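Rendered at the Julia level for clarity (illustrative only; the actual proposal is a branch in the generated LLVM IR, and `powf` here stands for the floating-point pow call with the exponent converted to float):

```julia
# Sketch of the runtime branch above; not actual codegen, names are illustrative.
function powi_branch(x::Float64, p::Integer)
    p == -1 && return 1 / x        # inv
    p ==  0 && return 1.0          # one
    p ==  1 && return x            # unity
    p ==  2 && return x * x        # square
    p ==  3 && return x * x * x    # triple
    return x ^ Float64(p)          # powf: pow with the exponent converted to float
end
```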

vtjnash (Member) commented Jan 5, 2017

Finally, I'll just note that the square and triple branches are unreachable there, since we already handle those in inference. Extending inference to handle the others (and removing / renaming this intrinsic powf_fast) should be pretty easy.

vtjnash (Member) commented Jan 5, 2017

> Extending inference to handle the others (and removing / renaming this intrinsic powf_fast) should be pretty easy.

Scratch that: we can't handle this until we can infer function purity.

However, it turns out that LLVM already does all of the valid optimizations I proposed above, as long as you don't explicitly tell it to give you the low-precision answer (as we do now). I'll put up a PR.

vtjnash added a commit that referenced this issue Jan 6, 2017

The powi intrinsic optimization over calling powf is that it is inaccurate.
We don't need that.

When it is equally accurate (e.g. tiny constant powers),
LLVM will already recognize and optimize any call to a function named `powf`,
and produce the same speedup.

fix #19872

stevengj pushed a commit to stevengj/julia that referenced this issue Feb 24, 2017