Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consistent 32bit CI error #29923

Closed
Keno opened this issue Nov 4, 2018 · 25 comments
Closed

consistent 32bit CI error #29923

Keno opened this issue Nov 4, 2018 · 25 comments
Labels
ci Continuous integration priority This should be addressed urgently system:linux Affects only Linux system:32-bit Affects only 32-bit systems

Comments

@Keno
Copy link
Member

Keno commented Nov 4, 2018

This seems to have started recently

Error During Test at /tmp/julia/share/julia/test/numbers.jl:2329
  Got exception outside of a @test
  return type Int128 does not match inferred return type Any
  Stacktrace:
   [1] error(::String) at ./error.jl:33
   [2] top-level scope at /tmp/julia/share/julia/test/numbers.jl:2372
   [3] top-level scope at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [4] top-level scope at /tmp/julia/share/julia/test/numbers.jl:2330
   [5] include at ./boot.jl:317 [inlined]
   [6] include_relative(::Module, ::String) at ./loading.jl:1038
   [7] include at ./sysimg.jl:29 [inlined]
   [8] include(::String) at /tmp/julia/share/julia/test/testdefs.jl:13
   [9] top-level scope at /tmp/julia/share/julia/test/testdefs.jl:22
   [10] top-level scope at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [11] top-level scope at /tmp/julia/share/julia/test/testdefs.jl:21
   [12] top-level scope at util.jl:289
   [13] top-level scope at /tmp/julia/share/julia/test/testdefs.jl:19
   [14] eval at ./boot.jl:319 [inlined]
   [15] #runtests#3(::UInt128, ::Function, ::String, ::String, ::Bool) at /tmp/julia/share/julia/test/testdefs.jl:25
   [16] #runtests at ./none:0 [inlined] (repeats 2 times)
   [17] (::getfield(Distributed, Symbol("##112#114")){Distributed.CallMsg{:call_fetch}})() at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269
   [18] run_work_thunk(::getfield(Distributed, Symbol("##112#114")){Distributed.CallMsg{:call_fetch}}, ::Bool) at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:56
   [19] macro expansion at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269 [inlined]
   [20] (::getfield(Distributed, Symbol("##111#113"))

e.g. https://travis-ci.org/JuliaLang/julia/jobs/450035986

@Keno Keno added priority This should be addressed urgently ci Continuous integration labels Nov 4, 2018
@JeffBezanson
Copy link
Member

@ararslan ararslan added system:linux Affects only Linux system:32-bit Affects only 32-bit systems labels Nov 4, 2018
@c42f
Copy link
Member

c42f commented Nov 5, 2018

I just got another of these.

@c42f
Copy link
Member

c42f commented Nov 5, 2018

This is reproducible for me on an i686 build from a 64 bit ubuntu 18.04 host. The bug reduces to:

julia> Base.return_types(%, (Int8, UInt128))
1-element Array{Any,1}:
 Any

@c42f
Copy link
Member

c42f commented Nov 5, 2018

Well, it makes absolutely no sense, but reverting e2f7b2f fixes this for me locally.

@c42f
Copy link
Member

c42f commented Nov 5, 2018

More precisely, reverting only the following line seems to fix the problem, but I don't understand why.

diff --git a/base/array.jl b/base/array.jl
index 9e8da6c384..74823710ab 100644
--- a/base/array.jl
+++ b/base/array.jl
@@ -272,7 +272,6 @@ function copyto!(dest::Array{T}, doffs::Integer, src::Array{T}, soffs::Integer,
         throw(BoundsError())
     end
     unsafe_copyto!(dest, doffs, src, soffs, n)
-    return dest
 end
 
 # Outlining this because otherwise a catastrophic inference slowdown

Edit: having woken my machine up from sleep state, this one line change no longer "fixes" the problem. Which makes sense if there's something nasty and intermittent going on here. Uninitialized memory read or some such thing, perhaps.

@c42f
Copy link
Member

c42f commented Nov 5, 2018

It looks like my machine has fallen out of whatever state it was in which allowed me to reproduce this.

What I did when reproducing was to build 0999799 with ARCH=i686 (on ubuntu 18.04), which gave the return_types failure as stated. Then to revert various recent commits and measure the bug with Revise until I found a change which "fixed" the problem.

This was yesterday, but the build of 0999799 no longer exhibits the problem today. I'll chalk that up to it being intermittent (I don't think it's just me loosing my mind :-) ).

@c42f
Copy link
Member

c42f commented Nov 8, 2018

Got another window of reproducibility; it looks like the cached inferred code for rem(::UInt128, ::UInt128) has a return type of Any, which may possibly be the underlying cause though I'm not sure.

Applying the following patch:

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 01fb88b024..e1912b71da 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -594,6 +594,11 @@ function typeinf_type(method::Method, @nospecialize(atypes), sparams::SimpleVect
             inf = code.inferred
             if !isa(inf, CodeInfo) || (inf::CodeInfo).inferred
                 i == 2 && ccall(:jl_typeinf_end, Cvoid, ())
+                println("typeinf_type code = ", code)
+                println("typeinf_type ", method, " code.rettype = ", code.rettype)
+                println("typeinf_type ", code.inferred)
+                println("typeinf_type ", ccall(:jl_uncompress_ast, Any, (Any, Any), method,
+                                               code.inferred::Vector{UInt8}))
                 return code.rettype
             end
         end

I get the following output:

julia> using Revise; Revise.track(Base.Core.Compiler);

julia> Base.return_types(rem, (UInt128, UInt128))
typeinf_type code = rem(UInt128, UInt128)
typeinf_type Base.rem(...) code.rettype = Any
typeinf_type Array{UInt8, (105,)}[0x08, 0x03, 0x00, 0x00, 0x00, 0x23, 0x73, 0x65, 0x6c, 0x66, 0x23, 0x00, 0x78, 0x00, 0x79, 0x00, 0x16, 0x84, 0xb0, 0xea, 0x09, 0x03, 0x3c, 0x11, 0x02, 0x1d, 0x11, 0x00, 0x05, 0x02, 0x00, 0x09, 0x03, 0x3c, 0x11, 0x02, 0x1d, 0x11, 0x00, 0x05, 0x03, 0x00, 0x2d, 0x1d, 0x85, 0x03, 0x01, 0x03, 0x02, 0x2c, 0x1d, 0x1c, 0xf4, 0x03, 0x03, 0x09, 0x01, 0x3e, 0x03, 0x04, 0x39, 0x16, 0x84, 0xb0, 0xea, 0x3a, 0x3a, 0x3a, 0x3a, 0x3a, 0x16, 0x84, 0xac, 0x26, 0x00, 0x2e, 0x2e, 0x20, 0x85, 0x4d, 0x2b, 0x0a, 0x03, 0xab, 0x16, 0x01, 0xb0, 0xf6, 0x00, 0x00, 0x00, 0x00, 0x00, 0x16, 0x01, 0xae, 0xf6, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01]
typeinf_type Core.CodeInfo(code=Array{Any, (5,)}[
  Expr(:invoke, (::Type{Base.GMP.BigInt})(UInt128), Base.BigInt, Core.SlotNumber(id=2)),
  Expr(:invoke, (::Type{Base.GMP.BigInt})(UInt128), Base.BigInt, Core.SlotNumber(id=3)),
  Expr(:call, Base.rem, SSAValue(1), SSAValue(2)),
  Expr(:call, Base.UInt128, SSAValue(3)),
  Expr(:return, SSAValue(4))], codelocs=Array{Int32, (5,)}[1, 1, 1, 1, 1], method_for_inference_limit_heuristics=nothing, ssavaluetypes=Array{Any, (5,)}[
  Any,
  Any,
  Any,
  Any,
  Any], linetable=Array{Core.LineInfoNode, (1,)}[Core.LineInfoNode(mod=Base, method=:rem, file=Symbol("int.jl"), line=778, inlined_at=0)], ssaflags=Array{UInt8, (5,)}[0x00, 0x00, 0x00, 0x00, 0x00], slotflags=Array{UInt8, (3,)}[0x00, 0x00, 0x00], slotnames=Array{Any, (3,)}[
  Symbol("#self#"),
  :x,
  :y], inferred=true, inlineable=false, propagate_inbounds=false, pure=false)
1-element Array{Any,1}:
 Any

Given these BigInt in here, perhaps this comes down to a weird interaction with the patch in 6fa3e45

@c42f
Copy link
Member

c42f commented Nov 8, 2018

Given these BigInt in here, perhaps this comes down to a weird interaction with the patch in 6fa3e45

This would also explain why it's not seen on 64 bit linux, as we don't use GMP for supporting UInt128 in that case.

@c42f
Copy link
Member

c42f commented Nov 8, 2018

Here's another oddity - @code_typed and Base.return_types disagree about rem with UInt128:

julia> @code_typed rem(UInt128(1), UInt128(1))
CodeInfo(
778 1%1 = invoke Base.BigInt(_2::UInt128)::BigInt                  │ 
    │   %2 = invoke Base.BigInt(_3::UInt128)::BigInt                  │ 
    │   %3 = Base.GMP.MPZ.tdiv_r::typeof(tdiv_r)                      │╻ rem
    │   %4 = invoke %3(%1::BigInt, %2::BigInt)::BigInt                ││
    │   %5 = invoke Base.UInt128(%4::BigInt)::UInt128                 │ 
    └──      return %5                                                │ 
) => UInt128

julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 Any

@c42f
Copy link
Member

c42f commented Nov 8, 2018

More fun:

$ ./julia

[...]

julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 Any

# Now replace `rem` with exactly the same definition, to cause recompilation: 
julia> Base.eval(:(
           function rem(x::UInt128, y::UInt128)
               return UInt128(rem(BigInt(x), BigInt(y)))
           end
       ))
rem (generic function with 136 methods)

julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 UInt128

So I guess the incorrect return type for rem is being read from the sysimg. Later on we get this incorrect version deserialized, but if we override it with the exact same function definition, the next inference run returns the correct result.

@c42f
Copy link
Member

c42f commented Nov 8, 2018

The following patch dumps inferred rettype for rem as it's cached in jl_set_method_inferred.

diff --git a/src/gf.c b/src/gf.c
index ae4a1413b7..414a52e369 100644
--- a/src/gf.c
+++ b/src/gf.c
@@ -432,6 +432,13 @@ JL_DLLEXPORT jl_method_instance_t* jl_set_method_inferred(
     li->functionObjectsDecls.functionObject = NULL;
     li->functionObjectsDecls.specFunctionObject = NULL;
     li->rettype = rettype;
+    if (li->def.method->name == jl_symbol("rem")) {
+        jl_printf(JL_STDOUT, "jl_set_method_inferred rem; rettype = ");
+        jl_static_show(JL_STDOUT, li->rettype);
+        jl_printf(JL_STDOUT, "   ");
+        jl_static_show(JL_STDOUT, li->specTypes);
+        jl_printf(JL_STDOUT, "\n");
+    }
     jl_gc_wb(li, rettype);
     li->inferred = inferred;
     jl_gc_wb(li, inferred);

It demonstrates that the 32 bit build sometimes infers the wrong type for rem(UInt128, UInt128). In particular, applying it and calling

make clean &> /dev/null && make &> build.log && grep 'jl_set_method_inferred.*rettype = .*UInt128, UInt128' build.log

sometimes produces

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}
jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}

and other times produces

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}
jl_set_method_inferred rem; rettype = UInt128   Tuple{typeof(Base.rem), UInt128, UInt128}

I'm not sure whether this depends on being inside the sysimg build or not.

@c42f
Copy link
Member

c42f commented Nov 8, 2018

Ok, to reproduce this, you need an i686 build, apply the patch above and to run the sysimg build step several times until you see the uninferred version of rem

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}

twice in the log.

Another change which might have caused this to appear - #29795

@StefanKarpinski
Copy link
Member

Nice work tracking this down, @c42f! @Keno, @vtjnash, @JeffBezanson—any idea what's going on?

@c42f
Copy link
Member

c42f commented Nov 8, 2018

Thanks, it's a head scratcher that's for sure. I caught the following stack trace from inside jl_set_method_inferred, just as it sets the bad Any as the rettype. This is still slightly after the damage is done, but almost at the right location. At a high level it's inside jl_compile_hint, so just before the sysimg build is completed.

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}
rec_backtrace at /home/tcfoster/src/julia-i686/src/stackwalk.c:94
jlbacktrace2 at /home/tcfoster/src/julia-i686/src/stackwalk.c:536
jl_set_method_inferred at /home/tcfoster/src/julia-i686/src/gf.c:441
cache_result at ./compiler/typeinfer.jl:133
typeinf at ./compiler/typeinfer.jl:61
typeinf_edge at ./compiler/typeinfer.jl:499 [inlined]
abstract_call_method at ./compiler/abstractinterpretation.jl:363
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_apply at ./compiler/abstractinterpretation.jl:518
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:566
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1120
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:198
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:107
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:198
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:107
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1120
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_ext at ./compiler/typeinfer.jl:574
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_ext at ./compiler/typeinfer.jl:611
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
jl_apply at /home/tcfoster/src/julia-i686/src/julia.h:1571 [inlined]
jl_type_infer at /home/tcfoster/src/julia-i686/src/gf.c:277
jl_compile_hint at /home/tcfoster/src/julia-i686/src/gf.c:1964
jl_compile_specializations at /home/tcfoster/src/julia-i686/src/precompile.c:378
jl_precompile at /home/tcfoster/src/julia-i686/src/precompile.c:388 [inlined]
jl_write_compiler_output at /home/tcfoster/src/julia-i686/src/precompile.c:34
jl_atexit_hook at /home/tcfoster/src/julia-i686/src/init.c:222
main at /home/tcfoster/src/julia-i686/ui/repl.c:222
__libc_start_main at /lib/i386-linux-gnu/libc.so.6 (unknown line)
_start at /home/tcfoster/src/julia-i686/usr/bin/julia (unknown line)

@StefanKarpinski
Copy link
Member

This is no longer what I'd describe as "intermittent"—it seems to happen quite consistently now.

@martinholters
Copy link
Member

I knew I remembered something like this. Finally I've found it: #25804 exposed a similar bug. The "solution" was a simple workaround: c002de7. Maybe we can just do the same for rem?

@martinholters
Copy link
Member

I think I have a handle on this one: On 32bit systems, rem(::UInt128, ::Untt128) and div(::UInt128, ::Untt128) convert to BigInt, have the actual computation done, and convert back to UInt128. However, that leads to the following recursion:
BigInt(::UInt128)ndigits(::UInt128, base=2)ndigits0z(::UInt128, 2)ndigits0zpb(::UInt128, 2)div(::UInt128, 2) → promotion → div(::UInt128, ::UInt128)BigInt(::UInt128)
It should be noted that ndigits0zpb has a special case for b==2, so the call-chain at run-time is not recursive, but apparently constant-propagation doesn't resolve this at compile-time.

To verify, I've added

function BigInt(x::UInt128)
    x == 0 && return BigInt(Culong(0))
    nd = sizeof(x)<<3 - leading_zeros(x) # instead of ndigits(x, base=2)
    z = MPZ.realloc2(nd)
    s = sign(x)
    s == -1 && (x = -x)
    x = unsigned(x)
    size = 0
    limbnbits = sizeof(Limb) << 3
    while nd > 0
        size += 1
        unsafe_store!(z.d, x % Limb, size)
        x >>>= limbnbits
        nd -= limbnbits
    end
    z.size = s*size
    z
end

which is a copy of BigInt(x::Integer) except for replacing the call to ndigits. At least locally, that makes the test failure go away.

Not sure how to properly fix this, though. Options include:

  • Make sure constant-propagation does its duty here.
  • Implement div(::UInt128, ::UInt128) natively instead of calling out to its BigInt version.

@c42f
Copy link
Member

c42f commented Nov 15, 2018

@martinholters Nice! That sounds like part of the puzzle and means the problem started happening with the change at 6fa3e45#diff-c39f3bbf0cb1c2381243b42647d9ff1dR287

Though it doesn't explain the inconsistency of sometimes getting Any and other times UInt128 inferred for rem(::UInt128, UInt128).

@StefanKarpinski
Copy link
Member

The symptoms are mitigated by #30032 but the underlying inference issue remains.

@martinholters
Copy link
Member

Thinking about this again, my analysis above doesn't explain anything. If the arguments during recursion don't change, inference is happy with it. Only if the same method gets called with different argument types, problems may appear. I.e. inference may bail out (infer as Any), but where that happens depends on where the recursion loop was entered, so results may vary. I still suspect this is the behavior we're seeing here, but I have no idea where the offending recursion might sneak in.

@c42f
Copy link
Member

c42f commented Nov 19, 2018

I ran valgrind against the offending sysimg compile step. In one run, the following log occurred

$ valgrind --tool=memcheck --leak-check=full /home/chris/dev/julia-i686/usr/bin/julia -g1 -O0 -C "pentium4" --output-ji /home/chris/dev/julia-i686/usr/lib/julia/sys.ji.tmp  --startup-file=no --warn-overwrite=yes --sysimage /home/chris/dev/julia-i686/usr/lib/julia/corecompiler.ji sysimg.jl
[...]
Dates  ────────── 52.722144 seconds
DelimitedFiles  ─  1.952113 seconds
==1125== Conditional jump or move depends on uninitialised value(s)
==1125==    at 0x4B51B62: ???
==1125==
Random  ───────── 14.559922 seconds
UUIDs  ──────────  0.259599 seconds
Future  ─────────  0.083137 seconds
LinearAlgebra  ──406.806756 seconds
[...]
==1125== Use --track-origins=yes to see where uninitialised values come from

Which is extremely suspicious on the face of it, but repeated reruns with --track-origins=yes failed to reproduce or give any further clues, so perhaps it's a false positive.

@c42f
Copy link
Member

c42f commented Nov 19, 2018

@martinholters I think I've got a repro of this outside of bootstrap. Try the following:

# Force recompilation
julia> Base.eval(:(rem(x::UInt128, y::UInt128) = UInt128(rem(BigInt(x), BigInt(y)))))
rem (generic function with 136 methods)
# Compile directly gives the right result
julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 UInt128
# Now force recompilation, entering inference via `parse`:
julia> Base.eval(:(rem(x::UInt128, y::UInt128) = UInt128(rem(BigInt(x), BigInt(y)))))
julia> parse(UInt128, "1")
julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 Any

I think the reason this is inconsistent from one sysimg compile to the next is that the compilation inside jl_compile_specializations is done in the order of the bindings in the modules. These bindings are stored in a ptrhash hash table, so the ordering of bindings isn't stable as it depends on the addresses which happen to be allocated to the bound objects.

In particular, in test runs I found that rem can be compiled due to at least any of parse, @uint128_str ndigits or Base._base. The first two of these result in the wrong inferred type of Any, while the second two result in UInt128 being inferred.

@c42f
Copy link
Member

c42f commented Nov 19, 2018

Any time there is inconsistency in the handling of cycles during inference we are likely to have this inconsistent behavior due to the inconsistent ordering coming from jl_foreach_reachable_mtable.

If we sorted the list of specializations m before traversing it at

for (i = 0, l = jl_array_len(m); i < l; i++) {
we might at least make the compilation reproducible.

vtjnash added a commit that referenced this issue Nov 20, 2018
Seems to have been missing since the original implementation of IPO constant-prop.
It's impressive how long we got away with just poisoning the cache with `Any`
if there was a self-recursive call.

Fix #29923
@vtjnash
Copy link
Member

vtjnash commented Nov 20, 2018

That's a really great repro, and helped narrow this down quickly (spoiler: it wasn't where I expected). Usually it's enough to apply this diff to spot where inference diverges from my expectations:

diff --git a/base/compiler/abstractinterpretation.jl b/base/compiler/abstractinterpretation.jl
index e30b5496b0..65c1d6a2b0 100644
--- a/base/compiler/abstractinterpretation.jl
+++ b/base/compiler/abstractinterpretation.jl
@@ -300,6 +300,11 @@ function abstract_call_method(method::Method, @nospecialize(sig), sparams::Simpl
         newsig = limit_type_size(sig, comparison, sv.linfo.specTypes, sv.params.TUPLE_COMPLEXITY_LIMIT_DEPTH, spec_len)
 
         if newsig !== sig
+            println()
+            println(sig)
+            println(newsig)
+            print_callstack(sv)
+            println()
             # continue inference, but note that we've limited parameter complexity
             # on this call (to ensure convergence), so that we don't cache this result
             if call_result_unused(sv)

But I should perhaps have attempt to record (or repeat?) this debug session, since this didn't turn up anything, I had to go hunting further. Next I took a look at what was going into the cache:

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 01fb88b024..b26860d341 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -28,6 +27,15 @@ function typeinf(frame::InferenceState)
     # empty!(frames)
     min_valid = frame.min_valid
     max_valid = frame.max_valid
+    if cached && frame.parent !== nothing
+        println()
+        for caller in results
+            println(caller.argtypes, " => ", caller.result)
+            # println(caller.src)
+        end
+        print_callstack(frame)
+        println()
+    end
     if cached || frame.parent !== nothing
         for caller in results
             opt = caller.src

This was suspiciously pointing the finger at something permitting results to go into the cache that were clearly bogus, and had the structure of a self-recursive call on a constant (these are marked [uncached] [limited])—which is supposed to get rejected by inference—getting added to the cache.

If we sorted the list of specializations m

Yes. While we try to make inference fairly predictable and stable(*), sorting the precompile lists there by something stable (such as Method min_world when iterating over bindings) might help with faster isolation of similar problems.

(*) In particular, we want it to have the property that any particular call to inference (e.g. code_typed or Base.return_types) should not return a wider result than expected due to cache effects. We're just fine with it giving unpredictably narrower answers based on stuff that happened to be done in the past, and just we're just fine with it giving entirely unpredictable answers when running recursively, as part of converging inference (up to the limit that it shouldn't widen the final type given as the answer, but that does mean that the set of possible optimizations and inlining afterwards is expected to be unpredictable in general).

c42f added a commit to c42f/julia that referenced this issue Nov 20, 2018
This should help in making ordering-dependent bugs such as JuliaLang#29923 (an
error in caching of inferred methods) deterministic, which will make
them much easier to find in the future.
c42f added a commit that referenced this issue Nov 20, 2018
This should help in making ordering-dependent bugs such as #29923 (an
error in caching of inferred methods) deterministic, which will make
them much easier to find in the future.
@c42f
Copy link
Member

c42f commented Nov 20, 2018

Great! I was hoping someone with knowledge of inference would step in and fix this. Thanks to Martin as well for giving just the hint I needed to repro it.

Yes. While we try to make inference fairly predictable and stable(*), sorting the precompile lists there by something stable (such as Method min_world when iterating over bindings) might help with faster isolation of similar problems.

How about sorting m using the same method as jl_resort_type_cache? Here's a PR for that: #30095

c42f added a commit that referenced this issue Nov 21, 2018
This should help in making ordering-dependent bugs such as #29923 (an
error in caching of inferred methods) deterministic, which will make
them much easier to find in the future.
tkf pushed a commit to tkf/julia that referenced this issue Nov 21, 2018
This should help in making ordering-dependent bugs such as JuliaLang#29923 (an
error in caching of inferred methods) deterministic, which will make
them much easier to find in the future.
vtjnash added a commit that referenced this issue Nov 26, 2018
Seems to have been missing since the original implementation of IPO constant-prop.
It's impressive how long we got away with just poisoning the cache with `Any`
if there was a self-recursive call, but it requires a fairly convoluted
setup to trigger this case.

Fix #29923
vtjnash added a commit that referenced this issue Nov 27, 2018
Seems to have been missing since the original implementation of IPO constant-prop.
It's impressive how long we got away with just poisoning the cache with `Any`
if there was a self-recursive call, but it requires a fairly convoluted
setup to trigger this case.

Fix #29923
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous integration priority This should be addressed urgently system:linux Affects only Linux system:32-bit Affects only 32-bit systems
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants