consistent 32bit CI error #29923

Keno · 2018-11-04T06:57:35Z

This seems to have started recently

Error During Test at /tmp/julia/share/julia/test/numbers.jl:2329
  Got exception outside of a @test
  return type Int128 does not match inferred return type Any
  Stacktrace:
   [1] error(::String) at ./error.jl:33
   [2] top-level scope at /tmp/julia/share/julia/test/numbers.jl:2372
   [3] top-level scope at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [4] top-level scope at /tmp/julia/share/julia/test/numbers.jl:2330
   [5] include at ./boot.jl:317 [inlined]
   [6] include_relative(::Module, ::String) at ./loading.jl:1038
   [7] include at ./sysimg.jl:29 [inlined]
   [8] include(::String) at /tmp/julia/share/julia/test/testdefs.jl:13
   [9] top-level scope at /tmp/julia/share/julia/test/testdefs.jl:22
   [10] top-level scope at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [11] top-level scope at /tmp/julia/share/julia/test/testdefs.jl:21
   [12] top-level scope at util.jl:289
   [13] top-level scope at /tmp/julia/share/julia/test/testdefs.jl:19
   [14] eval at ./boot.jl:319 [inlined]
   [15] #runtests#3(::UInt128, ::Function, ::String, ::String, ::Bool) at /tmp/julia/share/julia/test/testdefs.jl:25
   [16] #runtests at ./none:0 [inlined] (repeats 2 times)
   [17] (::getfield(Distributed, Symbol("##112#114")){Distributed.CallMsg{:call_fetch}})() at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269
   [18] run_work_thunk(::getfield(Distributed, Symbol("##112#114")){Distributed.CallMsg{:call_fetch}}, ::Bool) at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:56
   [19] macro expansion at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269 [inlined]
   [20] (::getfield(Distributed, Symbol("##111#113"))

e.g. https://travis-ci.org/JuliaLang/julia/jobs/450035986

The text was updated successfully, but these errors were encountered:

JeffBezanson · 2018-11-04T18:44:35Z

Also https://travis-ci.org/JuliaLang/julia/jobs/450350239

c42f · 2018-11-05T10:16:10Z

I just got another of these.

c42f · 2018-11-05T10:49:40Z

This is reproducible for me on an i686 build from a 64 bit ubuntu 18.04 host. The bug reduces to:

julia> Base.return_types(%, (Int8, UInt128))
1-element Array{Any,1}:
 Any

c42f · 2018-11-05T11:14:53Z

Well, it makes absolutely no sense, but reverting e2f7b2f fixes this for me locally.

c42f · 2018-11-05T12:41:17Z

More precisely, reverting only the following line seems to fix the problem, but I don't understand why.

diff --git a/base/array.jl b/base/array.jl
index 9e8da6c384..74823710ab 100644
--- a/base/array.jl
+++ b/base/array.jl
@@ -272,7 +272,6 @@ function copyto!(dest::Array{T}, doffs::Integer, src::Array{T}, soffs::Integer,
         throw(BoundsError())
     end
     unsafe_copyto!(dest, doffs, src, soffs, n)
-    return dest
 end
 
 # Outlining this because otherwise a catastrophic inference slowdown

Edit: having woken my machine up from sleep state, this one line change no longer "fixes" the problem. Which makes sense if there's something nasty and intermittent going on here. Uninitialized memory read or some such thing, perhaps.

c42f · 2018-11-05T23:16:59Z

It looks like my machine has fallen out of whatever state it was in which allowed me to reproduce this.

What I did when reproducing was to build 0999799 with ARCH=i686 (on ubuntu 18.04), which gave the return_types failure as stated. Then to revert various recent commits and measure the bug with Revise until I found a change which "fixed" the problem.

This was yesterday, but the build of 0999799 no longer exhibits the problem today. I'll chalk that up to it being intermittent (I don't think it's just me loosing my mind :-) ).

c42f · 2018-11-08T00:19:52Z

Got another window of reproducibility; it looks like the cached inferred code for rem(::UInt128, ::UInt128) has a return type of Any, which may possibly be the underlying cause though I'm not sure.

Applying the following patch:

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 01fb88b024..e1912b71da 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -594,6 +594,11 @@ function typeinf_type(method::Method, @nospecialize(atypes), sparams::SimpleVect
             inf = code.inferred
             if !isa(inf, CodeInfo) || (inf::CodeInfo).inferred
                 i == 2 && ccall(:jl_typeinf_end, Cvoid, ())
+                println("typeinf_type code = ", code)
+                println("typeinf_type ", method, " code.rettype = ", code.rettype)
+                println("typeinf_type ", code.inferred)
+                println("typeinf_type ", ccall(:jl_uncompress_ast, Any, (Any, Any), method,
+                                               code.inferred::Vector{UInt8}))
                 return code.rettype
             end
         end

I get the following output:

julia> using Revise; Revise.track(Base.Core.Compiler);

julia> Base.return_types(rem, (UInt128, UInt128))
typeinf_type code = rem(UInt128, UInt128)
typeinf_type Base.rem(...) code.rettype = Any
typeinf_type Array{UInt8, (105,)}[0x08, 0x03, 0x00, 0x00, 0x00, 0x23, 0x73, 0x65, 0x6c, 0x66, 0x23, 0x00, 0x78, 0x00, 0x79, 0x00, 0x16, 0x84, 0xb0, 0xea, 0x09, 0x03, 0x3c, 0x11, 0x02, 0x1d, 0x11, 0x00, 0x05, 0x02, 0x00, 0x09, 0x03, 0x3c, 0x11, 0x02, 0x1d, 0x11, 0x00, 0x05, 0x03, 0x00, 0x2d, 0x1d, 0x85, 0x03, 0x01, 0x03, 0x02, 0x2c, 0x1d, 0x1c, 0xf4, 0x03, 0x03, 0x09, 0x01, 0x3e, 0x03, 0x04, 0x39, 0x16, 0x84, 0xb0, 0xea, 0x3a, 0x3a, 0x3a, 0x3a, 0x3a, 0x16, 0x84, 0xac, 0x26, 0x00, 0x2e, 0x2e, 0x20, 0x85, 0x4d, 0x2b, 0x0a, 0x03, 0xab, 0x16, 0x01, 0xb0, 0xf6, 0x00, 0x00, 0x00, 0x00, 0x00, 0x16, 0x01, 0xae, 0xf6, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01]
typeinf_type Core.CodeInfo(code=Array{Any, (5,)}[
  Expr(:invoke, (::Type{Base.GMP.BigInt})(UInt128), Base.BigInt, Core.SlotNumber(id=2)),
  Expr(:invoke, (::Type{Base.GMP.BigInt})(UInt128), Base.BigInt, Core.SlotNumber(id=3)),
  Expr(:call, Base.rem, SSAValue(1), SSAValue(2)),
  Expr(:call, Base.UInt128, SSAValue(3)),
  Expr(:return, SSAValue(4))], codelocs=Array{Int32, (5,)}[1, 1, 1, 1, 1], method_for_inference_limit_heuristics=nothing, ssavaluetypes=Array{Any, (5,)}[
  Any,
  Any,
  Any,
  Any,
  Any], linetable=Array{Core.LineInfoNode, (1,)}[Core.LineInfoNode(mod=Base, method=:rem, file=Symbol("int.jl"), line=778, inlined_at=0)], ssaflags=Array{UInt8, (5,)}[0x00, 0x00, 0x00, 0x00, 0x00], slotflags=Array{UInt8, (3,)}[0x00, 0x00, 0x00], slotnames=Array{Any, (3,)}[
  Symbol("#self#"),
  :x,
  :y], inferred=true, inlineable=false, propagate_inbounds=false, pure=false)
1-element Array{Any,1}:
 Any

Given these BigInt in here, perhaps this comes down to a weird interaction with the patch in 6fa3e45

c42f · 2018-11-08T00:27:29Z

Given these BigInt in here, perhaps this comes down to a weird interaction with the patch in 6fa3e45

This would also explain why it's not seen on 64 bit linux, as we don't use GMP for supporting UInt128 in that case.

c42f · 2018-11-08T01:09:30Z

Here's another oddity - @code_typed and Base.return_types disagree about rem with UInt128:

julia> @code_typed rem(UInt128(1), UInt128(1))
CodeInfo(
778 1 ─ %1 = invoke Base.BigInt(_2::UInt128)::BigInt                  │ 
    │   %2 = invoke Base.BigInt(_3::UInt128)::BigInt                  │ 
    │   %3 = Base.GMP.MPZ.tdiv_r::typeof(tdiv_r)                      │╻ rem
    │   %4 = invoke %3(%1::BigInt, %2::BigInt)::BigInt                ││
    │   %5 = invoke Base.UInt128(%4::BigInt)::UInt128                 │ 
    └──      return %5                                                │ 
) => UInt128

julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 Any

c42f · 2018-11-08T01:35:44Z

More fun:

$ ./julia

[...]

julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 Any

# Now replace `rem` with exactly the same definition, to cause recompilation: 
julia> Base.eval(:(
           function rem(x::UInt128, y::UInt128)
               return UInt128(rem(BigInt(x), BigInt(y)))
           end
       ))
rem (generic function with 136 methods)

julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 UInt128

So I guess the incorrect return type for rem is being read from the sysimg. Later on we get this incorrect version deserialized, but if we override it with the exact same function definition, the next inference run returns the correct result.

c42f · 2018-11-08T03:52:19Z

The following patch dumps inferred rettype for rem as it's cached in jl_set_method_inferred.

diff --git a/src/gf.c b/src/gf.c
index ae4a1413b7..414a52e369 100644
--- a/src/gf.c
+++ b/src/gf.c
@@ -432,6 +432,13 @@ JL_DLLEXPORT jl_method_instance_t* jl_set_method_inferred(
     li->functionObjectsDecls.functionObject = NULL;
     li->functionObjectsDecls.specFunctionObject = NULL;
     li->rettype = rettype;
+    if (li->def.method->name == jl_symbol("rem")) {
+        jl_printf(JL_STDOUT, "jl_set_method_inferred rem; rettype = ");
+        jl_static_show(JL_STDOUT, li->rettype);
+        jl_printf(JL_STDOUT, "   ");
+        jl_static_show(JL_STDOUT, li->specTypes);
+        jl_printf(JL_STDOUT, "\n");
+    }
     jl_gc_wb(li, rettype);
     li->inferred = inferred;
     jl_gc_wb(li, inferred);

It demonstrates that the 32 bit build sometimes infers the wrong type for rem(UInt128, UInt128). In particular, applying it and calling

make clean &> /dev/null && make &> build.log && grep 'jl_set_method_inferred.*rettype = .*UInt128, UInt128' build.log

sometimes produces

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}
jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}

and other times produces

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}
jl_set_method_inferred rem; rettype = UInt128   Tuple{typeof(Base.rem), UInt128, UInt128}

I'm not sure whether this depends on being inside the sysimg build or not.

c42f · 2018-11-08T14:07:32Z

Ok, to reproduce this, you need an i686 build, apply the patch above and to run the sysimg build step several times until you see the uninferred version of rem

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}

twice in the log.

Another change which might have caused this to appear - #29795

StefanKarpinski · 2018-11-08T14:47:40Z

Nice work tracking this down, @c42f! @Keno, @vtjnash, @JeffBezanson—any idea what's going on?

c42f · 2018-11-08T20:49:45Z

Thanks, it's a head scratcher that's for sure. I caught the following stack trace from inside jl_set_method_inferred, just as it sets the bad Any as the rettype. This is still slightly after the damage is done, but almost at the right location. At a high level it's inside jl_compile_hint, so just before the sysimg build is completed.

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}
rec_backtrace at /home/tcfoster/src/julia-i686/src/stackwalk.c:94
jlbacktrace2 at /home/tcfoster/src/julia-i686/src/stackwalk.c:536
jl_set_method_inferred at /home/tcfoster/src/julia-i686/src/gf.c:441
cache_result at ./compiler/typeinfer.jl:133
typeinf at ./compiler/typeinfer.jl:61
typeinf_edge at ./compiler/typeinfer.jl:499 [inlined]
abstract_call_method at ./compiler/abstractinterpretation.jl:363
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_apply at ./compiler/abstractinterpretation.jl:518
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:566
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1120
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:198
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:107
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:198
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:107
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1120
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_ext at ./compiler/typeinfer.jl:574
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_ext at ./compiler/typeinfer.jl:611
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
jl_apply at /home/tcfoster/src/julia-i686/src/julia.h:1571 [inlined]
jl_type_infer at /home/tcfoster/src/julia-i686/src/gf.c:277
jl_compile_hint at /home/tcfoster/src/julia-i686/src/gf.c:1964
jl_compile_specializations at /home/tcfoster/src/julia-i686/src/precompile.c:378
jl_precompile at /home/tcfoster/src/julia-i686/src/precompile.c:388 [inlined]
jl_write_compiler_output at /home/tcfoster/src/julia-i686/src/precompile.c:34
jl_atexit_hook at /home/tcfoster/src/julia-i686/src/init.c:222
main at /home/tcfoster/src/julia-i686/ui/repl.c:222
__libc_start_main at /lib/i386-linux-gnu/libc.so.6 (unknown line)
_start at /home/tcfoster/src/julia-i686/usr/bin/julia (unknown line)

StefanKarpinski · 2018-11-13T14:08:00Z

This is no longer what I'd describe as "intermittent"—it seems to happen quite consistently now.

martinholters · 2018-11-15T08:50:55Z

I knew I remembered something like this. Finally I've found it: #25804 exposed a similar bug. The "solution" was a simple workaround: c002de7. Maybe we can just do the same for rem?

martinholters · 2018-11-15T10:19:12Z

I think I have a handle on this one: On 32bit systems, rem(::UInt128, ::Untt128) and div(::UInt128, ::Untt128) convert to BigInt, have the actual computation done, and convert back to UInt128. However, that leads to the following recursion:
BigInt(::UInt128) → ndigits(::UInt128, base=2) → ndigits0z(::UInt128, 2) → ndigits0zpb(::UInt128, 2) → div(::UInt128, 2) → promotion → div(::UInt128, ::UInt128) → BigInt(::UInt128)
It should be noted that ndigits0zpb has a special case for b==2, so the call-chain at run-time is not recursive, but apparently constant-propagation doesn't resolve this at compile-time.

To verify, I've added

function BigInt(x::UInt128)
    x == 0 && return BigInt(Culong(0))
    nd = sizeof(x)<<3 - leading_zeros(x) # instead of ndigits(x, base=2)
    z = MPZ.realloc2(nd)
    s = sign(x)
    s == -1 && (x = -x)
    x = unsigned(x)
    size = 0
    limbnbits = sizeof(Limb) << 3
    while nd > 0
        size += 1
        unsafe_store!(z.d, x % Limb, size)
        x >>>= limbnbits
        nd -= limbnbits
    end
    z.size = s*size
    z
end

which is a copy of BigInt(x::Integer) except for replacing the call to ndigits. At least locally, that makes the test failure go away.

Not sure how to properly fix this, though. Options include:

Make sure constant-propagation does its duty here.
Implement div(::UInt128, ::UInt128) natively instead of calling out to its BigInt version.

This reverts commit 84c8654.

work-around for #29923

c42f · 2018-11-15T19:39:08Z

@martinholters Nice! That sounds like part of the puzzle and means the problem started happening with the change at 6fa3e45#diff-c39f3bbf0cb1c2381243b42647d9ff1dR287

Though it doesn't explain the inconsistency of sometimes getting Any and other times UInt128 inferred for rem(::UInt128, UInt128).

StefanKarpinski · 2018-11-16T15:17:07Z

The symptoms are mitigated by #30032 but the underlying inference issue remains.

martinholters · 2018-11-19T09:36:05Z

Thinking about this again, my analysis above doesn't explain anything. If the arguments during recursion don't change, inference is happy with it. Only if the same method gets called with different argument types, problems may appear. I.e. inference may bail out (infer as Any), but where that happens depends on where the recursion loop was entered, so results may vary. I still suspect this is the behavior we're seeing here, but I have no idea where the offending recursion might sneak in.

c42f · 2018-11-19T10:24:47Z

I ran valgrind against the offending sysimg compile step. In one run, the following log occurred

$ valgrind --tool=memcheck --leak-check=full /home/chris/dev/julia-i686/usr/bin/julia -g1 -O0 -C "pentium4" --output-ji /home/chris/dev/julia-i686/usr/lib/julia/sys.ji.tmp  --startup-file=no --warn-overwrite=yes --sysimage /home/chris/dev/julia-i686/usr/lib/julia/corecompiler.ji sysimg.jl
[...]
Dates  ────────── 52.722144 seconds
DelimitedFiles  ─  1.952113 seconds
==1125== Conditional jump or move depends on uninitialised value(s)
==1125==    at 0x4B51B62: ???
==1125==
Random  ───────── 14.559922 seconds
UUIDs  ──────────  0.259599 seconds
Future  ─────────  0.083137 seconds
LinearAlgebra  ──406.806756 seconds
[...]
==1125== Use --track-origins=yes to see where uninitialised values come from

Which is extremely suspicious on the face of it, but repeated reruns with --track-origins=yes failed to reproduce or give any further clues, so perhaps it's a false positive.

c42f · 2018-11-19T13:08:48Z

@martinholters I think I've got a repro of this outside of bootstrap. Try the following:

# Force recompilation
julia> Base.eval(:(rem(x::UInt128, y::UInt128) = UInt128(rem(BigInt(x), BigInt(y)))))
rem (generic function with 136 methods)
# Compile directly gives the right result
julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 UInt128
# Now force recompilation, entering inference via `parse`:
julia> Base.eval(:(rem(x::UInt128, y::UInt128) = UInt128(rem(BigInt(x), BigInt(y)))))
julia> parse(UInt128, "1")
julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 Any

I think the reason this is inconsistent from one sysimg compile to the next is that the compilation inside jl_compile_specializations is done in the order of the bindings in the modules. These bindings are stored in a ptrhash hash table, so the ordering of bindings isn't stable as it depends on the addresses which happen to be allocated to the bound objects.

In particular, in test runs I found that rem can be compiled due to at least any of parse, @uint128_str ndigits or Base._base. The first two of these result in the wrong inferred type of Any, while the second two result in UInt128 being inferred.

c42f · 2018-11-19T13:15:00Z

Any time there is inconsistency in the handling of cycles during inference we are likely to have this inconsistent behavior due to the inconsistent ordering coming from jl_foreach_reachable_mtable.

If we sorted the list of specializations m before traversing it at

julia/src/precompile.c

Line 375 in 9a7c79c

for (i = 0, l = jl_array_len(m); i < l; i++) {

we might at least make the compilation reproducible.

Seems to have been missing since the original implementation of IPO constant-prop. It's impressive how long we got away with just poisoning the cache with `Any` if there was a self-recursive call. Fix #29923

vtjnash · 2018-11-20T05:23:54Z

That's a really great repro, and helped narrow this down quickly (spoiler: it wasn't where I expected). Usually it's enough to apply this diff to spot where inference diverges from my expectations:

diff --git a/base/compiler/abstractinterpretation.jl b/base/compiler/abstractinterpretation.jl
index e30b5496b0..65c1d6a2b0 100644
--- a/base/compiler/abstractinterpretation.jl
+++ b/base/compiler/abstractinterpretation.jl
@@ -300,6 +300,11 @@ function abstract_call_method(method::Method, @nospecialize(sig), sparams::Simpl
         newsig = limit_type_size(sig, comparison, sv.linfo.specTypes, sv.params.TUPLE_COMPLEXITY_LIMIT_DEPTH, spec_len)
 
         if newsig !== sig
+            println()
+            println(sig)
+            println(newsig)
+            print_callstack(sv)
+            println()
             # continue inference, but note that we've limited parameter complexity
             # on this call (to ensure convergence), so that we don't cache this result
             if call_result_unused(sv)

But I should perhaps have attempt to record (or repeat?) this debug session, since this didn't turn up anything, I had to go hunting further. Next I took a look at what was going into the cache:

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 01fb88b024..b26860d341 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -28,6 +27,15 @@ function typeinf(frame::InferenceState)
     # empty!(frames)
     min_valid = frame.min_valid
     max_valid = frame.max_valid
+    if cached && frame.parent !== nothing
+        println()
+        for caller in results
+            println(caller.argtypes, " => ", caller.result)
+            # println(caller.src)
+        end
+        print_callstack(frame)
+        println()
+    end
     if cached || frame.parent !== nothing
         for caller in results
             opt = caller.src

This was suspiciously pointing the finger at something permitting results to go into the cache that were clearly bogus, and had the structure of a self-recursive call on a constant (these are marked [uncached] [limited])—which is supposed to get rejected by inference—getting added to the cache.

If we sorted the list of specializations m

Yes. While we try to make inference fairly predictable and stable(*), sorting the precompile lists there by something stable (such as Method min_world when iterating over bindings) might help with faster isolation of similar problems.

(*) In particular, we want it to have the property that any particular call to inference (e.g. code_typed or Base.return_types) should not return a wider result than expected due to cache effects. We're just fine with it giving unpredictably narrower answers based on stuff that happened to be done in the past, and just we're just fine with it giving entirely unpredictable answers when running recursively, as part of converging inference (up to the limit that it shouldn't widen the final type given as the answer, but that does mean that the set of possible optimizations and inlining afterwards is expected to be unpredictable in general).

This should help in making ordering-dependent bugs such as JuliaLang#29923 (an error in caching of inferred methods) deterministic, which will make them much easier to find in the future.

This should help in making ordering-dependent bugs such as #29923 (an error in caching of inferred methods) deterministic, which will make them much easier to find in the future.

c42f · 2018-11-20T08:28:47Z

Great! I was hoping someone with knowledge of inference would step in and fix this. Thanks to Martin as well for giving just the hint I needed to repro it.

Yes. While we try to make inference fairly predictable and stable(*), sorting the precompile lists there by something stable (such as Method min_world when iterating over bindings) might help with faster isolation of similar problems.

How about sorting m using the same method as jl_resort_type_cache? Here's a PR for that: #30095

This should help in making ordering-dependent bugs such as #29923 (an error in caching of inferred methods) deterministic, which will make them much easier to find in the future.

…aLang#30032)

This should help in making ordering-dependent bugs such as JuliaLang#29923 (an error in caching of inferred methods) deterministic, which will make them much easier to find in the future.

Seems to have been missing since the original implementation of IPO constant-prop. It's impressive how long we got away with just poisoning the cache with `Any` if there was a self-recursive call, but it requires a fairly convoluted setup to trigger this case. Fix #29923

Keno added priority This should be addressed urgently ci Continuous integration labels Nov 4, 2018

ararslan added system:linux Affects only Linux system:32-bit Affects only 32-bit systems labels Nov 4, 2018

ararslan mentioned this issue Nov 5, 2018

Tests for WeakRef and missing #29928

Merged

c42f mentioned this issue Nov 8, 2018

Refine exception stack API #29901

Merged

maleadt mentioned this issue Nov 8, 2018

Avoid recursion in print_to_string. #29957

Merged

JeffBezanson mentioned this issue Nov 9, 2018

isdeprecated should not resolve binding #29962

Merged

fredrikekre mentioned this issue Nov 10, 2018

Add Float16 comparisons #29916

Merged

StefanKarpinski mentioned this issue Nov 10, 2018

mkpath should always return the path (fix #29989) #29990

Merged

JeffBezanson mentioned this issue Nov 10, 2018

fix #29955, intersection bugs involving triangular constraints #29986

Merged

StefanKarpinski mentioned this issue Nov 10, 2018

file creation tests: check path existence before & after #29994

Merged

fredrikekre mentioned this issue Nov 13, 2018

Decouple InteractiveUtils from LinearAlgebra #29978

Merged

StefanKarpinski mentioned this issue Nov 13, 2018

Fix backtrace line numbers following macros #30011

Merged

fredrikekre mentioned this issue Nov 14, 2018

fix #29995: latex printing of Markdown.HorizontalRule #30016

Merged

StefanKarpinski mentioned this issue Nov 14, 2018

temp work-around for 32-bit inference failure (#29923) #30032

Merged

StefanKarpinski added a commit that referenced this issue Nov 14, 2018

temp work-around for 32-bit inference failure (#29923)

84c8654

fredrikekre mentioned this issue Nov 14, 2018

Move Markdown docs to the Markdown stdlib #30029

Merged

StefanKarpinski added a commit that referenced this issue Nov 15, 2018

Revert "temp work-around for 32-bit inference failure (#29923)"

bfee93a

This reverts commit 84c8654.

StefanKarpinski added a commit that referenced this issue Nov 15, 2018

add work-around type assertions in 128-bit division ops

be45153

work-around for #29923

martinholters mentioned this issue Nov 15, 2018

Prevent inference issue of BigInt(::UInt128) due to recursion #30050

Closed

StefanKarpinski added a commit that referenced this issue Nov 15, 2018

temp work-around for 32-bit inference failure [#29923] (#30032)

26f1ba5

c42f mentioned this issue Nov 20, 2018

Sort method signatures before precompile to avoid order dependence. #30095

Merged

vtjnash mentioned this issue Nov 20, 2018

inference: add missing limit-cycle poisoning #30098

Merged

tkf pushed a commit to tkf/julia that referenced this issue Nov 21, 2018

temp work-around for 32-bit inference failure [JuliaLang#29923] (Juli…

8815c23

…aLang#30032)

vtjnash closed this as completed in #30098 Nov 27, 2018

c42f mentioned this issue Jan 24, 2019

(::Type)(args...) constructors with Union args mess up method sorting #30794

Closed

c42f mentioned this issue Feb 12, 2019

Nested SVectors not correctly inferred? JuliaArrays/StaticArrays.jl#593

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consistent 32bit CI error #29923

consistent 32bit CI error #29923

Keno commented Nov 4, 2018

JeffBezanson commented Nov 4, 2018

c42f commented Nov 5, 2018

c42f commented Nov 5, 2018

c42f commented Nov 5, 2018

c42f commented Nov 5, 2018 •

edited

Loading

c42f commented Nov 5, 2018 •

edited

Loading

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

StefanKarpinski commented Nov 8, 2018

c42f commented Nov 8, 2018

StefanKarpinski commented Nov 13, 2018

martinholters commented Nov 15, 2018

martinholters commented Nov 15, 2018

c42f commented Nov 15, 2018

StefanKarpinski commented Nov 16, 2018

martinholters commented Nov 19, 2018

c42f commented Nov 19, 2018

c42f commented Nov 19, 2018

c42f commented Nov 19, 2018

vtjnash commented Nov 20, 2018

c42f commented Nov 20, 2018

consistent 32bit CI error #29923

consistent 32bit CI error #29923

Comments

Keno commented Nov 4, 2018

JeffBezanson commented Nov 4, 2018

c42f commented Nov 5, 2018

c42f commented Nov 5, 2018

c42f commented Nov 5, 2018

c42f commented Nov 5, 2018 • edited Loading

c42f commented Nov 5, 2018 • edited Loading

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

c42f commented Nov 8, 2018

StefanKarpinski commented Nov 8, 2018

c42f commented Nov 8, 2018

StefanKarpinski commented Nov 13, 2018

martinholters commented Nov 15, 2018

martinholters commented Nov 15, 2018

c42f commented Nov 15, 2018

StefanKarpinski commented Nov 16, 2018

martinholters commented Nov 19, 2018

c42f commented Nov 19, 2018

c42f commented Nov 19, 2018

c42f commented Nov 19, 2018

vtjnash commented Nov 20, 2018

c42f commented Nov 20, 2018

c42f commented Nov 5, 2018 •

edited

Loading

c42f commented Nov 5, 2018 •

edited

Loading