Type inference optimize hang on master: generator stuck perform_lifting / simple_walk #51694

Closed
NHDaly opened this issue Oct 13, 2023 · 8 comments · Fixed by #51845
Labels
bug (Indicates an unexpected problem or unintended behavior) · compiler:inference (Type inference)

Comments

@NHDaly
Member

NHDaly commented Oct 13, 2023

Okay, we have a standalone isolated MRE for the hang on master!

This reliably hangs on master (but not on Julia 1.10). So I'm breaking this out of #51603 into a separate issue, since it appears to be a distinct problem.


I don't think we could have gotten this without @vtjnash walking us through some very gnarly hackery to find which function was stuck in inference (including reading pointers off registers, manual pointer arithmetic, and raw memory reads).

One of my takeaways here is that it would be nice to be able to set some kind of "debug mode" flag to have Julia log which function it's inferring before it starts and stops inference/optimization, so that we could have found where it was stuck much more easily. Once we found that, isolating an MRE wasn't too hard.

@aviatesk and @vtjnash: Please try to reproduce the hang from this, and see if you can work out the issue from there!

Once you find the culprit, can you also give a time estimate for the fix? I still think that if it will take more than a couple of days, we should start by reverting the culprits so that master is unbroken until we land the fix.

const Iterator = Any

abstract type Shape end

struct ShapeUnion <: Shape
    args::Vector{Shape}
end
shape_null(::Type{Shape}) = SHAPE_NULL
const SHAPE_NULL = ShapeUnion(Shape[])

const CANCELLED = Ref(false)
throw_if_cancelled() = if CANCELLED[] throw(ErrorException("cancelled")) end

@noinline Base.@nospecializeinfer function _union_vec_no_union(args)
    return args
end

# WEIRDLY, this also reproduces if shape_disjuncts is not defined!
function shape_disjuncts end
shape_disjuncts(s::ShapeUnion) = s.args
shape_disjuncts(s::Shape) = Shape[s]
# ^ (You can try commenting out the above three lines to produce another hang as well).

function shape_union(::Type{Shape}, args::Iterator)
    return _union(args)
end
function _union(args::Iterator)
    if any(arg -> arg isa ShapeUnion, args)
        # Call the entry point rather than `_union_vec_no_union` because we are uncertain
        # about the size and because there might be some optimization opportunity.
        return shape_union(Shape, (disj for arg in args for disj in shape_disjuncts(arg)))
    end

    if !(args isa Vector)
        args = collect(Shape, args)
        # respect _union_vec_no_union's assumption
        isempty(args) && return shape_null(Shape)
        length(args) == 1 && return only(args)
    end

    # If this is a big union, check for cancellation
    length(args) > 100 && throw_if_cancelled()
    return _union_vec_no_union(args)
end
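For reference, the recursive branch above hands `shape_union` a flattened generator rather than a `Vector`. A minimal self-contained sketch (re-declaring the same types for illustration) shows the nested iterator type that the double-`for` generator produces:

```julia
abstract type Shape end
struct ShapeUnion <: Shape
    args::Vector{Shape}
end
shape_disjuncts(s::ShapeUnion) = s.args
shape_disjuncts(s::Shape) = Shape[s]

args = Shape[ShapeUnion(Shape[])]
# A double-for generator lowers to a Flatten wrapping a Generator, so the
# recursive shape_union call sees a new, strictly deeper iterator type than
# the Vector{Shape} it started from.
g = (disj for arg in args for disj in shape_disjuncts(arg))
@assert g isa Base.Iterators.Flatten
```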

# Reproduction:
#=
julia> code_typed(
           _union,
           (Vector{Shape},),
       )
=#
julia> VERSION
v"1.11.0-DEV.638"

julia> include("inference_hang_repro.jl")
_union (generic function with 1 method)

julia> code_typed(
           _union,
           (Vector{Shape},),
       )
# it is hanging here....

Originally posted by @NHDaly in #51603 (comment)

Pausing it after long enough gives this stack trace:

julia> code_typed(
           _union,
           (Vector{Shape},),
       )
^C
ERROR: InterruptException:
Stacktrace:
   [1] getindex(compact::Core.Compiler.IncrementalCompact, ssa::Core.SSAValue)
     @ Core.Compiler ./compiler/ssair/ir.jl:701
   [2] simple_walk(compact::Core.Compiler.IncrementalCompact, defssa::Any, callback::Core.Compiler.var"#476#477")
     @ Core.Compiler ./compiler/ssair/passes.jl:202
   [3] simple_walk
     @ Core.Compiler ./compiler/ssair/passes.jl:188 [inlined]
   [4] lifted_value(compact::Core.Compiler.IncrementalCompact, old_node_ssa::Any, old_value::Any, lifted_philikes::Vector{…}, lifted_leaves::Core.Compiler.IdDict{…}, reverse_mapping::Core.Compiler.IdDict{…})
     @ Core.Compiler ./compiler/ssair/passes.jl:623
   [5] perform_lifting!(compact::Core.Compiler.IncrementalCompact, visited_philikes::Vector{…}, cache_key::Any, result_t::Any, lifted_leaves::Core.Compiler.IdDict{…}, stmt_val::Any, lazydomtree::Core.Compiler.LazyGenericDomtree{…})
     @ Core.Compiler ./compiler/ssair/passes.jl:731
   [6] sroa_pass!(ir::Core.Compiler.IRCode, inlining::Core.Compiler.InliningState{Core.Compiler.NativeInterpreter})
     @ Core.Compiler ./compiler/ssair/passes.jl:1170
   [7] run_passes_ipo_safe(ci::Core.CodeInfo, sv::Core.Compiler.OptimizationState{…}, caller::Core.Compiler.InferenceResult, optimize_until::Nothing)
     @ Core.Compiler ./compiler/optimize.jl:797
   [8] run_passes_ipo_safe
     @ Core.Compiler ./compiler/optimize.jl:812 [inlined]
   [9] optimize(interp::Core.Compiler.NativeInterpreter, opt::Core.Compiler.OptimizationState{…}, caller::Core.Compiler.InferenceResult)
     @ Core.Compiler ./compiler/optimize.jl:786
  [10] _typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
     @ Core.Compiler ./compiler/typeinfer.jl:265
  [11] typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
     @ Core.Compiler ./compiler/typeinfer.jl:216
  [12]
     @ Core.Compiler ./compiler/typeinfer.jl:863
  [13]
     @ Core.Compiler ./compiler/abstractinterpretation.jl:617
  [14]
     @ Core.Compiler ./compiler/abstractinterpretation.jl:89
  [15]
     @ Core.Compiler ./compiler/abstractinterpretation.jl:2080
@NHDaly added the bug and compiler:inference labels Oct 13, 2023
@aviatesk
Member

I infiltrated into sroa_pass! and observed that the IR causing the hang has 370,339 statements, 342,416 of which are PhiNodes that all look very similar. Here is a snapshot of the IR:

...
%276775 = φ (#2857 => %275943)::Int64
%276776 = φ (#2857 => %275674)::Int64
%276777 = φ (#2857 => %275741)::Int64
%276778 = φ (#2857 => %275742)::Int64
%276779 = φ (#2857 => %275745)::Vector{Shape}
%276780 = φ (#2857 => %275746)::Vector{Shape}
%276781 = φ (#2857 => %275750)::Vector{Shape}
%276782 = φ (#2857 => %275756)::Vector{Shape}
%276783 = φ (#2857 => %275748)::Vector{Shape}
%276784 = φ (#2857 => %275751)::Vector{Shape}
%276785 = φ (#2857 => %275752)::Vector{Shape}
%276786 = φ (#2857 => %275754)::Vector{Shape}
%276787 = φ (#2857 => %275747)::Vector{Shape}
%276788 = φ (#2857 => %275749)::Vector{Shape}
%276789 = φ (#2857 => %275757)::Vector{Shape}
%276790 = φ (#2857 => %275755)::Vector{Shape}
%276791 = φ (#2857 => %275753)::Vector{Shape}
%276792 = φ (#2857 => %275759)::Vector{Shape}
%276793 = φ (#2857 => %275758)::Vector{Shape}
%276794 = φ (#2857 => %275760)::Vector{Shape}
%276795 = φ (#2857 => %275761)::Base.Generator{Vector{Shape}, typeof(identity)}
...

From what I've observed, walk_to_defs seems to operate correctly but struggles to process this massive IR.
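(For readers who want to poke at typed IR themselves: statement and PhiNode counts like the ones above can be gathered with the public `code_typed` API. This is only a sketch, with `abs` as a stand-in function; for the actual repro you would inspect `_union`.)

```julia
# Count statements and PhiNodes in the typed IR of a function.
ci, _rettype = code_typed(abs, (Float64,))[1]
n_stmts = length(ci.code)
n_phis  = count(stmt -> stmt isa Core.PhiNode, ci.code)
println("statements: $n_stmts, PhiNodes: $n_phis")
```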

@NHDaly
Member Author

NHDaly commented Oct 18, 2023

So is the issue in whatever is producing the massive IR in the first place? Something earlier in the set of passes?

@aviatesk
Member

Indeed, but the issue might lie with sroa_pass! in a callee context, given that the IR I inspected is post-inlining.

@aviatesk
Member

Reduced this further a bit:

abstract type Shape end

struct ShapeUnion <: Shape
    args::Vector{Shape}
end

# WEIRDLY, this also reproduces if shape_disjuncts is not defined!
function shape_disjuncts end
shape_disjuncts(s::ShapeUnion) = s.args
shape_disjuncts(s::Shape) = Shape[s]
# ^ (You can try commenting out the above three lines to produce another hang as well).

function shape_union(::Type{Shape}, args)
    return _union(args)
end

function _union(args)
    if any(arg -> arg isa ShapeUnion, args)
        # Call the entry point rather than `_union_vec_no_union` because we are uncertain
        # about the size and because there might be some optimization opportunity.
        return shape_union(Shape, (disj for arg in args for disj in shape_disjuncts(arg)))
    end
    return args
end

# Reproduction:
#=
julia> code_typed(
           _union,
           (Vector{Shape},),
       )
=#

@aviatesk
Member

It doesn't look like it's just an issue with the optimization pass. I've seen the problem pop up even with optimization switched off entirely. I suspect it's caused by non-terminating abstract interpretation rather than by the optimizer producing massive IR; the massive IR may itself be the result of inlining an endless inference cycle:

include("test/compiler/newinterp.jl")
@newinterp CompilerDebugger
CC.may_optimize(::CompilerDebugger) = false
code_typed(_union, (Vector{Shape},); interp=CompilerDebugger()) # issue persists

@vtjnash
Member

vtjnash commented Oct 24, 2023

Ah, that makes sense. Flatten sorts alphabetically before Generator, so it results in endless inference. That seems like an easy fix now
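The non-convergence can be modeled with a small illustrative sketch (not from the issue; `typeof(identity)` stands in for the real generator closure type): every recursion level wraps the iterator type in another `Flatten{Generator{...}}`, so inference never sees the same call signature twice.

```julia
# Illustrative only: model how the iterator type deepens at each recursion.
wrap(T::Type) = Base.Iterators.Flatten{Base.Generator{T, typeof(identity)}}

T1 = wrap(Vector{Int})
T2 = wrap(T1)
T3 = wrap(T2)
# Each level is a strictly new type, so a recursion over these signatures
# has no fixed point for inference to converge to.
@assert T1 != T2 && T2 != T3
```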

vtjnash added a commit that referenced this issue Oct 24, 2023
It seems this case has already been fixed by other improvements, so we
no longer need this hack, which is now causing problems.

Fixes #51694
@aviatesk
Member

Nice!

aviatesk pushed a commit that referenced this issue Oct 25, 2023
It seems this case has already been fixed by other improvements, so we
no longer need this hack, which is now causing problems.

Fixes #51694
@NHDaly
Member Author

NHDaly commented Oct 27, 2023

Awesome! Thanks guys! Great to see the fix land here :)
Thanks again for digging into it!
