Okay, we have a standalone isolated MRE for the hang on master!
This reliably hangs on master (but not on Julia 1.10), so I'm breaking it out from #51603 into a separate issue.
I don't think we could have gotten this without @vtjnash walking us through some very gnarly hackery to find which function was stuck in inference (including reading some pointers off of registers and some manual pointer arithmetic and memory reads).
One of my takeaways here is that it would be nice to have some kind of "debug mode" flag that makes Julia log which function it is working on whenever it starts and finishes inference / optimization, so that we could have found where it was stuck much more easily. Once we found that, isolating an MRE wasn't too hard.
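Until something like that exists inside the compiler, a rough user-level approximation is to log around each top-level `code_typed` request yourself. The sketch below is only that: `logged_code_typed` is a hypothetical helper, not a Julia API, and it only sees the calls you make explicitly, not the callees the compiler infers along the way.

```julia
using InteractiveUtils  # provides `code_typed` outside the REPL

# Hypothetical helper (not part of Julia): print a line before and after each
# top-level inference request. If a call never returns, the last "start" line
# without a matching "done" line names the request that got stuck.
function logged_code_typed(f, argtypes::Tuple)
    println(stderr, "start: ", f, " ", argtypes)
    flush(stderr)  # make the line visible even if the next call never returns
    result = code_typed(f, argtypes)
    println(stderr, "done:  ", f, " ", argtypes)
    flush(stderr)
    return result
end
```

Calling it on each suspect function in turn, e.g. `logged_code_typed(_union, (Vector{Shape},))`, narrows down which entry point triggers the hang, though not which callee the compiler is stuck on.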
@aviatesk and @vtjnash: Please try to reproduce the hang from this, and see if you can work out the issue from there!
Once you find the culprit, can you also give a time estimate for the fix? I do still think that if it will take more than a couple of days, we should start by reverting the culprits so that master is unbroken until we land the fix.
const Iterator = Any
abstract type Shape end
struct ShapeUnion <: Shape
    args::Vector{Shape}
end
shape_null(::Type{Shape}) = SHAPE_NULL
const SHAPE_NULL = ShapeUnion(Shape[])
const CANCELLED = Ref(false)
throw_if_cancelled() = if CANCELLED[] throw(ErrorException("cancelled")) end
@noinline Base.@nospecializeinfer function _union_vec_no_union(args)
    return args
end
# WEIRDLY, this also reproduces if shape_disjuncts is not defined!
function shape_disjuncts end
shape_disjuncts(s::ShapeUnion) = s.args
shape_disjuncts(s::Shape) = Shape[s]
# ^ (You can try commenting out the above three lines to produce another hang as well).
function shape_union(::Type{Shape}, args::Iterator)
    return _union(args)
end
function _union(args::Iterator)
    if any(arg -> arg isa ShapeUnion, args)
        # Call the entry point rather than `_union_vec_no_union` because we are uncertain
        # about the size and because there might be some optimization opportunity.
        return shape_union(Shape, (disj for arg in args for disj in shape_disjuncts(arg)))
    end
    if !(args isa Vector)
        args = collect(Shape, args)
        # respect _union_vec_no_union's assumption
        isempty(args) && return shape_null(Shape)
        length(args) == 1 && return only(args)
    end
    # If this is a big union, check for cancellation
    length(args) > 100 && throw_if_cancelled()
    return _union_vec_no_union(args)
end
# Reproduction:
#=
julia> code_typed(
_union,
(Vector{Shape},),
)
=#
julia> VERSION
v"1.11.0-DEV.638"
julia> include("inference_hang_repro.jl")
_union (generic function with 1 method)
julia> code_typed(
_union,
(Vector{Shape},),
)
# it is hanging here....
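For anyone who wants to check for the hang without losing their session, here is a minimal sketch that runs a script in a child Julia process and kills it after a timeout. The helper name `hangs`, the default timeout, and the script name `run_repro.jl` in the usage note are all assumptions for illustration; the script would need to include the repro file and then make the `code_typed` call itself.

```julia
# Hypothetical helper: returns true if `script` is still running after
# `timeout_s` seconds (the child is then killed), false if it exited in time.
function hangs(script::AbstractString; timeout_s::Real = 60)
    # Launch a fresh Julia with the same binary and flags as the current one.
    cmd = `$(Base.julia_cmd()) --startup-file=no $script`
    proc = run(pipeline(cmd; stdout = devnull, stderr = devnull); wait = false)
    # Poll until the child exits or the timeout elapses.
    status = timedwait(() -> process_exited(proc), timeout_s)
    if status === :timed_out
        # Assumption: a process stuck inside the compiler may not react to
        # SIGTERM promptly, so send SIGKILL.
        kill(proc, Base.SIGKILL)
        wait(proc)
        return true
    end
    return false
end
```

Usage would look like `hangs("run_repro.jl"; timeout_s = 120)`. A `true` result only says the script did not finish in time, so pick a timeout well above the normal compile time for the call.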
Originally posted by @NHDaly in #51603 (comment)
Interrupting it after letting it run for long enough gives this stack trace:
julia> code_typed(
_union,
(Vector{Shape},),
)
^C
ERROR: InterruptException:
Stacktrace:
[1] getindex(compact::Core.Compiler.IncrementalCompact, ssa::Core.SSAValue)
@ Core.Compiler ./compiler/ssair/ir.jl:701
[2] simple_walk(compact::Core.Compiler.IncrementalCompact, defssa::Any, callback::Core.Compiler.var"#476#477")
@ Core.Compiler ./compiler/ssair/passes.jl:202
[3] simple_walk
@ Core.Compiler ./compiler/ssair/passes.jl:188 [inlined]
[4] lifted_value(compact::Core.Compiler.IncrementalCompact, old_node_ssa::Any, old_value::Any, lifted_philikes::Vector{…}, lifted_leaves::Core.Compiler.IdDict{…}, reverse_mapping::Core.Compiler.IdDict{…})
@ Core.Compiler ./compiler/ssair/passes.jl:623
[5] perform_lifting!(compact::Core.Compiler.IncrementalCompact, visited_philikes::Vector{…}, cache_key::Any, result_t::Any, lifted_leaves::Core.Compiler.IdDict{…}, stmt_val::Any, lazydomtree::Core.Compiler.LazyGenericDomtree{…})
@ Core.Compiler ./compiler/ssair/passes.jl:731
[6] sroa_pass!(ir::Core.Compiler.IRCode, inlining::Core.Compiler.InliningState{Core.Compiler.NativeInterpreter})
@ Core.Compiler ./compiler/ssair/passes.jl:1170
[7] run_passes_ipo_safe(ci::Core.CodeInfo, sv::Core.Compiler.OptimizationState{…}, caller::Core.Compiler.InferenceResult, optimize_until::Nothing)
@ Core.Compiler ./compiler/optimize.jl:797
[8] run_passes_ipo_safe
@ Core.Compiler ./compiler/optimize.jl:812 [inlined]
[9] optimize(interp::Core.Compiler.NativeInterpreter, opt::Core.Compiler.OptimizationState{…}, caller::Core.Compiler.InferenceResult)
@ Core.Compiler ./compiler/optimize.jl:786
[10] _typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
@ Core.Compiler ./compiler/typeinfer.jl:265
[11] typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
@ Core.Compiler ./compiler/typeinfer.jl:216
[12]
@ Core.Compiler ./compiler/typeinfer.jl:863
[13]
@ Core.Compiler ./compiler/abstractinterpretation.jl:617
[14]
@ Core.Compiler ./compiler/abstractinterpretation.jl:89
[15]
@ Core.Compiler ./compiler/abstractinterpretation.jl:2080