
Conversation

@JeffBezanson
Member

This is sufficient to get finalizers working. Before merging this, though, I propose generalizing it by adding a new builtin, calledge (or something), that we'd use like so:

function finalizer(@nospecialize(f), @nospecialize(o))
    _check_mutable(o)
    Core.finalizer(f, o)
    calledge(Tuple{typeof(f), typeof(o)})
    return o
end

function Core.Task(@nospecialize(f), reserved_stack::Int=0)
    calledge(Tuple{typeof(f)})
    Core._Task(f, reserved_stack, ThreadSynchronizer())
end

which says that if a call to finalizer is reachable, then so is f(o) etc. The implementation would be basically identical to what's here currently.

@JeffBezanson JeffBezanson added the trimming Issues with trimming functionality or PR's relevant to its performance/functionality label Apr 4, 2025
@vtjnash vtjnash marked this pull request as draft April 4, 2025 20:07
@topolarity
Member

adding a new builtin calledge (or something) that we'd use like so

I was thinking perhaps Core._predeclare_call(foo, a, b)

@topolarity topolarity force-pushed the ct/enqueue-dynamic-builtins branch from 01c2a07 to fc4b732 Compare April 11, 2025 13:59
Comment on lines 1333 to 1427
if ftyp === typeof(Core.finalizer) && length(stmt.args) == 3
finalizer_fn = argextype(stmt.args[2], ci, sptypes)
object = argextype(stmt.args[3], ci, sptypes)
atype = argtypes_to_type(Any[finalizer_fn, object])
Member


Could we instead use the existing CallInfo pieces? E.g. We already have a FinalizerInfo

struct FinalizerInfo <: CallInfo
.

I am thinking something like:

indirect(::CallInfo) = false
indirect(::FinalizerInfo) = true

if indirect(callinfo)
    enqueue(workqueue, callinfo.call)
end

and then we could add TaskInfo and so forth.
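For instance, a TaskInfo might look like this (a sketch only; TaskInfo and its field are hypothetical and do not exist in the source):

```julia
# Hypothetical CallInfo recording the call a new Task will eventually
# make, so trimming could treat that call as reachable.
struct TaskInfo <: CallInfo
    call::Any   # the signature of the deferred f() invocation
end

indirect(::TaskInfo) = true
```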

One note is

add_edges_impl(::Vector{Any}, ::FinalizerInfo) = nothing

Maybe we "just" need to change that to actually add the edge?

Member


IIUC the optimizer destroys the CallInfo so we unfortunately can't forward this edge from inference in the usual way

@vchuravy
Member

I am still not convinced of the utility of a new builtin like predeclare_call, that feels unnecessary since inference ought to have all the information already.

If we truly need it, could we merge it into precompile?

@topolarity
Member

I am still not convinced of the utility of a new builtin like predeclare_call, that feels unnecessary since inference ought to have all the information already.

I'm not super pleased about it either - but I also don't know of an alternative for atexit-like patterns:

const atexit_handlers = Callable[]
push!(atexit_handlers, f) # we know the identity of `f` on the `push!`

run_atexit() = for handler in atexit_handlers
    handler() # we no longer know the identity of the handlers here
end

This is pretty challenging, and also very common in practice. Even if we add this "virtual edge" to the runtime (via predeclare_call or similar), it is not tracked properly if you happen to serialize an object that needs this edge tracked.

(that happens to not be the case for atexit since those are cleared, but other type-erased callbacks like LazyString could easily be carried over from pre-compilation)
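For concreteness, a registration-time declaration might look like this (a sketch only, using the Core._predeclare_call name floated above; the builtin's exact name and calling convention are not settled, and this `atexit` is a simplified stand-in for the real one in Base):

```julia
const atexit_handlers = Callable[]

function atexit(f)
    # Tell trimming that a zero-argument dynamic call to `f` may happen
    # later, even though no call-site visible to inference proves it.
    Core._predeclare_call(f)
    push!(atexit_handlers, f)
    return nothing
end
```

As noted above, this only covers edges created within the current process; it does not help once such an object is serialized.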

If we truly need it, could we merge it into precompile?

Perhaps. That's what I called it in the first prototype, but @JeffBezanson has been keeping the entrypoint stuff separate for now: there is little downside to declaring a precompile, but potentially quite a lot of cost to declaring an entrypoint (or a "virtual edge"), since it bloats --trim output and grows the body of static-dispatch-only code.

@topolarity topolarity force-pushed the ct/enqueue-dynamic-builtins branch from 7ecc4f6 to 0e29daa Compare April 16, 2025 01:06
@vchuravy
Member

So the problem is two-fold?

  1. The expression of a virtual edge for inference's sake?
  2. Turning said virtual edge into an entry edge for trimming?

My expectation was that we could just turn these missing edges in inference into real edges there, e.g. teach it to look at the Task constructor and treat it as if it is a call, and that ought to add it to its real edge list? And that one could use the precompile built-in to express such a virtual edge.

For cases like atexit, the problem is more that we have two "separate" inference entries, and there is indeed no information flow possible. I agree that in that case we need to mark these functions as entrypoints, but that seems independent to me from how we handle these calls when they are visible in inference?

@vtjnash
Member

vtjnash commented Apr 16, 2025

My expectation was that we could just turn these missing edges in inference into real edges there, e.g. teach it to look at the Task constructor and treat it as if it is a call, and that ought to add it to its real edge list? And that one could use the precompile built-in to express such a virtual edge.

We could do something more like you describe, but it seemed likely to end up being more complicated than is useful. We don't actually want to compile most "real edges" here; we only want to compile "invoke edges", which are often derived from inlined or de-virtualized copies instead. Those edges also run under a different interpreter than the parent (NativeInterp @ current-world, never the inherited one), so it is inconvenient to have to pass around multiple interpreters inside each NativeInterp on the off chance that someone is running a trim build and will want to recover that information later.

Since this code is trim-relevant, we already know that the code to precisely compute the call signature is required (we need to check that every dispatch is covered or is a builtin function not on the disallow-list), so the information needed to check whether this is one of a set of specific functions is already visible to the code here.

The tricky cases are ones like atexit, where we do not easily have the tools to correctly express that it should expect the possibility of a runtime dispatch later, for the lifetime of the object it was based upon (which might have come from compile time, runtime, or even earlier in the sysimg creation). Inside a single process, this would be adequately modeled already by OpaqueClosure (fixed-world hidden edge) or struct CFunction (dynamic-world hidden edge), but neither of those survives serialization at present.

@topolarity topolarity marked this pull request as ready for review April 16, 2025 20:38
@topolarity
Member

@JeffBezanson this should be ready to go from my end

I'm happy to remove the Core._predeclare_call builtin for now so that we can tackle that use case separately; I'll defer to others to make that call.

AFAIU, that's only immediately useful for Task anyway, since we have no way to use this except for call-sites that Inference cannot see.

@topolarity topolarity changed the title WIP: support finalizers in trimming Support finalizers in trimming Apr 16, 2025
@topolarity topolarity changed the title Support finalizers in trimming trimming: Support finalizers Apr 16, 2025
@topolarity topolarity added the backport 1.12 Change should be backported to release-1.12 label Apr 16, 2025
@topolarity topolarity force-pushed the ct/enqueue-dynamic-builtins branch from 7ba5c33 to d6c982d Compare April 17, 2025 00:50
@KristofferC KristofferC mentioned this pull request Apr 22, 2025
t = @_gc_preserve_begin a
p = unsafe_convert(Ptr{Cvoid}, a)
T = eltype(a)
memset(p, 0x0, (sizeof(T) * length(a)) % UInt)
Member Author


This seems like an odd place to put this optimization. Is it really that helpful? If so, should it be added to fill!? Or we could add a proper zero! function.

Member Author

@JeffBezanson JeffBezanson Apr 22, 2025


Oh I see now this was for bootstrapping. Why isn't fill! available here though; isn't array.jl loaded first?

Member


The fill! implementation we need here is from multidimensional.jl since it's for a Memory:

julia> @which fill!(m, 0)
fill!(A::AbstractArray{T}, x) where T
     @ Base multidimensional.jl:1141

Member Author


Ah. Seems to me that should be in abstractarray.jl.

@JeffBezanson
Member Author

If we're going to have predeclare_call (and I think we should) then we should use it to handle finalizer, Task, and atexit. Is there a compelling reason to make finalizers special but not other cases?

@vchuravy
Member

Those edges also run under a different interpreter than the parent

That's only true for finalizer/atexit, which will run with invokelatest, but not Task, which inherits the world?

Since this code is trim-relevant,

The reason I am pushing back is that I don't really like "polluting" the IR for trim, especially when the information is right next to it. The only case where that isn't true is atexit, where we need some way of registering it.

@JeffBezanson
Member Author

OK, I'm kind of sympathetic to the "polluted IR" argument. Having to manually add each case to the compiler, when it could just be one line of code as needed, seems unfortunate though. Are we sure there won't be more cases?

I guess with atexit the problem is the call site for the exit handlers. There we need a different feature, which I think has been referred to as "unsafe_call", AKA a "trust me bro" call site. For that it seems having something in the IR is unavoidable?

@KristofferC KristofferC mentioned this pull request Apr 29, 2025
@topolarity topolarity force-pushed the ct/enqueue-dynamic-builtins branch from d6c982d to 8bf1bca Compare April 30, 2025 19:55
topolarity and others added 5 commits April 30, 2025 15:56
This adds the changes required to guess the MethodInstance that the
runtime will eventually invoke for this call, enqueue it for codegen,
and then verify its presence in the verifier.
Otherwise this will match `ftyp === Builtin`, which is pretty silly
since that could just as well be any other built-in.
@topolarity topolarity force-pushed the ct/enqueue-dynamic-builtins branch from 8bf1bca to ce56baa Compare April 30, 2025 19:56
@topolarity
Member

I've decided to split off the Core._predeclare_call changes into a separate PR, so that we can land the finalizer support now

This should be ready to merge once CI passes.

@topolarity topolarity added the merge me PR is reviewed. Merge when all tests are passing label Apr 30, 2025
@topolarity topolarity force-pushed the ct/enqueue-dynamic-builtins branch from 51dfef3 to 0578395 Compare May 1, 2025 17:58
@topolarity topolarity merged commit 75ba594 into master May 2, 2025
5 of 7 checks passed
@topolarity topolarity deleted the ct/enqueue-dynamic-builtins branch May 2, 2025 00:34
@vtjnash
Member

vtjnash commented May 2, 2025

Can you add a mention of this to NEWS (or rather, expand upon the trim entry in HISTORY) when you make the other followups also?

@topolarity topolarity removed the merge me PR is reviewed. Merge when all tests are passing label May 2, 2025
topolarity added a commit that referenced this pull request May 6, 2025
Should cover all the post-review comments from
#58014

~~Also disables the eager compilation of finalizers unless `--trim` is
enabled, but I'm happy to re-enable that.~~ edit: re-enabled
maleadt added a commit to JuliaGPU/GPUCompiler.jl that referenced this pull request May 8, 2025
@maleadt maleadt mentioned this pull request May 8, 2025
@KristofferC KristofferC mentioned this pull request May 9, 2025
maleadt added a commit to JuliaGPU/GPUCompiler.jl that referenced this pull request May 14, 2025
@KristofferC KristofferC mentioned this pull request Jun 6, 2025
@topolarity topolarity removed the backport 1.12 Change should be backported to release-1.12 label Jun 27, 2025