Skip to content

Understanding which methods get cached in *.ji files (please help) #31457

@timholy

Description

@timholy

I'm trying to reduce the lag with the new Revise, which loads a lot more code now that it depends on JuliaInterpreter. A long-standing problem is that despite precompile statements, certain methods need to be re-inferred. I'm trying to understand the rules of the selection and whether they can be modified to cache more comprehensively.

I am a little skeptical that I'm inserting debug statements in the correct places (please help @vtjnash!), but here goes:

diff --git a/src/dump.c b/src/dump.c
index 84a84032f4..5b978b80d7 100644
--- a/src/dump.c
+++ b/src/dump.c
@@ -390,6 +390,8 @@ static void jl_serialize_datatype(jl_serializer_state *s, jl_datatype_t *dt) JL_
 
 static void jl_serialize_module(jl_serializer_state *s, jl_module_t *m)
 {
+    jl_printf(JL_STDOUT, "Serializing module ");
+    jl_(m);
     write_uint8(s->s, TAG_MODULE);
     jl_serialize_value(s, m->name);
     size_t i;
@@ -2749,14 +2751,31 @@ JL_DLLEXPORT int jl_save_incremental(const char *fname, jl_array_t *worklist)
     jl_array_t *edges = jl_alloc_vec_any(0);
 
     size_t i;
+    jl_printf(JL_STDOUT, "Worklist:\n");
+    for (i = 0; i < jl_array_len(worklist); i++)
+        jl_(jl_array_ptr_ref(worklist, i));
     size_t len = jl_array_len(mod_array);
     for (i = 0; i < len; i++) {
         jl_module_t *m = (jl_module_t*)jl_array_ptr_ref(mod_array, i);
         assert(jl_is_module(m));
         jl_collect_lambdas_from_mod(lambdas, m);
     }
+    jl_printf(JL_STDOUT, "Collected lambdas:\n");
+    for (i = 0; i < jl_array_len(lambdas); i++) {
+        jl_value_t *l = jl_array_ptr_ref(lambdas, i);
+        jl_(l);
+        if ((jl_datatype_t*)jl_typeof(l) == jl_method_type)
+            jl_(((jl_method_t*)l)->sig);
+    }
 
     jl_collect_backedges(edges);
+    jl_printf(JL_STDOUT, "Collected edges:\n");
+    for (i = 0; i < jl_array_len(edges); i++) {
+        jl_value_t *l = jl_array_ptr_ref(edges, i);
+        jl_(l);
+        if ((jl_datatype_t*)jl_typeof(l) == jl_method_type)
+            jl_(((jl_method_t*)l)->sig);
+    }
 
     jl_serializer_state s = {
         &f, MODE_MODULE,
diff --git a/src/gf.c b/src/gf.c
index f225ad7bb4..1ab79fcfd1 100644
--- a/src/gf.c
+++ b/src/gf.c
@@ -1966,6 +1966,8 @@ JL_DLLEXPORT int jl_compile_hint(jl_tupletype_t *types)
     if (li == NULL)
         return 0;
     if (jl_generating_output()) {
+        jl_printf(JL_STDOUT, "compile hint: ");
+        jl_(li);
         jl_code_info_t *src = NULL;
         // If we are saving ji files (e.g. package pre-compilation or intermediate sysimg build steps),

Test case (mimicking certain features of Revise, like the use of setindex! methods from OrderedCollections):

module Dummy

using OrderedCollections

export f

struct RelocatableExpr
    ex::Expr
end
const ExprsSigs = OrderedDict{RelocatableExpr,Union{Nothing,Vector{Any}}}

f(x::RelocatableExpr) = 1
# Base.setindex!(rex::RelocatableExpr, val, idx) = error("oops")

if ccall(:jl_generating_output, Cint, ()) == 1
    precompile(Tuple{typeof(f), RelocatableExpr})
    precompile(Tuple{typeof(setindex!), ExprsSigs, Nothing, RelocatableExpr})
end

end

Output:

$ julia-master --startup-file=no -e 'push!(LOAD_PATH, "/tmp/pkgs"); Base.compilecache(Base.PkgId("Dummy"))'
compile hint: f(Dummy.RelocatableExpr)
compile hint: setindex!(OrderedCollections.OrderedDict{Dummy.RelocatableExpr, Union{Nothing, Array{Any, 1}}}, Nothing, Dummy.RelocatableExpr)
Worklist:
Dummy
Collected lambdas:
Dummy.Type(...)
Tuple{Type{Dummy.RelocatableExpr}, Expr}
nothing
Dummy.Type(...)
Tuple{Type{Dummy.RelocatableExpr}, Any}
nothing
Dummy.Type(...)
Tuple{Type{Dummy.RelocatableExpr}, Expr}
nothing
Dummy.Type(...)
Tuple{Type{Dummy.RelocatableExpr}, Any}
nothing
Collected edges:
Serializing module Dummy
Serializing module OrderedCollections

Now let's see what actually gets re-inferred after loading this package. This uses #31444 and this branch of SnoopCompile:

julia> push!(LOAD_PATH, "/tmp/pkgs")
4-element Array{String,1}:
 "@"        
 "@v#.#"    
 "@stdlib"  
 "/tmp/pkgs"

julia> using Dummy, SnoopCompile

julia> rex = Dummy.RelocatableExpr(:(export Foo))
Dummy.RelocatableExpr(:(export Foo))

julia> dict = Dummy.ExprsSigs();

julia> tinfo = @snoopi begin
           Dummy.f(rex)
           dict[rex] = nothing
       end
1-element Array{Tuple{Float64,Core.MethodInstance},1}:
 (0.045538902282714844, MethodInstance for setindex!(::OrderedCollections.OrderedDict{Dummy.RelocatableExpr,Union{Nothing, Array{Any,1}}}, ::Nothing, ::Dummy.RelocatableExpr))

So clearly my debug statements are failing to pick on the fact that f is getting compiled. (Where am I missing a step?) But note that setindex! is getting re-inferred despite the precompile statement. Interestingly, if I uncomment that bogus setindex! method in Dummy, then I see setindex! appearing in the lambdas but the particular MethodInstance still needs to be re-inferred.

Collectively, inference of various setindex! methods alone contribute nearly 0.2s to Revise's startup lag. That triples the startup time of Julia itself.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions