          [Inference] limit inference timing recording to NativeInterpreter only
          #49391
        
Conversation
SGTM

Alternatively we can do something like:

```diff
diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 1eec73d043..eb88a2081e 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -205,8 +205,7 @@ __set_measure_typeinf(onoff::Bool) = __measure_typeinf__[] = onoff
 const __measure_typeinf__ = fill(false)
 
 # Wrapper around _typeinf that optionally records the exclusive time for each invocation.
-function typeinf(interp::AbstractInterpreter, frame::InferenceState)
-    interp = switch_from_irinterp(interp)
+function typeinf(interp::NativeInterpreter, frame::InferenceState)
     if __measure_typeinf__[]
         Timings.enter_new_timer(frame)
         v = _typeinf(interp, frame)
@@ -216,6 +215,8 @@ function typeinf(interp::AbstractInterpreter, frame::InferenceState)
         return _typeinf(interp, frame)
     end
 end
+# disable recording timings for external `AbstractInterpreter`s
+typeinf(interp::AbstractInterpreter, frame::InferenceState) = _typeinf(interp, frame)
 
 function finish!(interp::AbstractInterpreter, caller::InferenceResult)
     # If we didn't transform the src for caching, we may have to transform
@@ -242,6 +243,7 @@ function finish!(interp::AbstractInterpreter, caller::InferenceResult)
 end
 
 function _typeinf(interp::AbstractInterpreter, frame::InferenceState)
+    interp = switch_from_irinterp(interp)
     typeinf_nocycle(interp, frame) || return false # frame is now part of a higher cycle
     # with no active ip's, frame is done
     frames = frame.callers_in_cycle
```
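The pattern suggested in that diff can be summarized with a toy, self-contained sketch (illustrative names only, not the actual compiler API): method dispatch lets the concrete native type opt into timing recording, while every other interpreter falls through to the plain inference path.

```julia
# Toy model of the dispatch pattern (hypothetical names, not Core.Compiler's):
abstract type AbstractInterp end
struct NativeInterp <: AbstractInterp end
struct ForeignInterp <: AbstractInterp end

const timings = Float64[]

_infer(interp::AbstractInterp, frame) = frame  # stand-in for _typeinf

# only the "native" interpreter records timings
function infer(interp::NativeInterp, frame)
    t0 = time_ns()
    v = _infer(interp, frame)
    push!(timings, (time_ns() - t0) / 1e9)
    return v
end

# every external interpreter bypasses recording entirely
infer(interp::AbstractInterp, frame) = _infer(interp, frame)
```

With this arrangement, `infer(NativeInterp(), frame)` grows `timings` while `infer(ForeignInterp(), frame)` leaves it untouched, which is exactly the separation the diff aims for.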
    
          
I thought about that, but I actually want to be able to see how much time it took to infer CUDA code.
    
          
GPUCompiler can implement its own measurement mechanism for that? I'm not opposed to the current approach, but it seems somewhat risky to leave […]
    
The major reason would be that it would be great if SnoopCompile could be extended to custom abstract interpreters. For me that's the next frontier in TTFX. Oceananigans has 10+ minutes of compilation time.
    
          
I don't know why that would be the case? I would say it is correct? You might have arbitrary interleavings?
    
          
It's possible that there are type unstable parts within some external […]
    
          
Yeah, I agree. But would it be cleaner if an external AbstractInterpreter uses SnoopCompile utilities like this?

```julia
const gpu_timings = CC.Timings.Timing[]
function CC.typeinf(interp::GPUInterpreter, frame::InferenceState)
    # [ push timing into gpu_timings ]
    @invoke CC.typeinf(interp::AbstractInterpreter, frame::InferenceState)
end
macro snoopi_deep_gpu(ex) _snoopi_deep_gpu(ex) end
function _snoopi_deep_gpu(cmd::Expr)
    return quote
        start_gpu_deep_timing()
        try
            $(esc(cmd))
        finally
            stop_gpu_deep_timing()
        end
        CC.Timings.InferenceTimingNode(gpu_timings[1])
    end
end
```
    
You would want to do both at the same time though...
    
Force-pushed c25de89 to aff54d5.
Leaving a comment from Slack for future reference:

Shuhei Kadowaki: E.g. julia/base/compiler/typeinfer.jl, lines 165 to 166 in a34261f, `_timings[end]`, and currently we may get an inference graph like: [embedded inference-graph image not preserved], which seems to be a bit tricky to handle? If we isolate […] and […]
Allows consumers to separate native inference from inference coming from a custom AbstractInterpreter. Co-authored-by: Shuhei Kadowaki <[email protected]>
This reverts commit 062fa0a.
Force-pushed aff54d5 to c2411ce.
  
[Inference] limit inference timing recording to NativeInterpreter only (#49391)

The logic of `Core.Compiler.Timings` assumes that the whole recorded inference graph is constructed by the same interpreter, thus we should limit the inference timing recording to `NativeInterpreter` only. External `AbstractInterpreter`s can implement their own recording logic, likely by reusing existing `Core.Compiler.Timings` utilities, in a way that does not interfere with the recording for the native compilation pipeline.

Co-authored-by: Shuhei Kadowaki <[email protected]>
(cherry picked from commit 3db036e)
…reters (#54069)

Partially reverts #49391. PrecompileTools uses the timing infrastructure to snoop on the inference process. The reason for #49391 was that this could lead to accidental pollution of the caches with foreign results (JuliaDebug/SnoopCompile.jl#338). After #52233 and especially #53336 we now filter results by cache owner and don't try to cache foreign code using the native pipeline. Motivated by JuliaGPU/GPUCompiler.jl#567, which demonstrated that a foreign code instance would not be cached without PrecompileTools.

(cherry picked from commit c0611e8)
Allows consumers to separate native inference from inference coming
from a custom AbstractInterpreter.
x-ref: JuliaDebug/SnoopCompile.jl#338
I chose `typeof(interp)` since the interp may be arbitrarily heavy and we might not want to hold on to it.
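A tiny illustration of that design choice (hypothetical struct and field names, not the actual `Core.Compiler.Timings.Timing` layout): recording `typeof(interp)` rather than the interpreter itself means the timing tree never keeps a possibly large interpreter instance alive.

```julia
# Hypothetical sketch, not the real Timings layout.
struct TimingRecord
    # Store only the interpreter's type; the (possibly heavy) interpreter
    # instance itself remains collectible by the GC.
    interp_type::DataType
    time_ns::UInt64
end

make_record(interp, elapsed) = TimingRecord(typeof(interp), elapsed)
```

A consumer can then still distinguish native inference (`interp_type === NativeInterpreter`) from inference driven by a custom `AbstractInterpreter`, which is the separation this PR is after.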