Fix method invalidation #581

maleadt · 2020-02-27T09:51:07Z

Fixes #459 by setting ci.edges (from JuliaLang/julia#32237) and adding a fake call; both seem to be required to fool Julia.

This regresses launch performance, which I plan to look into soon.

maleadt · 2020-02-27T16:22:40Z

Valentin pointed out we shouldn't need to set the edges here, and the fake call should suffice because unlike Cassette we actually call the child methods. (Conversely, shouldn't setting the edge explicitly on the outer method work too and not require a fake call?)

Anyway, something fishy is happening, because not adding edges and only doing the call breaks our tests, or more succinctly:

using CUDAnative, CuArrays, CUDAdrv

arr = CuArray(zeros(Int))

doit(ptr) = (unsafe_store!(ptr, 0); @cuprintln(1))

function kernel(ptr)
    doit(ptr)
    return
end

@cuda kernel(pointer(arr))

doit(ptr) = (unsafe_store!(ptr, 0); @cuprintln(2))

@cuda kernel(pointer(arr))

synchronize()
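For comparison, here is a CPU-only sketch of the behavior the reproducer above is expected to show on the GPU. It assumes nothing beyond plain Julia; `doit` and `kernel` are stand-ins that take no arguments, not the GPU versions. Redefining the callee should invalidate the already-compiled caller, so the second call picks up the new definition.

```julia
# CPU-only analogue of the reproducer; no GPU packages are needed.
# These zero-argument functions are illustrative stand-ins.
doit() = 1
kernel() = doit()

first_result = kernel()    # compiles kernel() against the first doit()

doit() = 2                 # redefining the callee invalidates the compiled kernel()

second_result = kernel()   # kernel() is recompiled and sees the new doit()

@assert first_result == 1
@assert second_result == 2
```

With `@cuda`, the equivalent expectation is that the second launch prints 2 instead of reusing the kernel compiled against the first `doit`.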

Maybe something's wrong with the fake call? Replacing it with the following (for the above example) makes invalidation work without the edge:

function fake_call(f, tt)
    opaque_false[] || return
    args = [Ref{CUDAnative.DevicePtr{Int64,CUDAnative.AS.Generic}}()[]]
    f(args...)
end
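To make the guard pattern explicit, here is a minimal plain-Julia sketch of the idea, not the CUDAnative code itself. `target` and `calls` are hypothetical names used only for illustration. The guard reads a `Ref` at run time, so the compiler cannot assume its value and the call to `f` stays in the method body, yet the call is never executed as long as the flag is false. Whether that call is enough to make Julia track the dependency is exactly what is in question here.

```julia
# Run-time flag the compiler cannot constant-fold away.
const opaque_false = Ref(false)

# Hypothetical call counter and target function, for illustration only.
const calls = Ref(0)
target(x) = (calls[] += 1; x)

# The call to f is present in the code, but is skipped while the flag is false.
function fake_call(f, x)
    opaque_false[] || return nothing
    f(x)
end

fake_call(target, 1)
@assert calls[] == 0       # target was never actually invoked

# Flipping the flag shows the call really is there.
opaque_false[] = true
fake_call(target, 1)
@assert calls[] == 1
opaque_false[] = false
```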

However, starting to generalize that breaks as soon as I do something in the generated version of this method:

@generated function fake_call(f, tt)
    adding_this_statement_breaks_invalidation = [:(Ref{$T}()[]) for T in tt.parameters[1].parameters]
    quote
        opaque_false[] || return
        args = [Ref{CUDAnative.DevicePtr{Int64,CUDAnative.AS.Generic}}()[]]
        f(args...)
    end
end

Oh, this also depends on the unsafe_store! in the kernel methods; just doing a @cuprintln correctly recompiles.

What is going on ...

maleadt · 2020-02-27T16:50:48Z

Calling in the cavalry... @vtjnash or @Keno, could you shed some light on this? In summary, I'm trying to get method invalidation working. This doesn't work automatically since we never call the kernel function, but instead take its IR and execute that. Compilation and invalidation are handled by cufunction(f, tt), which in the current design is a generator that both emits a code info with ci.edges set a la Cassette and adds a fake call to f in its IR.

CUDAnative.jl/src/execution.jl

Lines 318 to 330 in b5aaeb6

    
    # HACK: mechanism to generate calls that are not executed, but ensure method invalidation
    const opaque_false = Ref(false)
    function fake_call(f)
        opaque_false[] || return
        f(Ref{Any}()[]...)
    end

    # actual compilation
    function cufunction_slow(f, tt, spec; name=nothing, kwargs...)
        start = time_ns()

        # generate a fake call to ensure we get recompiled upon method invalidation
        fake_call(f)

CUDAnative.jl/src/execution.jl

Line 453 in b5aaeb6

new_ci.edges = MethodInstance[mi]

Neither of those suffices on its own to get proper invalidation/recompilation, and the behavior is weird: it depends on the kernel function and on the code in fake_call in ways I can't explain (see previous post). Shouldn't either the fake call or setting the edges get this working? Or am I doing something undefined?

src/execution.jl

vchuravy

"Everything Just Works"(tm)

maleadt added the Julia support label Feb 27, 2020

maleadt requested a review from vchuravy February 27, 2020 10:13

maleadt force-pushed the tb/265 branch from 388942e to b5aaeb6 Compare February 27, 2020 10:38

vchuravy reviewed Feb 27, 2020


maleadt force-pushed the tb/265 branch 2 times, most recently from dbf5646 to 178ce3f Compare March 3, 2020 07:53

maleadt added 2 commits March 3, 2020 09:43

Use a custom generator function for control on back-edges.

a6fc211

Simplify at-cuda implementation.

4201c4e

maleadt force-pushed the tb/265 branch from 178ce3f to 4201c4e Compare March 3, 2020 08:43

Leave one line table entry.

c16685f

maleadt mentioned this pull request Mar 3, 2020

Avoid OOB access of the line table during codegen. JuliaLang/julia#34973

Merged

maleadt merged commit c335366 into master Mar 3, 2020

bors bot deleted the tb/265 branch March 3, 2020 12:53

vchuravy reviewed Mar 3, 2020
