This repository was archived by the owner on May 27, 2021. It is now read-only.

maleadt commented Feb 27, 2020

Fixes #459 by setting ci.edges (from JuliaLang/julia#32237) AND adding a fake call; both seem to be required to fool Julia.

This regresses launch performance, which I plan to look into soon.

maleadt commented Feb 27, 2020

Valentin pointed out we shouldn't need to set the edges here, and the fake call should suffice because unlike Cassette we actually call the child methods. (Conversely, shouldn't setting the edge explicitly on the outer method work too and not require a fake call?)
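
For context, the invalidation that an ordinary call provides can be seen in plain Julia, without any GPU code. The following is a minimal sketch with hypothetical names (`inner`, `outer`); it only illustrates the backedge mechanism that the fake call is meant to piggyback on, and is not what CUDAnative does internally:

```julia
# Hypothetical names for illustration only; nothing here is CUDAnative API.
inner() = 1
outer() = inner()        # outer statically calls inner

first_result = outer()   # compiling outer records a backedge from inner to outer

inner() = 2              # redefining inner invalidates the compiled code for outer

second_result = outer()  # outer is recompiled against the new definition of inner
```

Run at top level, the second call picks up the new definition. That is the behavior a kernel launched through @cuda should get as well.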

Anyway, something fishy is happening, because not adding edges and only doing the call breaks our tests, or more succinctly:

using CUDAnative, CuArrays, CUDAdrv

arr = CuArray(zeros(Int))

doit(ptr) = (unsafe_store!(ptr, 0); @cuprintln(1))

function kernel(ptr)
    doit(ptr)
    return
end

@cuda kernel(pointer(arr))

doit(ptr) = (unsafe_store!(ptr, 0); @cuprintln(2))

@cuda kernel(pointer(arr))

synchronize()

Maybe something's wrong with the fake call? Replacing it with the following (for the above example) makes invalidation work without the edge:

function fake_call(f, tt)
    opaque_false[] || return
    args = [Ref{CUDAnative.DevicePtr{Int64,CUDAnative.AS.Generic}}()[]]
    f(args...)
end

However, starting to generalize that breaks as soon as I do something in the generated version of this method:

@generated function fake_call(f, tt)
    adding_this_statement_breaks_invalidation = [:(Ref{$T}()[]) for T in tt.parameters[1].parameters]
    quote
        opaque_false[] || return
        args = [Ref{CUDAnative.DevicePtr{Int64,CUDAnative.AS.Generic}}()[]]
        f(args...)
    end
end

Oh, this also depends on the unsafe_store! in the kernel methods; with just a @cuprintln, the kernel correctly recompiles.

What is going on ...

maleadt commented Feb 27, 2020

Calling in the cavalry... @vtjnash or @Keno, could you guys shed some light on this? In summary, I'm trying to get method invalidation working. This doesn't work automatically since we never call the kernel function, but take its IR and execute that. Compilation and invalidation are handled by cufunction(f, tt), which in the current design is a generator that both emits a code info with ci.edges set a la Cassette and adds a fake call to f in its IR.

# HACK: mechanism to generate calls that are not executed, but ensure method invalidation
const opaque_false = Ref(false)
function fake_call(f)
    opaque_false[] || return
    f(Ref{Any}()[]...)
end
# actual compilation
function cufunction_slow(f, tt, spec; name=nothing, kwargs...)
    start = time_ns()
    # generate a fake call to ensure we get recompiled upon method invalidation
    fake_call(f)

new_ci.edges = MethodInstance[mi]

Neither of those suffices on its own to get proper invalidation/recompilation, and the behavior is weird: it depends on the kernel function and on the code in fake_call in ways I can't explain (see previous post). Shouldn't either the fake call or setting the edges get this working? Or am I doing something undefined?
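
The guard used by fake_call can be looked at in isolation. Below is a minimal sketch in plain Julia with hypothetical names (`target`, `calls`, `guarded_call`). Because the flag lives in a mutable Ref, the compiler cannot prove the branch is dead, so the call to f should stay in the compiled code even though it never runs. Whether that call is enough to register the dependency in the cufunction setting is exactly the open question here.

```julia
const opaque_false = Ref(false)   # mutable, so the compiler cannot fold it to `false`
const calls = Ref(0)              # hypothetical counter to observe whether target ran

target(x) = (calls[] += 1; x)     # hypothetical stand-in for a kernel function

function guarded_call(f, x)
    opaque_false[] || return nothing  # always returns here at run time
    f(x)                              # present in the method body, never executed
end

result = guarded_call(target, 1)
```

After the call, the counter is still zero, which confirms that the guarded call has no run-time effect.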

maleadt force-pushed the tb/265 branch 2 times, most recently from dbf5646 to 178ce3f, March 3, 2020 07:53
maleadt merged commit c335366 into master, Mar 3, 2020
bors bot deleted the tb/265 branch, March 3, 2020 12:53

vchuravy left a comment

"Everything Just Works"(tm)

Successfully merging this pull request may close these issues.

265-fix is only partial
