Conversation

@vchuravy
Member

@maleadt as discussed earlier today, I didn't quite finish it, so the CuArray changes are at https://github.com/JuliaGPU/CuArrays.jl/tree/vc/things

@SimonDanisch thoughts on the backend addition? The issue is that with wrapper types like Adjoint/Transpose/SubArray there is no easy dispatch target anymore, so something trait-like might work quite well.

@andreasnoack would it make sense to introduce Adjoint{T, A} <: Wrapper{T, A<:AbstractArray{T}} <: AbstractArray{T} into the base hierarchy? Right now we need to enumerate and special-case all the wrapper types.
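
For concreteness, a rough trait-style sketch of what a single dispatch target could look like without touching Base; the names ArrayBackend, backend and launch are made up purely for illustration:

using LinearAlgebra

abstract type ArrayBackend end
struct CPUBackend <: ArrayBackend end
struct GPUBackend <: ArrayBackend end

# plain arrays default to the CPU backend
backend(::Type{<:AbstractArray}) = CPUBackend()
# wrapper types forward to the backend of their parent array
backend(::Type{<:Adjoint{T,A}}) where {T,A} = backend(A)
backend(::Type{<:Transpose{T,A}}) where {T,A} = backend(A)
backend(::Type{<:SubArray{T,N,A}}) where {T,N,A} = backend(A)
# a GPU package would then only need to add, e.g.
# backend(::Type{<:CuArray}) = GPUBackend()

# generic code dispatches on the trait instead of enumerating wrapper types
launch(A::AbstractArray) = launch(backend(typeof(A)), A)
launch(::CPUBackend, A) = "cpu fallback path"
launch(::GPUBackend, A) = "gpu kernel path"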

@ssz66666
Contributor

I wonder if this approach can be adapted to allow arbitrarily nested wrapper types. Currently it doesn't:

julia> using CuArrays, Adapt

julia> CuArrays.allowscalar(false);

julia> xs = Adapt.adapt(CuArray{Complex{Float32}},[1.0+0.0im]);

julia> ys = transpose(xs)
1×1 LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}:
 1.0 + 0.0im

julia> ys .= 1
[ Info: getfield(GPUArrays, Symbol("##19#20"))(), (LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(identity),Tuple{Int64}})
1×1 LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}:
 1.0 + 0.0im


julia> zs = adjoint(ys)
1×1 LinearAlgebra.Adjoint{Complex{Float64},LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}}:
Error showing value of type LinearAlgebra.Adjoint{Complex{Float64},LinearAlgebra.Transpose{Complex{Float64},CuArray{Complex{Float64},1}}}:
ERROR: scalar getindex is disabled

julia> zs .= 1
ERROR: scalar setindex! is disabled

@vchuravy
Member Author

I am not sure we can handle the recursive case without injecting an abstraction into the type-hierarchy. But everything gets fuzzy once you start thinking about transpose(DArray(CuArray)).
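
For the sake of discussion, one purely illustrative way to probe nesting is to walk parent until it stops changing, since parent of a plain array returns the array itself; the names below are made up, and this of course says nothing about intermediate layers like DArray:

# walk the wrapper chain down to the innermost array
innermost(A::AbstractArray) = (P = parent(A); P === A ? A : innermost(P))
# GPUArray is the abstract array type from GPUArrays.jl (requires `using GPUArrays`)
isgpubacked(A::AbstractArray) = innermost(A) isa GPUArray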

src/mapreduce.jl Outdated
Base.all(A::GPUArray{Bool}) = mapreduce(identity, &, true, A)
Base.count(pred, A::GPUArray) = Int(mapreduce(pred, +, 0, A))

Base.any(f::Function, A::GPUArray) = mapreduce(f, |, false, A)
Contributor

Just noticed that Base.any and Base.all have slightly different semantics from the mapreduce-based implementations here, which I ran into while trying to incorporate the changes into #145 :

  1. They are short-circuiting in Base, which changes the semantics when the predicate has side effects. Maybe we can just make it very clear in the documentation that the GPU implementation is not short-circuiting?
  2. The Base implementations allow missing values and handle them with three-valued logic. Should we follow the Base behaviour here? It might end up with something like:
Base.any(pred, A::GPUArray) = begin
    t = mapreduce(x -> begin
            v = pred(x)
            ismissing(v) ? (false, true) : (v && true, false) # enforce v is `Bool` or `missing`
        end,
        ((v1, m1), (v2, m2)) -> (v1 || v2, m1 || m2),
        A; init = (false, false))
    t[1] ? true : (t[2] ? missing : false)
end
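
For symmetry, a non-short-circuiting all with the same three-valued-logic handling might look like this (just a sketch, assuming the mapreduce method here accepts the init keyword as above):

Base.all(pred, A::GPUArray) = begin
    t = mapreduce(x -> begin
            v = pred(x)
            ismissing(v) ? (true, true) : (v && true, false) # enforce v is `Bool` or `missing`
        end,
        ((v1, m1), (v2, m2)) -> (v1 && v2, m1 || m2),
        A; init = (true, false))
    t[1] ? (t[2] ? missing : true) : false
end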

@ssz66666
Contributor

ssz66666 commented Aug 25, 2018

I was thinking about marking wrapped arrays as GPU compatible by using another wrapper struct, so that we can get a dispatch target.

Essentially this uses a concrete wrapper type (called OnGPU in the example below) to stand in for the proposed WrappedArray abstract type: it overrides all the methods of the AbstractArray interface and hijacks the constructors of the wrappers we care about, rewiring every method to the child array. That way we can more or less inject an abstraction into the type hierarchy without modifying Base code. In addition, I used a phantom type parameter in OnGPU to remember the GPU backend type.

It will be necessary to re-implement all Base methods that directly access the content, as is already done for GPUArray. We can then hijack the methods that create wrappers (like LinearAlgebra.Transpose) over a GPUArray or an already wrapped GPU-compatible array, to unwrap the marker, apply the wrapper, and re-mark the result as GPU compatible.

We will need to special-case every wrapper for which we want to support vectorisation, but I think that is okay, because only very few wrappers, like SubArray or Transpose, are really needed and are actually easy to vectorise on the GPU without scalar indexing.

I've got a minimal example to demonstrate the idea. I'd really appreciate your opinion, as I think it would be very useful to get some simple wrappers like SubArray, LinearAlgebra.Transpose or SkipMissing to work with GPUArrays.

abstract type MyInterface{T} end # like AbstractArray

abstract type MyGPUStruct{T} <: MyInterface{T} end # like GPUArray

struct MyStruct{T} <: MyGPUStruct{T} # like CuArray 
    name::String
    x::T
end

struct MyWrapper{T,AT <: MyInterface{T}} <: MyInterface{T} # like LinearAlgebra.Transpose
    parent::AT
end

struct MyWrapper2{T,AT <: MyInterface{T}} <: MyInterface{T} # any Wrapper that we want it to fall back to base
    parent2::AT
end

# methods in Base to create wrappers, like view, transpose and adjoint
wrap1(x::MyInterface) = MyWrapper(x)
wrap2(x::MyInterface) = MyWrapper2(x)

# all methods in `Base` to be overridden,
# like what is already done with GPUArray itself
function get_name(::MyInterface) 
    error("not implemented")
end

# think of get_name as broadcast, fill!, copyto! or show, etc.;
# we want to override it for OnGPU later

get_name(o::MyStruct) = begin println("MyStruct"); o.name end
get_name(o::MyWrapper) = begin println("Wrapper"); get_name(o.parent) end
get_name(o::MyWrapper2) = begin println("Wrapper2"); get_name(o.parent2) end

#methods that simply fall back to base
get_x(o::MyStruct) = o.x
get_x(o::MyWrapper) = get_x(o.parent)
get_x(o::MyWrapper2) = get_x(o.parent2)

# our marker wrapper that marks a MyInterface-like thing
# as GPU compatible. It only propagates to the next layer
# of wrapping if that layer is also GPU compatible.
# type param BE remembers the backend type, so we can
# dispatch to backend-specific GPU kernels;
# type param AT reveals the type of the underlying array,
# so that you can also dispatch on that
struct OnGPU{T,BE<:MyGPUStruct{T},AT<:MyInterface{T}} <: MyInterface{T}
    content::AT
end

# need to override methods that create wrappers we care about,
# e.g. Base.view, transpose, etc.
wrap1(x::BE) where {T,BE <: MyGPUStruct{T}} = begin
    y = MyWrapper(x)
    OnGPU{T,BE,typeof(y)}(y)
end
wrap1(x::OnGPU{T,BE,AT}) where {T,BE,AT} = begin
    y = MyWrapper(x.content)
    OnGPU{T,BE,typeof(y)}(y)
end

# capture vector operation methods like fill!
# for broadcast to work we need to fix 
# BroadcastStyle(::Type{OnGPU{T,<:MyGPUStruct,<:MyInterface}})
get_name(x::OnGPU) = get_name_gpu(x.content)

# methods that fall back to base
get_x(x::OnGPU) = get_x(x.content)

# GPU methods that are generic enough to be handled in
# GPUArrays rather than in the backend;
# in this case, simply unwrap MyWrapper and pass it on to
# get_name_gpu(::MyGPUStruct)
get_name_gpu(x::MyWrapper) = get_name_gpu(x.parent)

# default behaviour should be falling back to base
get_name_gpu(x::MyGPUStruct) = error("fall back to base")

# hopefully implemented in the GPU backend; I don't know how
# difficult it would be to handle recursively nested wrappers
# around a GPUArray (e.g. a CuArray).
get_name_gpu(x::MyStruct) = get_name(x)

x = MyStruct("x",1) # uses gpu method
y = wrap1(x) # dispatch to gpu method
z = wrap1(y) # dispatch to gpu method
w = wrap2(z) # fall back to base for non-compatible wrappers 
v = wrap1(w) # fall back to base again

get_name(x)
get_name(y)
get_name(z)
get_name(w)
get_name(v)
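
For reference, here is how I read the dispatch of the calls above (plain-Julia mock-up, so easy to check):

# get_name(x)  prints MyStruct                      (MyStruct method directly)
# get_name(y)  prints MyStruct                      (OnGPU -> get_name_gpu path)
# get_name(z)  prints MyStruct                      (OnGPU -> get_name_gpu path)
# get_name(w)  prints Wrapper2, then MyStruct       (MyWrapper2 falls back; the inner OnGPU still takes the GPU path)
# get_name(v)  prints Wrapper, Wrapper2, MyStruct   (both plain wrappers fall back before reaching the GPU path)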

@maleadt
Member

maleadt commented Aug 28, 2018

@ssz66666 Interesting design! But wouldn't it break uses of Base/user code that expect wrappers? E.g. foo(::MyWrapper) isn't callable with an OnGPU{..., ..., MyWrapper}. I'm not sure how common that is, though, or for that matter how common nested wrappers are (necessitating this complexity).

For now, I've worked a little on this PR, at least improving the use of non-nested Transpose/Adjoint/SubArray. Let's merge this first and figure out a better wrapper design after that?

@ssz66666
Contributor

ssz66666 commented Aug 28, 2018

Agreed with @maleadt, solving the whole wrapper-support design problem seems out of scope for this PR.

Also, my proposal does break functions that expect a wrapper directly. An obvious case is LinearAlgebra.transpose(::Transpose). I guess we need to find a way to expose this information to the compiler without messing too much with the type hierarchy, or perhaps just stick to a simpler design and support the simplest yet most common cases, at least for now.
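
To make that concrete, the unwrapping method in the LinearAlgebra stdlib is roughly as below, and it would never fire once the Transpose is hidden behind the marker; the special case at the end is hypothetical (remark/OnGPU as in the sketch above):

# LinearAlgebra stdlib (roughly):
# transpose(A::Transpose) = A.parent
# with the Transpose hidden inside OnGPU, this method never fires, so
# transpose(transpose(xs)) would stack wrappers instead of unwrapping;
# every such method would need its own special case on the marker, e.g.
# LinearAlgebra.transpose(A::OnGPU{<:Any,<:Any,<:Transpose}) = remark(parent(A.content))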

@vchuravy
Member Author

When we tag, we should bump the major version due to the change in API.
