Skip to content

[CUSPARSE] missing broadcasts for multiplication #1666

@CarloLucibello

Description

@CarloLucibello

In the following 4 examples of sparse matrix and dense vector broadcasted multiplications we have 2 errors and 2 wrong outputs (dense matrix instead of sparse)

julia> using SparseArrays, CUDA

julia> CUDA.allowscalar(false)

julia> A = sparse([1. 0 2; 0 3 4; 0 5 6]) |> cu
3×3 CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32} with 6 stored entries:
 1.0      2.0
     3.0  4.0
     5.0  6.0

julia> x = [2.,3,4] |> cu
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 2.0
 3.0
 4.0

julia> A .* x
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:87
  [3] getindex(xs::CuArray{Int32, 1, CUDA.Mem.DeviceBuffer}, I::Int64)
    @ GPUArrays ~/.julia/packages/GPUArrays/fqD8z/src/host/indexing.jl:9
  [4] getindex(A::CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32}, i0::Int64, i1::Int64)
    @ CUDA.CUSPARSE ~/.julia/dev/CUDA/lib/cusparse/array.jl:311
  [5] _getindex
    @ ./abstractarray.jl:1291 [inlined]
  [6] getindex
    @ ./abstractarray.jl:1241 [inlined]
  [7] _broadcast_getindex
    @ ./broadcast.jl:623 [inlined]
  [8] _getindex
    @ ./broadcast.jl:666 [inlined]
  [9] _broadcast_getindex
    @ ./broadcast.jl:642 [inlined]
 [10] getindex
    @ ./broadcast.jl:597 [inlined]
 [11] macro expansion
    @ ./broadcast.jl:961 [inlined]
 [12] macro expansion
    @ ./simdloop.jl:77 [inlined]
 [13] copyto!
    @ ./broadcast.jl:960 [inlined]
 [14] copyto!
    @ ./broadcast.jl:913 [inlined]
 [15] copy
    @ ./broadcast.jl:885 [inlined]
 [16] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayConflict, Nothing, typeof(*), Tuple{CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})
    @ Base.Broadcast ./broadcast.jl:860
 [17] top-level scope
    @ REPL[23]:1
 [18] top-level scope
    @ ~/.julia/dev/CUDA/src/initialization.jl:155

julia> x .* A
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:87
  [3] getindex
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/indexing.jl:9 [inlined]
  [4] getindex
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/indexing.jl:30 [inlined]
  [5] _broadcast_getindex
    @ ./broadcast.jl:636 [inlined]
  [6] _getindex
    @ ./broadcast.jl:666 [inlined]
  [7] _broadcast_getindex
    @ ./broadcast.jl:642 [inlined]
  [8] getindex
    @ ./broadcast.jl:597 [inlined]
  [9] macro expansion
    @ ./broadcast.jl:961 [inlined]
 [10] macro expansion
    @ ./simdloop.jl:77 [inlined]
 [11] copyto!
    @ ./broadcast.jl:960 [inlined]
 [12] copyto!
    @ ./broadcast.jl:913 [inlined]
 [13] copy
    @ ./broadcast.jl:885 [inlined]
 [14] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayConflict, Nothing, typeof(*), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32}}})
    @ Base.Broadcast ./broadcast.jl:860
 [15] top-level scope
    @ REPL[26]:1
 [16] top-level scope
    @ ~/.julia/dev/CUDA/src/initialization.jl:155


julia> A .* x' # should give a CuSparseMatrix instead
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 2.0   0.0   8.0
 0.0   9.0  16.0
 0.0  15.0  24.0

julia> x' .* A # should give a CuSparseMatrix instead
3×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 2.0   0.0   8.0
 0.0   9.0  16.0
 0.0  15.0  24.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    cuda librariesStuff about CUDA library wrappers.enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions