Skip to content

axpy! for nested base types (reshapedarray/adjoint/view) #1696

@maartenvd

Description

@maartenvd

Describe the bug

I'm porting some code to make use of cuda, and we make extensive use of nested base types (ReshapedArrays over possibly Adjointed Views into CuArrays). When I subsequently call axpy!, this results in slow scalar fallbacks.

To reproduce

julia> a = CuArray(rand(10,10));

julia> test = reshape(view(a,2:3,:),10,2);

julia> typeof(test)
Base.ReshapedArray{Float64, 2, SubArray{Float64, 2, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}

julia> axpy!(true,test,copy(test))
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f4ed6614010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:90

Expected behavior

No scalar indexing

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores

Details on CUDA:

julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 525.60.13, for CUDA 12.0
CUDA driver 12.0

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 12.0.0+525.60.13
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.3
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce GTX 1650 Ti (sm_75, 2.948 GiB / 4.000 GiB available)


Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions