Closed
Description
Describe the bug
I'm porting some code to CUDA, and we make extensive use of nested Base wrapper types (ReshapedArrays over possibly Adjointed Views into CuArrays). When I then call axpy! on such an array, it falls back to slow scalar indexing.
To reproduce
julia> using CUDA, LinearAlgebra
julia> a = CuArray(rand(10,10));
julia> test = reshape(view(a,2:3,:),10,2);
julia> typeof(test)
Base.ReshapedArray{Float64, 2, SubArray{Float64, 2, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}
julia> axpy!(true,test,copy(test))
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f4ed6614010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:90
Expected behavior
No scalar indexing
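For reference, here is a minimal CPU-only sketch of the result the GPU call would be expected to produce. It assumes a plain `Array` as a stand-in for the `CuArray` (so it needs no GPU and says nothing about which GPU code path is taken), and the names `x` and `y` are only illustrative.

```julia
using LinearAlgebra

# CPU stand-in for the CuArray in the report (assumption: plain Array, no GPU).
a = rand(10, 10)

# Same nested wrapper as in the report: a reshape over a non-contiguous view.
x = reshape(view(a, 2:3, :), 10, 2)
@assert x isa Base.ReshapedArray

# axpy!(α, x, y) overwrites y with α*x + y and returns y.
y = copy(x)
axpy!(true, x, y)

# With α = true and y starting as a copy of x, the result is 2x.
@assert y ≈ 2 .* x
```

On the GPU the same values should come out, just without the scalar indexing warning.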
Version info
Details on Julia:
julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 1 on 12 virtual cores
Details on CUDA:
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 525.60.13, for CUDA 12.0
CUDA driver 12.0
Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 12.0.0+525.60.13
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.8.3
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: NVIDIA GeForce GTX 1650 Ti (sm_75, 2.948 GiB / 4.000 GiB available)