Closed
Description
I'm seeing surprisingly low performance for mul!(W, X', V) when X is a SparseMatrixCSC with Float64 entries (X = sprand(2504, 100000, 0.05)) and W and V are dense matrices with Float32 entries. This takes about an order of magnitude longer (roughly 0.32 s versus 0.033 s in the timings below) than the same operation when W and V are dense matrices with Float64 entries. The non-transposed product mul!(V, X, W) is not affected. However, if X has Bool entries (X = sprand(Bool, 2504, 100000, 0.05)), I don't see any performance difference between Float64 and Float32 entries for V and W.
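The comparison described above, including the Bool case, can be reproduced with a small script along the following lines. This is only a sketch: the helper name `time_transposed_mul` and the keyword `k` are made up for illustration, and a single `@elapsed` measurement after one warm-up call is noisy, so treat the numbers as rough.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper (not part of any library): build dense V and W with
# element type T and time the second call to mul!(W, X', V), so that
# compilation time is excluded from the measurement.
function time_transposed_mul(X, T::Type; k = 3)
    m, n = size(X)
    V = randn(T, m, k)
    W = zeros(T, n, k)
    mul!(W, X', V)                     # warm-up call (includes compilation)
    return @elapsed mul!(W, X', V)     # elapsed time in seconds
end

# Same shape and density as in the report; shrink these for a quicker run.
m, n, p = 2504, 100_000, 0.05
X_f64  = sprand(m, n, p)               # Float64 entries
X_bool = sprand(Bool, m, n, p)         # Bool entries

for (name, X) in (("Float64 X", X_f64), ("Bool X", X_bool)), T in (Float64, Float32)
    t = time_transposed_mul(X, T)
    println(rpad(name, 10), " with ", rpad(T, 8), " V and W: ", round(t, digits = 4), " s")
end
```

If the behaviour matches the report, only the combination of Float64 X with Float32 V and W should stand out.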
> using SparseArrays, LinearAlgebra
# Float64 X, V and W
> X = sprand(2504, 100000, 0.05); V = randn(2504, 3); W = zeros(100000, 3);
## transposed X
> @time mul!(W, X', V);
0.082900 seconds (80.11 k allocations: 4.399 MiB)
> @time mul!(W, X', V);
0.033305 seconds (1 allocation: 48 bytes)
## non-transposed X
> @time mul!(V, X, W);
0.122455 seconds (46.03 k allocations: 2.414 MiB)
> @time mul!(V, X, W);
0.088595 seconds
# Float64 X, Float32 V and W
> X = sprand(2504, 100000, 0.05); V = randn(Float32, 2504, 3); W = zeros(Float32, 100000, 3);
## transposed X
> @time mul!(W, X', V);
0.369262 seconds (77.55 k allocations: 4.190 MiB)
> @time mul!(W, X', V);
0.324316 seconds (1 allocation: 48 bytes) # about 10x slower than the same operation with Float64 entries
## non-transposed X
> @time mul!(V, X, W);
0.123769 seconds (46.30 k allocations: 2.425 MiB)
> @time mul!(V, X, W);
0.087341 seconds # about the same performance as the same operation with Float64 entries
> versioninfo()
Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 7 1700X Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver1)
> LinearAlgebra.versioninfo()
BLAS: libopenblas (OpenBLAS 0.3.9 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Zen MAX_THREADS=32)
LAPACK: libopenblas64_
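One way to check whether the mismatch between the Float64 entries of X and the Float32 entries of V and W is what triggers the slowdown is to time the same product against a copy of X converted to Float32. The sketch below assumes that converting X is acceptable for the experiment (it rounds the entries of X to single precision). The function name `matched_vs_mixed` is hypothetical, and this is a diagnostic, not a confirmed explanation or fix.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical diagnostic: compare mul!(W, X', V) with a Float64 sparse X
# ("mixed") against the same product with X converted to Float32 ("matched").
function matched_vs_mixed(m, n, p; k = 3)
    X   = sprand(m, n, p)                    # Float64 entries
    X32 = SparseMatrixCSC{Float32,Int}(X)    # same sparsity pattern, Float32 entries
    V   = randn(Float32, m, k)
    W_mixed   = zeros(Float32, n, k)
    W_matched = zeros(Float32, n, k)

    # Warm-up calls so that compilation is not included in the timings.
    mul!(W_mixed, X', V)
    mul!(W_matched, X32', V)

    t_mixed   = @elapsed mul!(W_mixed, X', V)
    t_matched = @elapsed mul!(W_matched, X32', V)

    # The two results should agree up to single-precision rounding.
    agree = isapprox(W_mixed, W_matched; rtol = 1e-3)
    return (t_mixed = t_mixed, t_matched = t_matched, agree = agree)
end

# Sizes taken from the report.
result = matched_vs_mixed(2504, 100_000, 0.05)
println("mixed:   ", round(result.t_mixed, digits = 4), " s")
println("matched: ", round(result.t_matched, digits = 4), " s")
println("results agree: ", result.agree)
```

If the matched call is much faster, that would point at the mixed-element-type code path rather than at the transposed product itself.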