-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Closed
Labels
Description
First reported here.
julia> using BenchmarkTools, SparseArrays, LinearAlgebra
julia> A = sprand(100,100,0.01);
julia> B = rand(100,100);
julia> C = A*B;
julia> @btime $C = $A*$B;
18.550 μs (2 allocations: 78.20 KiB)
julia> @btime $C = $B*$A; # other way around
20.097 μs (2 allocations: 78.20 KiB)
julia> @btime mul!($C,$A,$B);
14.531 μs (0 allocations: 0 bytes)
julia> @btime mul!($C,$B,$A);
825.506 μs (6 allocations: 336 bytes)This asymmetry of performance, which already existed on 0.6, is not (just) due to CSC format of sparse matrices but because mul! falls back to the most generic method in LinearAlgebra.
julia> @which mul!(C, A, B)
[...] SparseArrays\src\linalg.jl:110
julia> @which mul!(C, B, A)
[...] LinearAlgebra\src\matmul.jl:172It can be fixed (most simply) by copying the Base.:* method and adjusting it to be a mul! version:
import LinearAlgebra.mul!
function mul!(C::StridedMatrix, X::StridedMatrix, A::SparseMatrixCSC)
mX, nX = size(X)
nX == A.m || throw(DimensionMismatch())
fill!(C, zero(eltype(C)))
rowval = A.rowval
nzval = A.nzval
@inbounds for multivec_row=1:mX, col = 1:A.n, k=A.colptr[col]:(A.colptr[col+1]-1)
C[multivec_row, col] += X[multivec_row, rowval[k]] * nzval[k]
end
C
endwhich gives
julia> @btime mul!($C,$A,$B);
16.077 μs (0 allocations: 0 bytes)
julia> @btime mul!($C,$B,$A);
18.241 μs (0 allocations: 0 bytes)Note that PR #24045 might fix this (haven't looked into it in detail). However, since this PR is sort of lying around for over a year now, maybe we should add an intermediate fix?