Skip to content

Wrong matmul with empty matrices #2952

@tam724

Description

@tam724

The 5-arg mul!(C, A, B, α, β) returns the wrong result if A and B are empty matrices (of correct sizes). Here's a MWE:

julia> using CUDA
julia> using LinearAlgebra
julia> A = rand(3, 3);
julia> mul!(A, zeros(3, 0), zeros(0, 3), 1.0, 1.0)
3×3 Matrix{Float64}:
 0.752731   0.43537   0.830489
 0.0522754  0.915373  0.462034
 0.319501   0.120835  0.769065
julia> mul!(cu(A), CUDA.zeros(3, 0), CUDA.zeros(0, 3), 1.0, 1.0)
3×3 CuArray{Float32, 2, CUDA.DeviceMemory}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

If any of A or B have 0 dimensions, the matmul instead hits:

return LinearAlgebra.rmul!(C, 0)

and zeros out all values in C. This should be LinearAlgebra.rmul!(C, beta).

However, relying on rmul! to zero out uninitialized arrays is insufficient, because the CUDA implementation of rmul! (scal) does not respect false as a strong zero (#2607). E.g. NaN's in the uninitialized memory would remain. The 3-arg mul! and thereby also the * operator assume correct "strong zero" behavior.

A clean way to fix the rmul! would be to rely on the GPUArrays.jl implementation and avoid using scal! from the cuda libraries. Would this affect performance?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions