Performance regression in vectorized loops #40276

@emmt

After switching from Julia-1.5.4 to Julia-1.6.0, I observed a performance regression (by a factor of 2 to 3) in loops that call functions returning a tuple of values. My guess is that such loops are no longer correctly vectorized (in spite of @inbounds @simd). Below is a short example that demonstrates the problem:

using BenchmarkTools

function compute_weights(t::T) where {T<:AbstractFloat}
    u = 1 - t
    v = (T(-1)/2)*t*u
    w1 = v*u
    w4 = v*t
    w = w4 - w1
    w2 = u - w1 + w
    w3 = t - w4 - w
    return (w1, w2, w3, w4)
end

function compute_weights!(dst::Array{T,2}, src::Array{T,1}) where {T<:AbstractFloat}
    @assert size(dst) == (4,length(src))
    @inbounds @simd for i in eachindex(src)
        w1, w2, w3, w4 = compute_weights(src[i])
        dst[1,i] = w1
        dst[2,i] = w2
        dst[3,i] = w3
        dst[4,i] = w4
    end
    return dst
end

function runtests(; T::Type{<:AbstractFloat}=Float32, n::Int=1_000)
    t = rand(T, n) .- one(T)/2
    z = Array{T}(undef, 4, n)
    print("Tests with Julia-", VERSION, ", T=", T, ", n=", n, "\n")
    print("Compute_weights! "); @btime $(compute_weights!)($z, $t)
    nothing
end

runtests(T=Float32)
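
One way to test the guess that the loop is no longer vectorized is to look at the LLVM IR that Julia generates and search it for SIMD vector types such as <8 x float>. The sketch below is only a heuristic: the helpers llvm_ir and has_vector_ops and the stand-in kernel scale! are hypothetical names introduced here for illustration, and finding a vector type in the IR does not prove that the hot loop itself is vectorized.

```julia
using InteractiveUtils  # provides code_llvm (loaded automatically in the REPL)

# Hypothetical helper: return the optimized LLVM IR of `f` for the given
# argument types as a String.
llvm_ir(f, types) = sprint(io -> code_llvm(io, f, types; debuginfo=:none))

# Hypothetical helper: heuristic search for SIMD vector types such as
# `<8 x float>` or `<4 x double>` in the IR text.
has_vector_ops(ir::AbstractString) = occursin(r"<\d+ x (float|double)>", ir)

# Small stand-in kernel (not from the issue) so that this snippet runs on its own.
function scale!(dst::Vector{T}, src::Vector{T}) where {T<:AbstractFloat}
    @assert length(dst) == length(src)
    @inbounds @simd for i in eachindex(src)
        dst[i] = 2 * src[i]
    end
    return dst
end

ir = llvm_ir(scale!, (Vector{Float32}, Vector{Float32}))
println("vector instructions found: ", has_vector_ops(ir))
```

With the functions of the example above defined, llvm_ir(compute_weights!, (Matrix{Float32}, Vector{Float32})) can be produced with both Julia builds and the two outputs compared.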

Using git bisect, I found that the first commit showing the issue is fe1253ee258674844b8c035. For this commit, the above test yields:

Tests with Julia-1.6.0-DEV.1648, T=Float32, n=1000
Compute_weights!   1.370 μs (0 allocations: 0 bytes)

The timings for the previous version (commit 49b8e61a80b8108ca0a23f8) are:

Tests with Julia-1.6.0-DEV.1647, T=Float32, n=1000
Compute_weights!   518.611 ns (0 allocations: 0 bytes)

So the faster version runs at 19.3 Gflops while the slower one runs at only 7.3 Gflops (counting 10 floating-point operations per element).
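
The Gflops figures can be reproduced from the timings with a little arithmetic. The sketch below assumes a hand count of 10 floating-point operations per call to compute_weights (the constant T(-1)/2 is taken to be folded at compile time, and each addition, subtraction or multiplication counts as one operation); the names flops_per_element and gflops are introduced here for illustration only.

```julia
# Hand count of floating-point operations in compute_weights (an assumption):
#   u  = 1 - t            -> 1
#   v  = (T(-1)/2)*t*u    -> 2 (the constant T(-1)/2 is assumed to be folded)
#   w1 = v*u              -> 1
#   w4 = v*t              -> 1
#   w  = w4 - w1          -> 1
#   w2 = u - w1 + w       -> 2
#   w3 = t - w4 - w       -> 2
flops_per_element = 10
n = 1_000  # number of elements in the benchmark

# Operations per nanosecond is numerically the same as Gflops.
gflops(time_ns) = flops_per_element * n / time_ns

fast = gflops(518.611)   # timing at commit 49b8e61a80b8108ca0a23f8
slow = gflops(1370.0)    # timing at commit fe1253ee258674844b8c035 (1.370 μs)

println("fast: ", round(fast, digits=1), " Gflops")          # 19.3
println("slow: ", round(slow, digits=1), " Gflops")          # 7.3
println("slowdown: ", round(1370.0 / 518.611, digits=2))     # 2.64
```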

Labels: performance (Must go faster), regression (Regression in behavior compared to a previous version)
