@GiggleLiu
Contributor

This PR fixes #42438. I do not know why this version is friendlier to the compiler; I arrived at it by trial and error. Someone might want to look deeper into the issue (it might be in LLVM, according to @timholy's comment), but we need a fix now because some practical applications depend on it.

using Base.Cartesian
using Base: size_to_strides, checkdims_perm
using Random

for (V, PT, BT) in Any[((:N,), BitArray, BitArray), ((:T,:N), Array, StridedArray)]
    @eval @generated function newperm!(P::$PT{$(V...)}, B::$BT{$(V...)}, perm) where $(V...)
        quote
            checkdims_perm(P, B, perm)

            #calculates all the strides
            native_strides = size_to_strides(1, size(B)...)
            strides_1 = 0
            @nexprs $N d->(strides_{d+1} = native_strides[perm[d]])

            #Creates offset, because indexing starts at 1
            offset = 1 - sum(@ntuple $N d->strides_{d+1})

            sumc = 0
            ind = 1
            @nexprs 1 d->(counts_{$N+1} = strides_{$N+1}) # a trick to set counts_($N+1)
            @nloops($N, i, P,
                    d->(df_d=i_d*strides_{d+1} ;sumc += df_d), # PRE
                    d->(sumc -= df_d), # POST
                    begin # BODY
                        @inbounds P[ind] = B[sumc+offset]
                        ind += 1
                    end)

            return P
        end
    end
end

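To make the stride and offset arithmetic above easier to follow, here is a minimal sketch of the same indexing scheme without `@generated` or `Base.Cartesian`. `naive_perm!` is a hypothetical helper written for illustration only; it is not part of this PR and is not meant to be fast.

```julia
# Hypothetical illustration, not part of the PR: the same index arithmetic
# as newperm!, written with ordinary loops over CartesianIndices.
function naive_perm!(P::Array, B::Array, perm)
    N = ndims(B)
    size(P) == ntuple(d -> size(B, perm[d]), N) ||
        throw(DimensionMismatch("size(P) must equal size(B) permuted by perm"))

    # Stride of B along the dimension that becomes dimension d of P.
    bstrides = strides(B)
    pstrides = ntuple(d -> bstrides[perm[d]], N)

    # Indices start at 1, so subtract one stride per dimension.
    offset = 1 - sum(pstrides)

    ind = 1
    for I in CartesianIndices(P)   # column-major, first index varies fastest
        src = offset + sum(I[d] * pstrides[d] for d in 1:N)
        P[ind] = B[src]
        ind += 1
    end
    return P
end

B = reshape(collect(1:24), 2, 3, 4)
P = zeros(Int, 4, 2, 3)
naive_perm!(P, B, (3, 1, 2))
```

The linear index of `B` at the permuted position is `1 + sum((i_d - 1) * stride_d)`, which is the same as `offset + sum(i_d * stride_d)` with `offset = 1 - sum(stride_d)`. `newperm!` keeps that sum in `sumc` and updates it incrementally in the loop pre/post expressions instead of recomputing it for every element.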

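For readers unfamiliar with the PRE/POST arguments of `@nloops`, the following small sketch shows how they maintain a running sum across nested loops. The function `trace_running_sum` and the stride values `s_1 = 10`, `s_2 = 1` are made up for illustration; only `@nloops` itself comes from `Base.Cartesian`.

```julia
using Base.Cartesian

# Hypothetical illustration of the PRE/POST pattern used in newperm!.
# The outermost loop is over i_2, the innermost over i_1.
function trace_running_sum(P::Matrix)
    s_1, s_2 = 10, 1          # made-up strides, one per dimension
    acc = 0
    out = Int[]
    @nloops(2, i, P,
            d -> (df_d = i_d * s_d; acc += df_d),  # PRE: add this dimension's term
            d -> (acc -= df_d),                    # POST: remove it again
            begin                                  # BODY: acc holds the full sum
                push!(out, acc)
            end)
    return out
end

trace_running_sum(zeros(2, 3))
```

Each loop level adds its own term before running the inner loops and removes it afterwards, so the body always sees `i_1 * s_1 + i_2 * s_2` without recomputing the whole sum.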
using Test

@testset "newperm" begin
    n=25
    t=randn(rand(1:2, n)...)
    perm = randperm(n)
    p = zeros(eltype(t), size.(Ref(t), (perm...,)));
    @time newperm!(p, t, perm);
    # 0.395072 seconds (520.17 k allocations: 20.894 MiB, 99.99% compilation time)
    @time permutedims!(p, t, perm);
    # 41.520901 seconds (502.11 k allocations: 20.155 MiB, 100.00% compilation time)
    @test newperm!(p, t, perm) ≈ permutedims!(p, t, perm)
end
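The `@time` figures above are dominated by compilation. A rough way to separate the two costs is to time the first and second call on the same argument types, as sketched below. The dimension `n = 8` is an arbitrary small choice to keep the run short, and the variable names are illustrative; no timings are claimed here.

```julia
using Random

# Sketch: the first call includes JIT compilation for this (eltype, ndims)
# combination; the second call reuses the compiled method.
n = 8                                    # assumed small dimension for a quick run
t = randn(ntuple(_ -> 2, n)...)
perm = randperm(n)
p = zeros(eltype(t), size.(Ref(t), (perm...,)))

first_call  = @elapsed permutedims!(p, t, perm)
second_call = @elapsed permutedims!(p, t, perm)

# Rough estimate only; it can be noisy for very short runs.
compile_estimate = first_call - second_call
println("first: ", first_call, " s, second: ", second_call, " s")
```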

using BenchmarkTools
# high dim
n=25
t=randn(rand(2:2, n)...)
perm = randperm(n)
p = zeros(eltype(t), size.(Ref(t), (perm...,)))
@btime newperm!($p, $t, $perm)
# 179.814 ms (2 allocations: 96 bytes)
@btime permutedims!($p, $t, $perm)
# 201.038 ms (2 allocations: 96 bytes)

# low dim (make sure no performance regression)
t=randn(100, 200, 300)
perm = [3,2,1]
p = zeros(eltype(t), size.(Ref(t), (perm...,)))
@btime newperm!($p, $t, $perm)
# 19.203 ms (2 allocations: 96 bytes)
@btime permutedims!($p, $t, $perm);
# 19.339 ms (2 allocations: 96 bytes)

@GiggleLiu
Contributor Author

(Sorry for the unrelated commit history; whoever merges this PR should use squash and merge.)

Member

@vtjnash left a comment

SGTM

@kshyatt added the arrays and performance labels on Oct 4, 2021
@timholy merged commit 5b7bb08 into JuliaLang:master on Oct 5, 2021
KristofferC pushed a commit that referenced this pull request Oct 5, 2021
Development

Successfully merging this pull request may close these issues.

Unreasonable slow just in time compiling of permutedims when the array dimension is high