Status: Closed
Labels: compiler:simd (instruction-level vectorization), performance (Must go faster)
Description
It turns out that #37829 improved iteration performance for 2-d arrays but slowed down iteration over higher-dimensional (>= 4-d) arrays:
```julia
julia> using BenchmarkTools

julia> function arr_sum(X)
           val = zero(eltype(X))
           R = CartesianIndices(X)
           for i in R
               @inbounds val += X[i]
           end
           val
       end
arr_sum (generic function with 1 method)

julia> X = rand(4, 4, 4, 4, 4, 4);

julia> @btime arr_sum($X)
  5.584 μs (0 allocations: 0 bytes) # 1.6.0-DEV.1262
  5.790 μs (0 allocations: 0 bytes) # 17a3c7702e2cb20171d1211606343fc50533a588
  3.575 μs (0 allocations: 0 bytes) # 9405bf51a726a6383e6911eeb4235ba21ab3daee
  3.572 μs (0 allocations: 0 bytes) # 1.5.2
  5.959 μs (0 allocations: 0 bytes) # 1.0.5

julia> X = rand(64, 64);

julia> @btime arr_sum($X)
  3.627 μs (0 allocations: 0 bytes) # 17a3c7702e2cb20171d1211606343fc50533a588
  3.734 μs (0 allocations: 0 bytes) # 1.5.2
```

Loops that use `@simd`, or that iterate over `LinearIndices` instead of `CartesianIndices`, are not affected.
With `@simd`:

```julia
julia> using BenchmarkTools

julia> function arr_sum_simd(X)
           val = zero(eltype(X))
           R = CartesianIndices(X)
           @simd for i in R
               @inbounds val += X[i]
           end
           val
       end
arr_sum_simd (generic function with 1 method)

julia> X = rand(4, 4, 4, 4, 4, 4);

julia> @btime arr_sum_simd($X)
  3.593 μs (0 allocations: 0 bytes) # 1.6.0-DEV.1262
  3.827 μs (0 allocations: 0 bytes) # 1.5.2
  3.585 μs (0 allocations: 0 bytes) # 1.0.5
```

With `LinearIndices`:
```julia
julia> using BenchmarkTools

julia> function arr_sum_linear(X)
           val = zero(eltype(X))
           R = LinearIndices(X)
           for i in R
               @inbounds val += X[i]
           end
           val
       end
arr_sum_linear (generic function with 1 method)

julia> X = rand(4, 4, 4, 4, 4, 4);

julia> @btime arr_sum_linear($X)
  3.707 μs (0 allocations: 0 bytes) # 1.6.0-DEV.1262
  3.626 μs (0 allocations: 0 bytes) # 1.5.2
  3.796 μs (0 allocations: 0 bytes) # 1.0.5
```
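Two further variants could help isolate whether the slowdown lies in the `CartesianIndices` iterator itself. The sketch below is an assumption-laden illustration, not part of the original report; the function names `arr_sum_nested` and `arr_sum_eachindex` are hypothetical. The first writes the six loops out by hand, visiting elements in the same column-major order as `CartesianIndices(X)`. The second uses `eachindex`, which for an `Array` yields a plain linear range, like the `LinearIndices` version above.

```julia
# Hypothetical comparison variants (not from the original report).

# Explicit six-deep loop nest. In `for a in A, b in B` the last iterator is
# the innermost, so `i1` varies fastest and elements are visited in the same
# column-major order as `CartesianIndices(X)`, without using that iterator.
function arr_sum_nested(X::AbstractArray{T,6}) where {T}
    val = zero(T)
    for i6 in axes(X, 6), i5 in axes(X, 5), i4 in axes(X, 4),
        i3 in axes(X, 3), i2 in axes(X, 2), i1 in axes(X, 1)
        @inbounds val += X[i1, i2, i3, i4, i5, i6]
    end
    return val
end

# `eachindex` picks the array's preferred index style: for an `Array`
# (IndexLinear) it returns `Base.OneTo(length(X))`, the same linear range as
# the `LinearIndices` version; for IndexCartesian arrays it would return
# `CartesianIndices(X)` instead.
function arr_sum_eachindex(X)
    val = zero(eltype(X))
    for i in eachindex(X)
        @inbounds val += X[i]
    end
    return val
end
```

With BenchmarkTools loaded, `@btime arr_sum_nested($X)` and `@btime arr_sum_eachindex($X)` can be timed alongside the functions above; no timings for these variants are reported here. Because neither uses `@simd`, both sum in the same sequential order and should return bit-identical results.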