Open
Labels: fold (sum, maximum, reduce, foldl, etc.), performance (Must go faster)
Description
I have a custom array type MyArray with Base.IndexStyle(a::MyArray) = IndexCartesian(). I noticed that the built-in sum function is slower on it than a naive mysum implementation.
See this MWE:
using BenchmarkTools
struct MyArray{T, A<:AbstractArray{T, 3}} <: AbstractArray{T, 3}
    data::A
end
Base.size(a::MyArray) = size(a.data)
Base.@propagate_inbounds @inline function Base.getindex(a::MyArray, i, j, k)
    @boundscheck checkbounds(a.data, i, j, k)
    @inbounds v = a.data[i, j, k]
    return v
end
# change this to IndexLinear to get ~5x speed up
Base.IndexStyle(a::MyArray) = IndexCartesian()
function mysum(a)
    S = zero(eltype(a))
    for k = 1:size(a, 3)
        for j = 1:size(a, 2)
            @simd for i = 1:size(a, 1)
                @inbounds S += a[i, j, k]
            end
        end
    end
    return S
end
N = 64
pa = rand(N, N, N)
a  = MyArray(pa)
@btime mysum($a)
@btime   sum($a)
@btime   sum($pa)
On my machine I get:
  47.294 μs (0 allocations: 0 bytes)
  453.761 μs (0 allocations: 0 bytes)
  48.895 μs (0 allocations: 0 bytes)
which makes sum(::MyArray) roughly 10x slower than mysum. I find it surprising that disabling bounds checking with the --check-bounds=no flag makes sum(::MyArray) about twice as fast, i.e.
  46.498 μs (0 allocations: 0 bytes)
  216.276 μs (0 allocations: 0 bytes)
  48.483 μs (0 allocations: 0 bytes)
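
To help narrow down where the time goes, two more hand-written variants could be timed alongside mysum and sum. This is only a sketch: CartesianWrapper is a hypothetical stand-in for MyArray, repeated so the block runs on its own, and sum_eachindex and sum_iterate are illustrative names, not anything from Base. Neither is claimed to be what sum does internally.

```julia
# Hypothetical stand-in for the MyArray wrapper from the MWE (assumption: same
# layout, a thin 3-d wrapper with Cartesian indexing).
struct CartesianWrapper{T, A<:AbstractArray{T, 3}} <: AbstractArray{T, 3}
    data::A
end
Base.size(a::CartesianWrapper) = size(a.data)
Base.@propagate_inbounds Base.getindex(a::CartesianWrapper, i::Int, j::Int, k::Int) =
    a.data[i, j, k]
Base.IndexStyle(::Type{<:CartesianWrapper}) = IndexCartesian()

# Variant 1: a single loop over eachindex(a). For an IndexCartesian array this
# iterates CartesianIndex values instead of three nested integer loops.
function sum_eachindex(a)
    S = zero(eltype(a))
    @inbounds @simd for I in eachindex(a)
        S += a[I]
    end
    return S
end

# Variant 2: plain element iteration through the generic iterate protocol,
# without @inbounds or @simd annotations at the call site.
function sum_iterate(a)
    S = zero(eltype(a))
    for x in a
        S += x
    end
    return S
end

pa = rand(16, 16, 16)
a  = CartesianWrapper(pa)
```

Timing sum_eachindex and sum_iterate with @btime next to sum(a), with and without --check-bounds=no, might indicate whether the slowdown comes from Cartesian iteration itself or from bounds checks that are not elided along the generic path.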