Conversation

@ndinsmore
Contributor

Previously there was the potential for an infinite loop in the skip_deleted function when L = typemax(Int). This fixes that and also provides some performance benefits.
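
For context, here is a minimal sketch of the kind of scan skip_deleted performs and why the bound matters. The slot encoding, the names, and the sentinel return value are illustrative assumptions, not the actual code in base/dict.jl or the exact change in this PR:

# Illustrative sketch, not the change merged in this PR: scan forward from
# index i for the next filled slot. The naive form
#     while i <= L && !filled(i); i += 1; end
# lets `i` wrap past typemax(Int) when L == typemax(Int); skipping the
# increment once i == L removes that possibility and lets the compiler
# prove the index cannot overflow.
function skip_deleted_sketch(slots::Vector{UInt8}, i::Int)
    L = length(slots)
    while i <= L
        slots[i] == 0x1 && return i   # 0x1 marks a filled slot in this sketch
        i == L && break               # stop before the increment could wrap
        i += 1
    end
    return 0                          # sentinel: no filled slot found (sketch only)
end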

Baseline:

julia> using BenchmarkTools

julia> d=Dict(v=>v for v in 1:1000);

julia> @benchmark collect(values(d))
BenchmarkTools.Trial: 
  memory estimate:  7.95 KiB
  allocs estimate:  2
  --------------
  minimum time:     6.313 μs (0.00% GC)
  median time:      8.026 μs (0.00% GC)
  mean time:        11.180 μs (15.87% GC)
  maximum time:     12.011 ms (99.82% GC)
  --------------
  samples:          10000
  evals/sample:     5

julia> @benchmark sum(values(d))
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     5.716 μs (0.00% GC)
  median time:      6.483 μs (0.00% GC)
  mean time:        8.101 μs (0.00% GC)
  maximum time:     1.149 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     6

julia> @benchmark sum(v->v.second,d)
BenchmarkTools.Trial: 
  memory estimate:  272 bytes
  allocs estimate:  7
  --------------
  minimum time:     8.236 μs (0.00% GC)
  median time:      10.181 μs (0.00% GC)
  mean time:        12.392 μs (0.00% GC)
  maximum time:     1.016 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     3

After fix:

julia> using BenchmarkTools

julia> d=Dict(v=>v for v in 1:1000);

julia> @benchmark collect(values(d))
BenchmarkTools.Trial: 
  memory estimate:  7.95 KiB
  allocs estimate:  2
  --------------
  minimum time:     4.154 μs (0.00% GC)
  median time:      7.265 μs (0.00% GC)
  mean time:        14.121 μs (17.00% GC)
  maximum time:     12.819 ms (99.87% GC)
  --------------
  samples:          10000
  evals/sample:     7

julia> @benchmark sum(values(d))
BenchmarkTools.Trial: 
  memory estimate:  272 bytes
  allocs estimate:  8
  --------------
  minimum time:     5.441 μs (0.00% GC)
  median time:      5.808 μs (0.00% GC)
  mean time:        6.910 μs (0.00% GC)
  maximum time:     243.749 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     6

julia> @benchmark sum(v->v.second,d)
BenchmarkTools.Trial: 
  memory estimate:  272 bytes
  allocs estimate:  7
  --------------
  minimum time:     5.749 μs (0.00% GC)
  median time:      6.785 μs (0.00% GC)
  mean time:        10.137 μs (0.00% GC)
  maximum time:     2.498 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     6

@ndinsmore changed the title from "Fix Int64 overrun potential in dict skip_deleted" to "Fix Int overrun potential in dict skip_deleted" on Mar 22, 2019
@StefanKarpinski added the performance (Must go faster) and collections (Data structures holding multiple items, e.g. sets) labels on Mar 22, 2019
@JeffBezanson
Member

Any speedup is nice of course, but do I understand right that this is guarding against arrays of length typemax(Int)? It doesn't seem necessary to worry about that.

@vchuravy
Member

> do I understand right that this is guarding against arrays of length typemax(Int)? It doesn't seem necessary to worry about that.

The problem is that LLVM proves that an overflow is possible and then decides that this loop is not worth optimizing further.
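
For reference, Int arithmetic in Julia wraps on overflow, which is exactly the behavior the optimizer has to account for here:

julia> typemax(Int) + 1 == typemin(Int)
true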

@yuyichao
Contributor

LLVM never seems to clone the loop for this purpose, even though it almost always emits the scalar loop anyway. I've seen this for both the overflow problem and alignment...

@ndinsmore
Contributor Author

@JeffBezanson some relevant discussion happened in #31416.

For me this is part of burning down a list of things that stand in the way of higher-level iterators compiling down to really lean loops. While digging into that and trying to understand what stopped fast loops from being emitted, this showed up, and I thought it should just be fixed while it was still top of mind.

@ndinsmore
Contributor Author

A more powerful benchmark might be this:

using BenchmarkTools
d=Dict(v=>v for v in 1:1000)


# Sum f(v) over `iter` with an ordinary for loop.
function for_sum(f, iter)
    ret = 0
    for v in iter
        ret += f(v)
    end
    return ret
end

# Same computation, but driving the iteration protocol by hand with `iterate`,
# which is what the for loop above lowers to.
function while_sum(f, iter)
    ret = 0
    next = iterate(iter)
    while next !== nothing
        (v, state) = next
        ret += f(v)
        next = iterate(iter, state)
    end
    return ret
end

Before:

julia> @benchmark for_sum(v->v,values(d))
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     6.816 μs (0.00% GC)
  median time:      7.893 μs (0.00% GC)
  mean time:        9.323 μs (0.00% GC)
  maximum time:     294.384 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     4

julia> @benchmark while_sum(v->v,values(d))
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     5.534 μs (0.00% GC)
  median time:      6.920 μs (0.00% GC)
  mean time:        8.478 μs (0.00% GC)
  maximum time:     544.237 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     5

after:

julia> @benchmark for_sum(v->v,values(d))
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     3.461 μs (0.00% GC)
  median time:      4.886 μs (0.00% GC)
  mean time:        5.145 μs (0.00% GC)
  maximum time:     176.057 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     8

julia> @benchmark while_sum(v->v,values(d))
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     3.501 μs (0.00% GC)
  median time:      3.723 μs (0.00% GC)
  mean time:        4.421 μs (0.00% GC)
  maximum time:     127.837 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     8

These benchmarks are tailored to resemble typical use in the wild.

@ndinsmore
Contributor Author

This provides a good performance bump; is there anything blocking it?

@JeffBezanson merged commit a399780 into JuliaLang:master on Mar 29, 2019
@JeffBezanson
Member

Thanks for explaining. Nice work!
