@timholy
Member

@timholy timholy commented Sep 13, 2020

This adds more benchmarks, specifically for Float32 (which, being narrower than Float64, accentuates the benefits of vectorization) and for cartesian-indexed views of arrays (which exercise cartesian indexing).

Here's the current status on 1.5.1 (which is not good):

julia> include("benchmarks.jl")
[ Info: Benchmark tests are warnings for now
T = Float32
testf = test_getindex
f = mysum_elt_boundscheck
t_cv / t_ar = 4.5369018653690185
t_cv / t_ar = 6.535691604216944
t_cv / t_ar = 1.2106305711455927
t_cv / t_ar = 8.928769017980636
┌ Warning: colorview1: failed on mysum_elt_boundscheck, time ratio 8.928769017980636, tol 5
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
f = mysum_index_boundscheck
t_cv / t_ar = 3.917422867513612
t_cv / t_ar = 6.250884225418534
t_cv / t_ar = 1.1220196353436185
t_cv / t_ar = 2.5358238585092945
f = mysum_elt_inbounds
t_cv / t_ar = 4.506186186186186
t_cv / t_ar = 7.4783595641646485
t_cv / t_ar = 1.255641748942172
t_cv / t_ar = 9.059506790564688
┌ Warning: colorview1: failed on mysum_elt_inbounds, time ratio 9.059506790564688, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
f = mysum_index_inbounds_simd
t_cv / t_ar = 21.95084175084175
┌ Warning: channelview1: failed on mysum_index_inbounds_simd, time ratio 21.95084175084175, tol 20
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:120
t_cv / t_ar = 5.9062608494387225
t_cv / t_ar = 2.05045871559633
t_cv / t_ar = 6.286806883365201
┌ Warning: colorview1: failed on mysum_index_inbounds_simd, time ratio 6.286806883365201, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
testf = test_setindex
f = myfill1!
t_cv / t_ar = 17.764018220270053
t_cv / t_ar = 9.132899601924994
t_cv / t_ar = 11.38113116405472
┌ Warning: colorview1: failed on myfill1!, time ratio 11.38113116405472, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:133
t_cv / t_ar = 18.630171030754422
┌ Warning: colorview1: failed on myfill1!, time ratio 18.630171030754422, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
f = myfill2!
t_cv / t_ar = 21.973432262561666
┌ Warning: channelview1: failed on myfill2!, time ratio 21.973432262561666, tol 20
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:120
t_cv / t_ar = 8.056054111138296
t_cv / t_ar = 12.690034550150067
┌ Warning: colorview1: failed on myfill2!, time ratio 12.690034550150067, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:133
t_cv / t_ar = 12.907169889979611
┌ Warning: colorview1: failed on myfill2!, time ratio 12.907169889979611, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139

T = Float64
testf = test_getindex
f = mysum_elt_boundscheck
t_cv / t_ar = 4.3037558152728925
t_cv / t_ar = 6.523684392503976
t_cv / t_ar = 1.172765446910618
t_cv / t_ar = 7.287140854940435
┌ Warning: colorview1: failed on mysum_elt_boundscheck, time ratio 7.287140854940435, tol 5
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
f = mysum_index_boundscheck
t_cv / t_ar = 3.9473974415527127
t_cv / t_ar = 6.671158443816561
t_cv / t_ar = 1.1111111111111112
t_cv / t_ar = 2.5224347826086957
f = mysum_elt_inbounds
t_cv / t_ar = 4.335294783297499
t_cv / t_ar = 5.419389546406931
t_cv / t_ar = 1.0817471448607494
t_cv / t_ar = 7.106967925422658
┌ Warning: colorview1: failed on mysum_elt_inbounds, time ratio 7.106967925422658, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
f = mysum_index_inbounds_simd
t_cv / t_ar = 10.912990936555891
t_cv / t_ar = 5.469922767322963
t_cv / t_ar = 1.0150922302962548
t_cv / t_ar = 2.4708855196660076
testf = test_setindex
f = myfill1!
t_cv / t_ar = 13.659139760810708
t_cv / t_ar = 10.164289441904987
t_cv / t_ar = 6.304755701854753
┌ Warning: colorview1: failed on myfill1!, time ratio 6.304755701854753, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:133
t_cv / t_ar = 9.62979374798582
┌ Warning: colorview1: failed on myfill1!, time ratio 9.62979374798582, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
f = myfill2!
t_cv / t_ar = 11.66571988777312
t_cv / t_ar = 9.563324490922094
t_cv / t_ar = 6.098197231245772
┌ Warning: colorview1: failed on myfill2!, time ratio 6.098197231245772, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:133
t_cv / t_ar = 9.277059364501417
┌ Warning: colorview1: failed on myfill2!, time ratio 9.277059364501417, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139

Improving this on Julia 1.6 is a work in progress.

@codecov

codecov bot commented Sep 13, 2020

Codecov Report

Merging #142 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #142   +/-   ##
=======================================
  Coverage   55.14%   55.14%           
=======================================
  Files          10       10           
  Lines         457      457           
=======================================
  Hits          252      252           
  Misses        205      205           

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4d3bc9...58fe147. Read the comment docs.

@timholy timholy force-pushed the teh/more_benchmarks branch from d923f54 to 58fe147 Compare September 13, 2020 12:47
@johnnychen94
Member

johnnychen94 commented Sep 13, 2020

Quick question: was this benchmark run against Julia 1.6-dev or Julia 1.5.1? This is worse than I would expect.

@timholy
Member Author

timholy commented Sep 13, 2020

1.5.1 (good question, I edited the above to clarify this)

@timholy
Member Author

timholy commented Sep 13, 2020

Here's current Julia 1.6 master (9f6d4ddba530dff32353286a563bf03cf1c8e375, which incorporates improvements from JuliaLang/julia#37277):

julia> include("benchmarks.jl")
[ Info: Precompiling ImageCore [a09fc81d-aa75-5fe9-8630-4744c3626534]
[ Info: Benchmark tests are warnings for now
T = Float32
testf = test_getindex
f = mysum_elt_boundscheck
t_cv / t_ar = 1.749444379459586
t_cv / t_ar = 2.9028853454821566
t_cv / t_ar = 1.2423645320197045
t_cv / t_ar = 2.2992089343880875
f = mysum_index_boundscheck
t_cv / t_ar = 1.5895644071744701
t_cv / t_ar = 2.8077096483318305
t_cv / t_ar = 1.132014271813169
t_cv / t_ar = 2.52148591216748
f = mysum_elt_inbounds
t_cv / t_ar = 1.8068453805648637
t_cv / t_ar = 2.310429521356257
t_cv / t_ar = 1.2221875973823622
t_cv / t_ar = 2.374979770189351
f = mysum_index_inbounds_simd
t_cv / t_ar = 9.131621187800963
t_cv / t_ar = 2.7106450157870996
t_cv / t_ar = 2.476311336717428
t_cv / t_ar = 6.341965862271925
┌ Warning: colorview2: failed on mysum_index_inbounds_simd, time ratio 6.341965862271925, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
testf = test_setindex
f = myfill1!
t_cv / t_ar = 18.532558696182818
t_cv / t_ar = 7.136103852967982
t_cv / t_ar = 1.3862854704501815
t_cv / t_ar = 7.221769392801539
┌ Warning: colorview2: failed on myfill1!, time ratio 7.221769392801539, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
f = myfill2!
t_cv / t_ar = 18.320914645223592
t_cv / t_ar = 9.320231776997868
t_cv / t_ar = 1.3443665032864165
t_cv / t_ar = 3.0538170050803974
┌ Warning: colorview2: failed on myfill2!, time ratio 3.0538170050803974, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139

T = Float64
testf = test_getindex
f = mysum_elt_boundscheck
t_cv / t_ar = 1.994165427296648
t_cv / t_ar = 2.3744029464234333
t_cv / t_ar = 1.1812209143169998
t_cv / t_ar = 2.222554591684116
f = mysum_index_boundscheck
t_cv / t_ar = 1.6743057123504177
t_cv / t_ar = 2.7419758232596916
t_cv / t_ar = 1.1539513262236807
t_cv / t_ar = 2.2730544309491676
f = mysum_elt_inbounds
t_cv / t_ar = 2.005361930294906
t_cv / t_ar = 3.0426595453300465
t_cv / t_ar = 1.1901330376940134
t_cv / t_ar = 2.228960766696616
f = mysum_index_inbounds_simd
t_cv / t_ar = 4.738853503184713
t_cv / t_ar = 2.6435719784449576
t_cv / t_ar = 1.0269046350559403
t_cv / t_ar = 2.5297202797202796
testf = test_setindex
f = myfill1!
t_cv / t_ar = 9.335739955286058
t_cv / t_ar = 6.975329048298688
t_cv / t_ar = 1.016292372754982
t_cv / t_ar = 3.9685877973620345
┌ Warning: colorview2: failed on myfill1!, time ratio 3.9685877973620345, tol 3
└ @ Main ~/.julia/dev/ImageCore/test/benchmarks.jl:139
f = myfill2!
t_cv / t_ar = 9.101060497972961
t_cv / t_ar = 8.719333216855414
t_cv / t_ar = 0.9893094921480637
t_cv / t_ar = 2.2791452294294676

@timholy
Member Author

timholy commented Sep 13, 2020

That's better, but wait till you see it with JuliaLang/julia#37559:

julia> include("benchmarks.jl")
[ Info: Benchmark tests are warnings for now
T = Float32
testf = test_getindex
f = mysum_elt_boundscheck
t_cv / t_ar = 1.0326718296224588
t_cv / t_ar = 1.1410944206008584
t_cv / t_ar = 1.23257328990228
t_cv / t_ar = 0.8909475052819763
f = mysum_index_boundscheck
t_cv / t_ar = 1.0294481563969315
t_cv / t_ar = 0.9455580301905468
t_cv / t_ar = 1.434782608695652
t_cv / t_ar = 0.8890339425587467
f = mysum_elt_inbounds
t_cv / t_ar = 1.039189517107498
t_cv / t_ar = 1.1983264643436993
t_cv / t_ar = 1.2399745385105028
t_cv / t_ar = 0.8868844466114091
f = mysum_index_inbounds_simd
t_cv / t_ar = 1.5534407027818449
t_cv / t_ar = 0.18250064716541547
t_cv / t_ar = 0.9991266375545852
t_cv / t_ar = 2.0389242745930645
testf = test_setindex
f = myfill1!
t_cv / t_ar = 11.012250685828993
t_cv / t_ar = 0.45238016944181014
t_cv / t_ar = 1.9211861756014208
t_cv / t_ar = 2.2818724126790935
f = myfill2!
t_cv / t_ar = 1.7718788434944968
t_cv / t_ar = 0.47747029230670235
t_cv / t_ar = 1.278790683446607
t_cv / t_ar = 1.0042411864767544

T = Float64
testf = test_getindex
f = mysum_elt_boundscheck
t_cv / t_ar = 1.0783415550857411
t_cv / t_ar = 1.0880484114977307
t_cv / t_ar = 1.4283246977547495
t_cv / t_ar = 0.998362802881467
f = mysum_index_boundscheck
t_cv / t_ar = 1.0452079566003616
t_cv / t_ar = 0.9199031923475856
t_cv / t_ar = 1.2011398717644266
t_cv / t_ar = 0.9925306244397968
f = mysum_elt_inbounds
t_cv / t_ar = 1.0802007938360962
t_cv / t_ar = 1.0812926178476134
t_cv / t_ar = 1.3677555321390937
t_cv / t_ar = 1.014032258064516
f = mysum_index_inbounds_simd
t_cv / t_ar = 1.4192624418188329
t_cv / t_ar = 0.4768454837230628
t_cv / t_ar = 0.9684953829440521
t_cv / t_ar = 0.8139747995418098
testf = test_setindex
f = myfill1!
t_cv / t_ar = 8.08143732509389
t_cv / t_ar = 0.5703519908941922
t_cv / t_ar = 1.0961859395895273
t_cv / t_ar = 1.3408888760540603
f = myfill2!
t_cv / t_ar = 1.0262150792481337
t_cv / t_ar = 0.8962064149249264
t_cv / t_ar = 1.0051030666701113
t_cv / t_ar = 1.007155562225239

Only the myfill1! case is really bad now, and in quite a few cases the reinterpreted array is faster than its plain-Array cousin.

I should say that these were obtained by redefining

reinterpretc(::Type{T}, a::AbstractArray) where T = reinterpret(reshape, T, a)
reinterpretc(::Type{C}, a::AbstractArray) where C<:Colorant = reinterpret(reshape, ccolor(C, eltype(a)), a)

and commenting out all other reinterpretc methods.
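
For readers unfamiliar with the `reinterpret(reshape, T, a)` form (available from Julia 1.6), here is a minimal sketch of what it does. It uses `NTuple{3,Float32}` as a stand-in for a 3-channel color type so that it runs without ImageCore; the array sizes and variable names are illustrative assumptions and are not taken from benchmarks.jl.

```julia
# Requires Julia 1.6 or later. NTuple{3,Float32} stands in for a 3-channel
# color type (an assumption made so the sketch needs no packages).
a = rand(Float32, 3, 4, 5)      # channel dimension first

# Consuming the channel dimension: three Float32 values become one element,
# so the result has one dimension fewer than `a`.
px = reinterpret(reshape, NTuple{3,Float32}, a)
size(px)

# Going the other way adds the channel dimension back.
ch = reinterpret(reshape, Float32, px)
size(ch)

# `px` is a view of `a`, not a copy: a write through it changes the parent.
px[1, 1] = (1f0, 2f0, 3f0)
a[:, 1, 1]
```

Unlike plain `reinterpret` followed by a `reshape`, this form takes the channel dimension from (or adds it to) the array's shape directly, which is why the first dimension of `a` has to match the number of fields in the target element type.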

@johnnychen94
Member

Oh my god that's incredible! 🚀 🚀 🚀

@timholy
Member Author

timholy commented Sep 13, 2020

It mostly comes down to giving LLVM enough info that it can unroll the inner loop.
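
As a general illustration of that point (this is not the change made in JuliaLang/julia#37559, whose contents are not shown in this thread), one way to let the compiler see the trip count of an inner loop in Julia is to carry it in a type parameter. The function names `sum_channels` and `sum_all` and the tuple-based pixels below are hypothetical.

```julia
# Hypothetical sketch: N is part of the argument's type, so the inner loop's
# trip count is a compile-time constant for each specialization.
function sum_channels(px::NTuple{N,T}) where {N,T}
    s = zero(T)
    for i in 1:N
        s += px[i]
    end
    return s
end

# Outer loop over pixels; the inner loop over channels has a known length.
function sum_all(pixels::AbstractArray{NTuple{N,T}}) where {N,T}
    s = zero(T)
    for px in pixels
        s += sum_channels(px)
    end
    return s
end

pixels = [(1f0, 2f0, 3f0), (4f0, 5f0, 6f0)]
total = sum_all(pixels)
```

Running `@code_llvm sum_channels((1f0, 2f0, 3f0))` in the REPL is one way to check whether the inner loop was actually unrolled on a given Julia version.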
