
Large performance regression in trunc with vector argument #19849

@giordano

After #19791, the manually vectorized method of trunc is deprecated and the dot syntax should be used instead. However, the two-argument version is now much slower than before.

I have a function where this regression is noticeable: even though trunc(Int, array) is a relatively small part of the computation, profiling confirms that broadcasting trunc takes a while.
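
For reference, this is the change in syntax; if I understand the lowering correctly, the dot call turns into a broadcast call, which is where the time goes:

array = collect(0.0:0.1:100.0)

trunc(Int, array)            # old manually vectorized method, deprecated by #19791
trunc.(Int, array)           # new dot syntax
broadcast(trunc, Int, array) # what the dot call lowers to (no fusion partner here)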

Before the PR (manually vectorized method):

julia> versioninfo()
Julia Version 0.6.0-dev.1807
Commit 26c8d856a (2016-12-31 04:12 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> using BenchmarkTools

julia> const array = collect(0.0:0.1:100.0);

julia> @benchmark trunc(Int, array)
BenchmarkTools.Trial: 
  memory estimate:  8.02 kb
  allocs estimate:  2
  --------------
  minimum time:     2.611 μs (0.00% GC)
  median time:      2.722 μs (0.00% GC)
  mean time:        3.242 μs (12.88% GC)
  maximum time:     165.228 μs (96.58% GC)
  --------------
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%

after the PR (dot syntax):

julia> versioninfo()
Julia Version 0.6.0-dev.1847
Commit 8f9036a7d (2017-01-02 23:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> const array = collect(0.0:0.1:100.0);

julia> using BenchmarkTools

julia> @benchmark trunc.(Int, array)
BenchmarkTools.Trial: 
  memory estimate:  23.75 kb
  allocs estimate:  1008
  --------------
  minimum time:     270.303 μs (0.00% GC)
  median time:      274.493 μs (0.00% GC)
  mean time:        281.129 μs (0.67% GC)
  maximum time:     2.250 ms (86.97% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark trunc.(array)
BenchmarkTools.Trial: 
  memory estimate:  8.00 kb
  allocs estimate:  1
  --------------
  minimum time:     952.900 ns (0.00% GC)
  median time:      1.140 μs (0.00% GC)
  mean time:        1.409 μs (15.34% GC)
  maximum time:     148.246 μs (97.14% GC)
  --------------
  samples:          10000
  evals/sample:     10
  time tolerance:   5.00%
  memory tolerance: 1.00%

Now trunc.(Int, array) is 100× slower than the old trunc(Int, array) and allocates considerably more memory. @code_warntype trunc.(Int, array) indicates that the returned value is type-stable, but there are type-unstable variables inside the function.
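
For anyone who wants to reproduce the type check, the dot call can also be inspected through its broadcast equivalent (assuming the lowering shown above):

julia> @code_warntype broadcast(trunc, Int, array)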

Note that the current trunc.(array), which returns Float64s rather than Ints, is comparable in speed to the old trunc(Int, array).
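
If the goal is an Int array, a possible stopgap, assuming nested dot calls fuse into a single loop as they should on 0.6, is to do the conversion in the same broadcast:

julia> Int.(trunc.(array))  # fuses into one loop and one output array; assumes the values are representable as Int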

Edit: for comparison, these are the results of the same benchmarks on Julia 0.5:

julia> @benchmark trunc(Int, array)
BenchmarkTools.Trial: 
  memory estimate:  8.03 kb
  allocs estimate:  3
  --------------
  minimum time:     2.918 μs (0.00% GC)
  median time:      3.013 μs (0.00% GC)
  mean time:        3.355 μs (7.09% GC)
  maximum time:     107.170 μs (90.67% GC)
  --------------
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark trunc(array)
BenchmarkTools.Trial: 
  memory estimate:  8.03 kb
  allocs estimate:  3
  --------------
  minimum time:     3.505 μs (0.00% GC)
  median time:      3.585 μs (0.00% GC)
  mean time:        3.938 μs (6.13% GC)
  maximum time:     112.471 μs (89.47% GC)
  --------------
  samples:          10000
  evals/sample:     8
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark trunc.(array)
BenchmarkTools.Trial: 
  memory estimate:  8.02 kb
  allocs estimate:  2
  --------------
  minimum time:     2.882 μs (0.00% GC)
  median time:      3.000 μs (0.00% GC)
  mean time:        3.317 μs (6.78% GC)
  maximum time:     99.173 μs (83.76% GC)
  --------------
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%

trunc.(array) was already faster than trunc(array), and now it's even better; the problem is specifically the method with a type argument.
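
A possible workaround, if the slowdown comes from passing the Type as an extra broadcast argument, is to close over it so that broadcast only sees a one-argument function (truncint is just a hypothetical helper name):

truncint(x) = trunc(Int, x)  # captures Int, so broadcast dispatches on a single-argument function

truncint.(array)             # same result as trunc.(Int, array)
(x -> trunc(Int, x)).(array) # inline equivalent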

Labels: broadcast, performance, potential benchmark, regression, types and dispatch
