
Large performance regression in trunc with vector argument #19849

@giordano

After #19791, the manually vectorized method of trunc is deprecated and the dot syntax should be used instead. However, the two-argument version is now much slower than before.

I have a function where this regression is noticeable: even though trunc(Int, array) is a relatively small part of the computation, profiling confirms that broadcasting trunc takes a while.
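
For reference, this is the change in syntax; if I understand the lowering correctly, the dot call turns into a broadcast call, which is where the time goes:

array = collect(0.0:0.1:100.0)

trunc(Int, array)            # old manually vectorized method, deprecated by #19791
trunc.(Int, array)           # new dot syntax
broadcast(trunc, Int, array) # what the dot call lowers to (no fusion partner here)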

Before the PR (manually vectorized method):

julia> versioninfo()
Julia Version 0.6.0-dev.1807
Commit 26c8d856a (2016-12-31 04:12 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> using BenchmarkTools

julia> const array = collect(0.0:0.1:100.0);

julia> @benchmark trunc(Int, array)
BenchmarkTools.Trial: 
  memory estimate:  8.02 kb
  allocs estimate:  2
  --------------
  minimum time:     2.611 μs (0.00% GC)
  median time:      2.722 μs (0.00% GC)
  mean time:        3.242 μs (12.88% GC)
  maximum time:     165.228 μs (96.58% GC)
  --------------
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%

after the PR (dot syntax):

julia> versioninfo()
Julia Version 0.6.0-dev.1847
Commit 8f9036a7d (2017-01-02 23:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> const array = collect(0.0:0.1:100.0);

julia> using BenchmarkTools

julia> @benchmark trunc.(Int, array)
BenchmarkTools.Trial: 
  memory estimate:  23.75 kb
  allocs estimate:  1008
  --------------
  minimum time:     270.303 μs (0.00% GC)
  median time:      274.493 μs (0.00% GC)
  mean time:        281.129 μs (0.67% GC)
  maximum time:     2.250 ms (86.97% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark trunc.(array)
BenchmarkTools.Trial: 
  memory estimate:  8.00 kb
  allocs estimate:  1
  --------------
  minimum time:     952.900 ns (0.00% GC)
  median time:      1.140 μs (0.00% GC)
  mean time:        1.409 μs (15.34% GC)
  maximum time:     148.246 μs (97.14% GC)
  --------------
  samples:          10000
  evals/sample:     10
  time tolerance:   5.00%
  memory tolerance: 1.00%

Now trunc.(Int, array) is 100× slower than the old trunc(Int, array) and allocates considerably more memory. @code_warntype trunc.(Int, array) indicates that the returned value is type-stable, but there are type-unstable variables inside the function.
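
For anyone who wants to reproduce the type check, the dot call can also be inspected through its broadcast equivalent (assuming the lowering shown above):

julia> @code_warntype broadcast(trunc, Int, array)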

Note that the current trunc.(array), which returns Float64s rather than Ints, is comparable in speed to the old trunc(Int, array).
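
If the goal is an Int array, a possible stopgap, assuming nested dot calls fuse into a single loop as they should on 0.6, is to do the conversion in the same broadcast:

julia> Int.(trunc.(array))  # fuses into one loop and one output array; assumes the values are representable as Int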

Edit: for comparison, these are the results of the same benchmarks on Julia 0.5:

julia> @benchmark trunc(Int, array)
BenchmarkTools.Trial: 
  memory estimate:  8.03 kb
  allocs estimate:  3
  --------------
  minimum time:     2.918 μs (0.00% GC)
  median time:      3.013 μs (0.00% GC)
  mean time:        3.355 μs (7.09% GC)
  maximum time:     107.170 μs (90.67% GC)
  --------------
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark trunc(array)
BenchmarkTools.Trial: 
  memory estimate:  8.03 kb
  allocs estimate:  3
  --------------
  minimum time:     3.505 μs (0.00% GC)
  median time:      3.585 μs (0.00% GC)
  mean time:        3.938 μs (6.13% GC)
  maximum time:     112.471 μs (89.47% GC)
  --------------
  samples:          10000
  evals/sample:     8
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark trunc.(array)
BenchmarkTools.Trial: 
  memory estimate:  8.02 kb
  allocs estimate:  2
  --------------
  minimum time:     2.882 μs (0.00% GC)
  median time:      3.000 μs (0.00% GC)
  mean time:        3.317 μs (6.78% GC)
  maximum time:     99.173 μs (83.76% GC)
  --------------
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%

trunc.(array) was already faster than trunc(array), and now it's even better; the problem is specifically the method with a type argument.
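
A possible workaround, if the slowdown comes from passing the Type as an extra broadcast argument, is to close over it so that broadcast only sees a one-argument function (truncint is just a hypothetical helper name):

truncint(x) = trunc(Int, x)  # captures Int, so broadcast dispatches on a single-argument function

truncint.(array)             # same result as trunc.(Int, array)
(x -> trunc(Int, x)).(array) # inline equivalent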

Labels: broadcast, performance, potential benchmark, regression, types and dispatch
