Description
After #19791, the manually vectorized method of trunc is deprecated and the dot syntax should be used instead. However, the two-argument method is now much slower than before.
I have a function where this regression is noticeable: even though trunc(Int, array) is a relatively small part of the computation, profiling confirms that broadcasting trunc takes a while.
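Since the slowdown seems tied to broadcasting with the type argument, one workaround (my own sketch, not part of the PR) is to close over the type so that broadcast only sees a single-argument function:

```julia
using BenchmarkTools

const array = collect(0.0:0.1:100.0)

# Regressed form: broadcasting trunc with a Type as the first argument.
@benchmark trunc.(Int, array)

# Possible workaround: capture Int in an anonymous function, so the
# broadcast machinery specializes on a one-argument callable instead.
@benchmark map(x -> trunc(Int, x), array)
```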
Before the PR (manually vectorized method):
julia> versioninfo()
Julia Version 0.6.0-dev.1807
Commit 26c8d856a (2016-12-31 04:12 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> using BenchmarkTools
julia> const array = collect(0.0:0.1:100.0);
julia> @benchmark trunc(Int, array)
BenchmarkTools.Trial:
memory estimate: 8.02 kb
allocs estimate: 2
--------------
minimum time: 2.611 μs (0.00% GC)
median time: 2.722 μs (0.00% GC)
mean time: 3.242 μs (12.88% GC)
maximum time: 165.228 μs (96.58% GC)
--------------
samples: 10000
evals/sample: 9
time tolerance: 5.00%
memory tolerance: 1.00%
After the PR (dot syntax):
julia> versioninfo()
Julia Version 0.6.0-dev.1847
Commit 8f9036a7d (2017-01-02 23:39 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> const array = collect(0.0:0.1:100.0);
julia> using BenchmarkTools
julia> @benchmark trunc.(Int, array)
BenchmarkTools.Trial:
memory estimate: 23.75 kb
allocs estimate: 1008
--------------
minimum time: 270.303 μs (0.00% GC)
median time: 274.493 μs (0.00% GC)
mean time: 281.129 μs (0.67% GC)
maximum time: 2.250 ms (86.97% GC)
--------------
samples: 10000
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
julia> @benchmark trunc.(array)
BenchmarkTools.Trial:
memory estimate: 8.00 kb
allocs estimate: 1
--------------
minimum time: 952.900 ns (0.00% GC)
median time: 1.140 μs (0.00% GC)
mean time: 1.409 μs (15.34% GC)
maximum time: 148.246 μs (97.14% GC)
--------------
samples: 10000
evals/sample: 10
time tolerance: 5.00%
memory tolerance: 1.00%
Now trunc.(Int, array) is ~100× slower than the old trunc(Int, array) and allocates far more memory. @code_warntype trunc.(Int, array) indicates that the returned value is type-stable, but there are type-unstable variables inside the function.
Note that the current trunc.(array) is comparable to the old trunc(Int, array).
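For anyone wanting to reproduce the type-instability observation, this is roughly the inspection I ran (the interpretation of the output is mine):

```julia
const array = collect(0.0:0.1:100.0)

# Variables printed with non-concrete types (e.g. ::Any) in the body
# indicate the internal instability mentioned above, even though the
# inferred return type itself is concrete.
@code_warntype trunc.(Int, array)
```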
Edit: for comparison, these are the results of the same benchmarks on Julia 0.5:
julia> @benchmark trunc(Int, array)
BenchmarkTools.Trial:
memory estimate: 8.03 kb
allocs estimate: 3
--------------
minimum time: 2.918 μs (0.00% GC)
median time: 3.013 μs (0.00% GC)
mean time: 3.355 μs (7.09% GC)
maximum time: 107.170 μs (90.67% GC)
--------------
samples: 10000
evals/sample: 9
time tolerance: 5.00%
memory tolerance: 1.00%
julia> @benchmark trunc(array)
BenchmarkTools.Trial:
memory estimate: 8.03 kb
allocs estimate: 3
--------------
minimum time: 3.505 μs (0.00% GC)
median time: 3.585 μs (0.00% GC)
mean time: 3.938 μs (6.13% GC)
maximum time: 112.471 μs (89.47% GC)
--------------
samples: 10000
evals/sample: 8
time tolerance: 5.00%
memory tolerance: 1.00%
julia> @benchmark trunc.(array)
BenchmarkTools.Trial:
memory estimate: 8.02 kb
allocs estimate: 2
--------------
minimum time: 2.882 μs (0.00% GC)
median time: 3.000 μs (0.00% GC)
mean time: 3.317 μs (6.78% GC)
maximum time: 99.173 μs (83.76% GC)
--------------
samples: 10000
evals/sample: 9
time tolerance: 5.00%
memory tolerance: 1.00%
trunc.(array) was already faster than trunc(array) and is now even better; the problem is specifically the method with a type argument.