-
-
Notifications
You must be signed in to change notification settings - Fork 35
Open
Labels
performanceMust go fasterMust go faster
Description
In julia's native method for solving upper triangular systems (naivesub!), on line 930 in base/linalg/triangular.jl, it is stated:
for i in j-1:-1:1 # counterintuitively 1:j-1 performs slightly betterI decided to test this with the following code
function naivesub2!(A::UpperTriangular, b::AbstractVector, x::AbstractVector = b)
n = size(A, 2)
if !(n == length(b) == length(x))
throw(DimensionMismatch("second dimension of left hand side A, $n, length of output x, $(length(x)), and length of right hand side b, $(length(b)), must be equal"))
end
@inbounds for j in n:-1:1
A.data[j,j] == zero(A.data[j,j]) && throw(SingularException(j))
xj = x[j] = A.data[j,j] \ b[j]
for i in 1:j-1
b[i] -= A.data[i,j] * xj
end
end
x
end
function f(n,m,method)
A=UpperTriangular(randn(n,n));
v=[randn(n) for k=1:m];
t=@elapsed @inbounds for i=1:m
method(A,v[i])
end
return t/m
end
N = round(Int,logspace(1,3,50))
Tlapack = [f(n,1000,A_ldiv_B!) for n in N]
Tnaivesub = [f(n,1000,Base.LinAlg.naivesub!) for n in N]
Tnaivesub2 = [f(n,1000,naivesub2!) for n in N]I think the results are best illustrated on a plot, showing timings divided by n^2, where y1 is lapack, y2 is naivesub! and y3 is naivesub2!.

So it seems, slightly better means, a whole lot better. This raises two questions:
- Why was the less performant loop order in
naivesub!chosen? - Why even bother using lapack?
naivesub2seems to be roughly a factor 2 faster.
PS: my versioninfo()
Julia Version 0.5.0-rc3+0
Commit e6f843b (2016-08-22 23:43 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin15.6.0)
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, ivybridge)
Metadata
Metadata
Assignees
Labels
performanceMust go fasterMust go faster