-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Description
I looked into #9080 again, and as pointed out by @simonster, it seems obvious that the core problem is that our current iteration framework seems to require two branches per iteration rather than one. It seems this could be solved if we could, in effect, return break from a function.
The performance difference is illustrated by these functions:
function sumcart_iter_good(A)
s = 0.0
R = CartesianRange(size(A))
I = start(R)
@inbounds while true
s += A[I]
i1, i2 = I.I[1], I.I[2]
i1 += 1
if i1 > size(A, 1)
i1 = 1
i2 += 1
if i2 > size(A, 2) # note we test this only when i1 > size(A, 1)
break
end
end
I = CartesianIndex((i1, i2))
end
s
end
function sumcart_iter_bad(A)
s = 0.0
R = CartesianRange(size(A))
I = start(R)
@inbounds while true
s += A[I]
i1, i2 = I.I[1], I.I[2]
i1 += 1
if i1 > size(A, 1)
i1 = 1
i2 += 1
end
if i2 > size(A, 2) # here we test it on every iteration
break
end
I = CartesianIndex((i1, i2))
end
s
endThe good one typically has only one branch per iteration, whereas the bad one always has two. The good one has the same performance as writing the loops out manually, and the bad one is about 70% worse.
In this gist I present some code which would generate the good version, if we were able to "call" break from inside a function. (CR = CartesianRange, CI = CartesianIndex.) Note this is not the same as effectively testing done inside next, and just passing back the boolean; the key problem is the branch, not the comparison.
Note the gist is also free of @generated functions; we can do that regardless, of course.