Closed
Labels: performance (Must go faster)
Or, alternately, "when @inline is slower than manually inlining." Here's an example:

    type T
        bits::BitVector
    end

    # I want to simply dispatch to an inlined function, but it's slower than I expected
    unsafe_in1(s,n) = Base.unsafe_getindex(s.bits, n+1)

    # So I started manually inlining it...
    unsafe_in2(s,n) = Base.unsafe_bitgetindex(s.bits.chunks, n+1)

    function unsafe_in3(s,n)
        i = n+1
        Bc = s.bits.chunks # This assignment causes a GC Frame
        i1, i2 = Base.get_chunks_id(i)
        u = uint64(1) << i2
        @inbounds r = (Bc[i1] & u) != 0
        return r
    end

    # Until I found a version that works as I expected: macro-style inlining
    function unsafe_in4(s,n)
        i = n+1
        i1, i2 = Base.get_chunks_id(i)
        u = uint64(1) << i2
        @inbounds r = (s.bits.chunks[i1] & u) != 0
        return r
    end

Looking at the output of code_llvm(unsafe_inX, (T, Int)), we emit a GC Frame in every one of the above functions except unsafe_in4. This can cause slowdowns of up to 35% on very simple functions like the ones above:

    function timeit()
        s = T(falses(10))
        unsafe_in3(s, 1); @time for i=1:100000000 unsafe_in3(s,1) end
        unsafe_in4(s, 1); @time for i=1:100000000 unsafe_in4(s,1) end
    end
    timeit()

    elapsed time: 0.433966378 seconds (0 bytes allocated)
    elapsed time: 0.31291549 seconds (0 bytes allocated)

This is true both before and after the SSA patch. I've done spot-checks on today's 61d3ece and a month-old 48aae1f with the same results.
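
For anyone who wants to repeat the code_llvm check without reading the IR by eye, here is a minimal sketch. It rests on several assumptions that are not part of the report above. It targets a current Julia 1.x, so it uses struct and UInt64 instead of type and uint64. It computes the chunk index by hand rather than calling the internal Base.get_chunks_id. It assumes that a GC frame appears in the IR under a name containing "gcframe". The helper llvm_ir is hypothetical and not part of Base.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Assumption: Julia 1.x syntax; the report above was written against a much older version.
struct T
    bits::BitVector
end

# Same bit test as unsafe_in4, with the chunk index and bit offset computed by hand.
function unsafe_in4(s, n)
    i = n + 1
    i1 = ((i - 1) >> 6) + 1    # index of the 64-bit chunk that holds bit i
    i2 = (i - 1) & 63          # position of bit i inside that chunk
    u = UInt64(1) << i2
    @inbounds r = (s.bits.chunks[i1] & u) != 0
    return r
end

# Hypothetical helper: capture the optimized LLVM IR for f(types...) as a String.
llvm_ir(f, types) = sprint(io -> code_llvm(io, f, types))

ir = llvm_ir(unsafe_in4, (T, Int))

# Assumption: a GC frame, if present, is named with the substring "gcframe" in the IR.
println(occursin("gcframe", ir) ? "GC frame emitted" : "no GC frame found")
```

Running the same check over each unsafe_inX variant gives a quick yes/no per function. Whether a given Julia version still emits a frame for any of them is something to verify locally, not something this sketch asserts.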